Glossary

Project (rubicon_ml.Project)

A project is a collection of experiments, dataframes, and artifacts. Each project is identified by a unique name.

from rubicon_ml import Rubicon

# Keep everything in memory; nothing is written to disk.
rubicon = Rubicon(persistence="memory")
project = rubicon.create_project(name="Glossary")

Experiment (rubicon_ml.Experiment)

An experiment represents a model run and is identified by its created_at time. It can have metrics, parameters, features, dataframes, and artifacts logged to it.

An experiment is logged to a project.

experiment = project.log_experiment(tags=["glossary"])

Parameter (rubicon_ml.Parameter)

A parameter is an input to an experiment (model run) that depends on the type of model being used. It affects the model’s predictions.

For example, if you were using a random forest classifier, n_estimators (the number of trees in the forest) could be a parameter.

A parameter is logged to an experiment.

experiment.log_parameter("n_estimators", 20)

Feature (rubicon_ml.Feature)

A feature is an input to an experiment (model run) that’s an independent, measurable property of a phenomenon being observed. It affects the model’s predictions.

For example, consider a model that predicts how likely a customer is to pay back a loan. Possible features include year and credit score.

A feature is logged to an experiment.

experiment.log_feature("year", importance=0.125)
experiment.log_feature("credit score", importance=0.250)

Metric (rubicon_ml.Metric)

A metric is a single-value output of an experiment that helps evaluate the quality of the model’s predictions.

It can be either a score (value to maximize) or a loss (value to minimize).

A metric is logged to an experiment.

experiment.log_metric("accuracy", 0.933, directionality="score")
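The difference between a score and a loss matters most when comparing runs. The sketch below is plain Python and not part of rubicon_ml: the helper name is_improvement is hypothetical, the comparison values are made up for illustration, and it assumes directionality is one of the two strings "score" or "loss" described above.

```python
def is_improvement(new_value, old_value, directionality="score"):
    """Return True if new_value beats old_value for the given directionality.

    Hypothetical helper for illustration; not part of rubicon_ml.
    """
    if directionality == "score":
        return new_value > old_value  # scores are maximized
    if directionality == "loss":
        return new_value < old_value  # losses are minimized
    raise ValueError(f"unknown directionality: {directionality!r}")


# Illustrative values only: a higher accuracy is better, a higher loss is worse.
accuracy_improved = is_improvement(0.933, 0.910, directionality="score")
log_loss_improved = is_improvement(0.41, 0.35, directionality="loss")
print(accuracy_improved, log_loss_improved)
```

Recording the directionality alongside the value lets a reviewer interpret a metric without knowing how it was computed.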

Dataframe (rubicon_ml.Dataframe)

A dataframe is a two-dimensional, tabular dataset with labeled axes (rows and columns) that provides value to the model developer and/or reviewer when visualized.

For example, confusion matrices, feature importance tables, and marginal residuals can all be logged as dataframes.

A dataframe is logged to a project or an experiment.

import pandas as pd

# Label both axes so rows and columns refer to the same classes.
confusion_matrix = pd.DataFrame(
    [[5, 0, 0], [0, 5, 1], [0, 0, 4]],
    columns=["x", "y", "z"],
    index=["x", "y", "z"],
)
experiment.log_dataframe(confusion_matrix)
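A confusion matrix like this one can also be the source of a metric. As a rough sketch in plain Python, assuming one axis holds the true classes and the other the predicted classes (the orientation does not affect accuracy), accuracy is the sum of the diagonal divided by the total count:

```python
# Same counts as the confusion matrix above, as a plain list of lists.
matrix = [[5, 0, 0], [0, 5, 1], [0, 0, 4]]

correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
total = sum(sum(row) for row in matrix)  # every entry

accuracy = correct / total  # 14 correct out of 15
print(round(accuracy, 3))  # 0.933
```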

Artifact (rubicon_ml.Artifact)

An artifact is a catch-all for any other type of data that can be logged to a file.

For example, a snapshot of a trained model (.pkl) can be logged to the experiment created during its run. Alternatively, when leveraging transfer learning, a base model for the model in development can be logged to a project.

An artifact is logged to a project or an experiment.

experiment.log_artifact(data_path="path/to/data.pkl")
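The data_path argument points at a file that already exists on disk. A minimal sketch of producing such a file with the standard library follows. The dictionary is only a stand-in for a trained model, and the file name model.pkl is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object works the same way.
model = {"n_estimators": 20, "classes": ["x", "y", "z"]}

# Write the snapshot to a temporary directory (hypothetical file name).
artifact_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(artifact_path, "wb") as f:
    pickle.dump(model, f)

# Read it back to confirm the snapshot round-trips.
with open(artifact_path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

The resulting artifact_path could then be passed as data_path in the call above.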