Glossary
Project (rubicon_ml.Project)
A project is a collection of experiments, dataframes, and artifacts. Each project is identified by a unique name.
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="memory")
project = rubicon.create_project(name="Glossary")
Experiment (rubicon_ml.Experiment)
An experiment represents a model run and is identified by its created_at time.
It can have metrics, parameters, features, dataframes, and artifacts logged to it.
An experiment is logged to a project.
experiment = project.log_experiment(tags=["glossary"])
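Because experiments are distinguished by when they were created, ordering them by creation time is a natural way to find the most recent run. The sketch below is plain Python, not the rubicon_ml API: ExperimentRecord and latest_experiment are hypothetical names standing in for logged experiments.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    """Hypothetical stand-in for a logged experiment (not rubicon_ml.Experiment)."""

    created_at: datetime
    tags: list = field(default_factory=list)


def latest_experiment(experiments):
    """Return the experiment with the most recent created_at time."""
    return max(experiments, key=lambda e: e.created_at)


runs = [
    ExperimentRecord(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), ["glossary"]),
    ExperimentRecord(datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), ["glossary"]),
]
print(latest_experiment(runs).created_at.date())  # 2024-01-02
```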
Parameter (rubicon_ml.Parameter)
A parameter is an input to an experiment (model run) that is specific to the type of model being used. It affects the model's predictions.
For example, if you were using a random forest classifier, n_estimators (the number of trees in the forest) could be a parameter.
A parameter is logged to an experiment.
experiment.log_parameter("n_estimators", 20)
Feature (rubicon_ml.Feature)
A feature is an input to an experiment (model run) that’s an independent, measurable property of a phenomenon being observed. It affects the model’s predictions.
For example, consider a model that predicts how likely a customer is to pay back a loan.
Possible features could be year or credit score.
A feature is logged to an experiment.
experiment.log_feature("year", importance=0.125)
experiment.log_feature("credit score", importance=0.250)
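Logged importances make it easy to see which inputs a model relies on most. A plain-Python sketch that ranks the two importances logged above; rank_features is a hypothetical helper, not part of rubicon_ml.

```python
def rank_features(importances):
    """Return feature names sorted from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


importances = {"year": 0.125, "credit score": 0.250}
print(rank_features(importances))  # ['credit score', 'year']
```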
Metric (rubicon_ml.Metric)
A metric is a single-value output of an experiment that helps evaluate the quality of the model’s predictions.
It can be either a score (value to maximize) or a loss (value to minimize).
A metric is logged to an experiment.
experiment.log_metric("accuracy", 0.933, directionality="score")
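The directionality decides which end of a metric's range counts as better when comparing experiments. A minimal sketch, assuming each metric's values have already been collected into a plain dictionary keyed by experiment; best_experiment, the run names, and the values are hypothetical and made up for illustration.

```python
def best_experiment(metric_values, directionality):
    """Pick the experiment with the best metric value.

    metric_values maps an experiment name to a metric value.
    directionality is "score" (higher is better) or "loss" (lower is better).
    """
    if directionality == "score":
        return max(metric_values, key=metric_values.get)
    if directionality == "loss":
        return min(metric_values, key=metric_values.get)
    raise ValueError(f"unknown directionality: {directionality!r}")


accuracy = {"run-a": 0.933, "run-b": 0.901}
log_loss = {"run-a": 0.21, "run-b": 0.35}
print(best_experiment(accuracy, "score"))  # run-a
print(best_experiment(log_loss, "loss"))  # run-a
```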
Dataframe (rubicon_ml.Dataframe)
A dataframe is a two-dimensional, tabular dataset with labeled axes (rows and columns) that provides value to the model developer and/or reviewer when visualized.
For example, confusion matrices, feature importance tables, and marginal residuals can all be logged as dataframes.
A dataframe is logged to a project or an experiment.
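import pandas as pd

# Label both axes; assumed layout: rows are actual classes, columns are predicted classes.
confusion_matrix = pd.DataFrame(
    [[5, 0, 0], [0, 5, 1], [0, 0, 4]],
    index=["x", "y", "z"],
    columns=["x", "y", "z"],
)
experiment.log_dataframe(confusion_matrix)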
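A logged dataframe can also back a metric. For a confusion matrix, accuracy is the sum of the diagonal (correct predictions) divided by the total count. A plain-Python sketch over the same values as the confusion matrix above; accuracy_from_confusion is a hypothetical helper, not part of rubicon_ml.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


confusion = [[5, 0, 0], [0, 5, 1], [0, 0, 4]]
print(round(accuracy_from_confusion(confusion), 3))  # 0.933
```

Here 14 of the 15 predictions lie on the diagonal, so the accuracy is 14/15.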
Artifact (rubicon_ml.Artifact)
An artifact is a catch-all for any other type of data that can be logged to a file.
For example, a snapshot of a trained model (.pkl) can be logged to the experiment created during its run. Alternatively, when leveraging transfer learning, a base model for the model in development can be logged to a project.
An artifact is logged to a project or an experiment.
experiment.log_artifact(data_path="path/to/data.pkl")
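A .pkl file like the one in the example above is commonly produced by serializing a trained model with Python's standard pickle module. A stdlib-only sketch of that round trip, assuming a plain dictionary as a hypothetical stand-in for a trained model; the file written here is the kind of path you would pass as data_path while the file still exists.

```python
import os
import pickle
import tempfile

# A plain dictionary stands in for a trained model object (hypothetical).
model = {"type": "random_forest", "n_estimators": 20}

with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "data.pkl")

    # Serialize the model snapshot to a .pkl file.
    with open(data_path, "wb") as f:
        pickle.dump(model, f)

    # Reading the file back yields an equal, independent object.
    with open(data_path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)  # True
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.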