.. _glossary: Glossary ******** Project (:ref:`rubicon_ml.Project`) ============================================================== A **project** is a collection of **experiments**, **dataframes**, and **artifacts** identified by a unique name. .. code-block:: python from rubicon_ml import Rubicon rubicon = Rubicon(persistence="memory") project = rubicon.create_project(name="Glossary") Experiment (:ref:`rubicon_ml.Experiment`) ======================================================================= An **experiment** represents a model run and is identified by its ``created_at`` time. It can have **metrics**, **parameters**, **features**, **dataframes**, and **artifacts** logged to it. An **experiment** is logged to a **project**. .. code-block:: python experiment = project.log_experiment(tags=["glossary"]) Parameter (:ref:`rubicon_ml.Parameter`) ==================================================================== A **parameter** is an input to an **experiment** (model run) that depends on the type of model being used. It affects the model's predictions. For example, if you were using a random forest classifier, ``n_estimators`` (the number of trees in the forest) could be a **parameter**. A **parameter** is logged to an **experiment**. .. code-block:: python experiment.log_parameter("n_estimators", 20) Feature (:ref:`rubicon_ml.Feature`) ============================================================== A **feature** is an input to an **experiment** (model run) that's an independent, measurable property of a phenomenon being observed. It affects the model's predictions. For example, consider a model that predicts how likely a customer is to pay back a loan. Possible **features** could be ``year`` or ``credit score``. A **feature** is logged to an **experiment**. .. code-block:: python experiment.log_feature("year", importance=0.125) experiment.log_feature("credit score", importance=0.250) Metric (:ref:`rubicon_ml.Metric`) =========================================================== A **metric** is a single-value output of an **experiment** that helps evaluate the quality of the model's predictions. It can be either a ``score`` (value to maximize) or a ``loss`` (value to minimize). A **metric** is logged to an **experiment**. .. code-block:: python experiment.log_metric("accuracy", 0.933, directionality="score") Dataframe (:ref:`rubicon_ml.Dataframe`) ==================================================================== A **dataframe** is a two-dimensional, tabular dataset with labeled axes (rows and columns) that provides value to the model developer and/or reviewer when visualized. For example, confusion matrices, feature importance tables and marginal residuals can all be logged as a **dataframe**. A **dataframe** is logged to a **project** or an **experiment**. .. code-block:: python import pandas as pd confusion_matrix = pd.DataFrame( [[5, 0, 0], [0, 5, 1], [0, 0, 4]], columns=["x", "y", "z"], ) experiment.log_dataframe(confusion_matrix) Artifact (:ref:`rubicon_ml.Artifact`) ================================================================= An **artifact** is a catch-all for any other type of data that can be logged to a file. For example, a snapshot of a trained model (.pkl) can be logged to the **experiment** created during its run. Or, a base model for the model in development can be logged to a **project** when leveraging transfer learning. An **artifact** is logged to a **project** or an **experiment**. .. code-block:: python experiment.log_artifact(data_path="path/to/data.pkl")