Logging Experiments

rubicon_ml’s core functionality centers on logging experiments to explain and explore model runs throughout the model development lifecycle. This example takes a quick look at how we can log model metadata to rubicon_ml in the context of a simple classification project.

We’ll leverage the palmerpenguins dataset collected by Dr. Kristen Gorman as our training/testing data. More information on the dataset can be found in the palmerpenguins documentation.

Our goal is to build a simple classification model to differentiate the species of penguins present in the dataset. We’ll use rubicon_ml logging to make it easy to compare runs of our model as well as to preserve important information for later reproducibility.

[1]:
! pip install palmerpenguins
Requirement already satisfied: palmerpenguins in /Users/nvd215/opt/miniconda3/envs/rubicon-ml/lib/python3.10/site-packages (0.1.4)
Requirement already satisfied: numpy in /Users/nvd215/opt/miniconda3/envs/rubicon-ml/lib/python3.10/site-packages (from palmerpenguins) (1.21.6)
Requirement already satisfied: pandas in /Users/nvd215/opt/miniconda3/envs/rubicon-ml/lib/python3.10/site-packages (from palmerpenguins) (1.4.2)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/nvd215/opt/miniconda3/envs/rubicon-ml/lib/python3.10/site-packages (from pandas->palmerpenguins) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/nvd215/opt/miniconda3/envs/rubicon-ml/lib/python3.10/site-packages (from pandas->palmerpenguins) (2022.1)
Requirement already satisfied: six>=1.5 in /Users/nvd215/opt/miniconda3/envs/rubicon-ml/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas->palmerpenguins) (1.16.0)

First, we’ll load the dataset and perform some basic data preparation. In many scenarios, this preparation will already be done by the time training/testing data is loaded and experimentation begins.

[2]:
from palmerpenguins import load_penguins

penguins_df = load_penguins()
target_values = penguins_df['species'].unique()

print(f"target classes (species): {target_values}")
penguins_df.head()
target classes (species): ['Adelie' 'Gentoo' 'Chinstrap']
[2]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 male 2007
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 female 2007
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 female 2007
3 Adelie Torgersen NaN NaN NaN NaN NaN 2007
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 female 2007

Let’s encode the string variables in our dataset as integer labels so our k-nearest neighbors classifier can work with the data.

[3]:
from sklearn.preprocessing import LabelEncoder

for column in ["species", "island", "sex"]:
    penguins_df[column] = LabelEncoder().fit_transform(penguins_df[column])

print(f"target classes (species): {penguins_df['species'].unique()}")
penguins_df.head()
target classes (species): [0 2 1]
[3]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
0 0 2 39.1 18.7 181.0 3750.0 1 2007
1 0 2 39.5 17.4 186.0 3800.0 0 2007
2 0 2 40.3 18.0 195.0 3250.0 0 2007
3 0 2 NaN NaN NaN NaN 2 2007
4 0 2 36.7 19.3 193.0 3450.0 0 2007

Finally, we’ll split the preprocessed data into a train and test set.

[4]:
from sklearn.model_selection import train_test_split

train_penguins_df, test_penguins_df = train_test_split(penguins_df, test_size=.30)

target_name = "species"
feature_names = [c for c in train_penguins_df.columns if c != target_name]

X_train, y_train = train_penguins_df[feature_names], train_penguins_df[target_name]
X_test, y_test = test_penguins_df[feature_names], test_penguins_df[target_name]

X_train.shape, y_train.shape, X_test.shape, y_test.shape
[4]:
((240, 7), (240,), (104, 7), (104,))
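
Note that train_test_split shuffles with a fresh random seed on each run, so the exact split above isn’t reproducible from this notebook alone. Below is a minimal sketch of one way to handle that; the split_random_state name and value are our own illustrative additions, not part of the original example.

# fix the split's seed so the exact train/test partition can be recreated later
split_random_state = 0  # arbitrary illustrative value
train_penguins_df, test_penguins_df = train_test_split(
    penguins_df, test_size=0.30, random_state=split_random_state,
)

# once an experiment exists (see below), the seed can be logged alongside
# the other parameters, e.g.:
# experiment.log_parameter(name="split_random_state", value=split_random_state)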

Now we can create and train a simple scikit-learn pipeline to organize our model training code. We’ll use a SimpleImputer to fill in missing values, followed by a KNeighborsClassifier to classify the penguins.

[5]:
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

imputer_strategy = "mean"
classifier_n_neighbors = 5

steps = [
    ("si", SimpleImputer(strategy=imputer_strategy)),
    ("kn", KNeighborsClassifier(n_neighbors=classifier_n_neighbors)),
]

penguin_pipeline = Pipeline(steps=steps)
penguin_pipeline.fit(X_train, y_train)

score = penguin_pipeline.score(X_test, y_test)
score
[5]:
0.7307692307692307

We’ve completed a training run, so let’s finally log our results to rubicon_ml! We’ll create an entry point to the local filesystem and a project called “classifying penguins” to store our results. rubicon_ml’s log_* methods can be placed throughout your model code to log any important information along the way. Entities available for logging via the log_* methods can be found in our glossary.

[6]:
from rubicon_ml import Rubicon

rubicon = Rubicon(
    persistence="filesystem",
    root_dir="./rubicon-root",
    auto_git_enabled=True,  # automatically record the active git branch and commit on each experiment
)
project = rubicon.get_or_create_project(name="classifying penguins")
experiment = project.log_experiment()

for feature_name in feature_names:
    experiment.log_feature(name=feature_name)

# the log_* methods return the logged entity; assigning to "_" hides it from notebook output
_ = experiment.log_parameter(name="strategy", value=imputer_strategy)
_ = experiment.log_parameter(name="n_neighbors", value=classifier_n_neighbors)
_ = experiment.log_metric(name="accuracy", value=score)

After logging, we can inspect the various attributes of our logged entities. All available attributes can be found in our API reference.

[7]:
print(experiment)
print()
print(f"git info:")
print(f"\tbranch name: {experiment.branch_name}\n\tcommit hash: {experiment.commit_hash}")
print(f"features: {[f.name for f in experiment.features()]}")
print(f"parameters: {[(p.name, p.value) for p in experiment.parameters()]}")
print(f"metrics: {[(m.name, m.value) for m in experiment.metrics()]}")
Experiment(project_name='classifying penguins', id='c484caf8-bdc1-429f-b012-7a4e02dbc83a', name=None, description=None, model_name=None, branch_name='210-new-quick-look', commit_hash='490e8af895f2cd0636c72295c2762b21cd6c8102', training_metadata=None, tags=[], created_at=datetime.datetime(2022, 6, 30, 13, 51, 4, 958916))

git info:
        branch name: 210-new-quick-look
        commit hash: 490e8af895f2cd0636c72295c2762b21cd6c8102
features: ['island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex', 'year']
parameters: [('strategy', 'mean'), ('n_neighbors', 5)]
metrics: [('accuracy', 0.7307692307692307)]

Tracking the results of a single model fit is nice, but rubicon_ml really shines when we’re iterating over numerous model fits, like a hyperparameter search. The code below performs a very basic hyperparameter search over the SimpleImputer’s strategy and the KNeighborsClassifier’s n_neighbors, logging the results of each model fit to a new rubicon_ml experiment.

[8]:
from sklearn.base import clone

for imputer_strategy in ["mean", "median", "most_frequent"]:
    for classifier_n_neighbors in [5, 10, 15, 20]:
        pipeline = clone(penguin_pipeline)  # fresh, unfitted copy of the original pipeline
        pipeline.set_params(
            si__strategy=imputer_strategy,
            kn__n_neighbors=classifier_n_neighbors,
        )

        pipeline.fit(X_train, y_train)
        score = pipeline.score(X_test, y_test)

        experiment = project.log_experiment(tags=["parameter search"])

        for feature_name in feature_names:
            experiment.log_feature(name=feature_name)
        experiment.log_parameter(name="strategy", value=imputer_strategy)
        experiment.log_parameter(name="n_neighbors", value=classifier_n_neighbors)
        experiment.log_metric(name="accuracy", value=score)

Now we can take a look at a few experiments and compare our results. We’re still pulling experiments from the same project we logged the first experiment to, but because each experiment in the hyperparameter search was tagged with “parameter search” when it was logged, passing that tag retrieves only the experiments from the search above.

[9]:
print("experiments:")
for experiment in project.experiments(tags=["parameter search"]):
    print(
        f"\tid: {experiment.id}, "
        f"parameters: {[(p.name, p.value) for p in experiment.parameters()]}, "
        f"metrics: {[(m.name, m.value) for m in experiment.metrics()]}"
    )
experiments:
        id: a75b1258-2276-4eb1-beb5-caf83e9aacf3, parameters: [('strategy', 'mean'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7307692307692307)]
        id: 02a89318-b8d9-49a5-9337-7e4368cc54da, parameters: [('strategy', 'mean'), ('n_neighbors', 10)], metrics: [('accuracy', 0.75)]
        id: ce24eeef-4686-4fc7-8c0a-e73d6c9cdb71, parameters: [('strategy', 'mean'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]
        id: 093a9d02-89f7-4e48-82b1-f9ade435ef03, parameters: [('strategy', 'mean'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]
        id: bc4d0503-32d1-4a11-8222-4151dae893cf, parameters: [('strategy', 'median'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7211538461538461)]
        id: c1b6cb3a-0ad1-4932-914d-ba53a054891b, parameters: [('strategy', 'median'), ('n_neighbors', 10)], metrics: [('accuracy', 0.7403846153846154)]
        id: 9d6ffe67-088d-483f-9d3f-8f0fb34c22e8, parameters: [('strategy', 'median'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]
        id: f497245a-6149-4604-9ceb-da74ae9855d4, parameters: [('strategy', 'median'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]
        id: b2cd8067-ad4c-4ed5-87f7-2cd4536b2c73, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7211538461538461)]
        id: c4277327-381a-4885-aba4-a07c050463a5, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 10)], metrics: [('accuracy', 0.75)]
        id: d4ea2fe7-061e-4f5e-8958-e6ac29025708, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]
        id: d9fe2005-824c-4e23-9809-e0459e57d78a, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]
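
With every fit logged, picking the best configuration is just a comparison over the retrieved experiments. Here’s a minimal sketch using only the calls shown above; the accuracy_of helper is our own, and it assumes each experiment logged exactly one “accuracy” metric.

# select the experiment with the highest logged accuracy
def accuracy_of(experiment):
    return [m.value for m in experiment.metrics() if m.name == "accuracy"][0]

best_experiment = max(project.experiments(tags=["parameter search"]), key=accuracy_of)

print(f"best parameters: {[(p.name, p.value) for p in best_experiment.parameters()]}")
print(f"best accuracy: {accuracy_of(best_experiment)}")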

rubicon_ml can log more complex data as well. Below, we’ll log our trained model as an artifact (a generic binary) and a confusion matrix explaining the results as a dataframe (rubicon_ml natively accepts both pandas and dask dataframes).

[10]:
import pandas as pd
from sklearn.metrics import confusion_matrix

experiment = project.experiments(tags=["parameter search"])[-1]

trained_model = pipeline._final_estimator  # the fitted KNeighborsClassifier from the pipeline's final step
experiment.log_artifact(data_object=trained_model, name="trained model")

y_pred = pipeline.predict(X_test)
confusion_matrix_df = pd.DataFrame(
    confusion_matrix(y_test, y_pred),
    columns=target_values,
    index=target_values,
)
experiment.log_dataframe(confusion_matrix_df, name="confusion matrix")

print(experiment.artifact(name="trained model").get_data(unpickle=True))
experiment.dataframe(name="confusion matrix").get_data()
KNeighborsClassifier(n_neighbors=20)
[10]:
Adelie Gentoo Chinstrap
Adelie 37 0 3
Gentoo 19 0 1
Chinstrap 6 0 38
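
Since everything above was persisted to the local filesystem under ./rubicon-root, the logged model and dataframe can be read back in a completely fresh session. Below is a minimal sketch reusing only the calls from this example; like cell [10], it assumes the last “parameter search” experiment is the one of interest.

from rubicon_ml import Rubicon

# reconnect to the same local repository in a new session
rubicon = Rubicon(persistence="filesystem", root_dir="./rubicon-root")
project = rubicon.get_or_create_project(name="classifying penguins")

# retrieve a tagged experiment and read back its logged objects
experiment = project.experiments(tags=["parameter search"])[-1]
trained_model = experiment.artifact(name="trained model").get_data(unpickle=True)
confusion_matrix_df = experiment.dataframe(name="confusion matrix").get_data()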