

Register a custom schema

A rubicon_schema can be constructed within a Python session in addition to being read from the registry’s YAML files

Define additional metadata to log

Set an additional environment variable for our rubicon_schema to record

[1]:
import os

os.environ["RUNTIME_ENV"] = "AWS"

! echo $RUNTIME_ENV
AWS
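The `! echo` line above is IPython shell syntax and only works inside a notebook. As a minimal sketch, the same check can be done in plain Python with `os.environ.get`. The `"unknown"` fallback is a hypothetical default chosen for illustration and is not something rubicon_ml uses.

```python
import os

# Same setup as the cell above.
os.environ["RUNTIME_ENV"] = "AWS"

# Read the variable back. The "unknown" fallback is a hypothetical default
# that is returned only when the variable is not set.
runtime_env = os.environ.get("RUNTIME_ENV", "unknown")
print(runtime_env)
```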

Construct a custom schema

Create a dictionary representation of the new, custom schema. This new schema will extend the existing RandomForestClassifier schema with an additional parameter that logs the new environment variable

Note: The extends key is not required - custom schemas do not need to extend an existing schema

[2]:
import pprint

extended_schema = {
    "name": "sklearn__RandomForestClassifier__ext",
    "extends": "sklearn__RandomForestClassifier",

    "parameters": [
        {"name": "runtime_environment", "value_env": "RUNTIME_ENV"},
    ],
}
pprint.pprint(extended_schema)
{'extends': 'sklearn__RandomForestClassifier',
 'name': 'sklearn__RandomForestClassifier__ext',
 'parameters': [{'name': 'runtime_environment', 'value_env': 'RUNTIME_ENV'}]}
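To illustrate the note about the extends key, here is a sketch of a schema dictionary that does not extend anything. It uses only the keys shown in the example above. The schema name is a hypothetical placeholder, and whether rubicon_ml needs further keys for a standalone schema is not covered here.

```python
import pprint

# A custom schema that stands alone: it omits the "extends" key, so it
# describes only the parameter listed here.
# The schema name is a hypothetical placeholder.
standalone_schema = {
    "name": "my_project__runtime_only",
    "parameters": [
        {"name": "runtime_environment", "value_env": "RUNTIME_ENV"},
    ],
}
pprint.pprint(standalone_schema)
```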

Apply a custom schema to a project

Create a rubicon_ml project

[3]:
from rubicon_ml import Rubicon

rubicon = Rubicon(persistence="memory", auto_git_enabled=True)
project = rubicon.create_project(name="apply schema")
project
[3]:
<rubicon_ml.client.project.Project at 0x11251af90>

Apply the custom schema to the project

[4]:
project.set_schema(extended_schema)

Log model metadata with a custom schema

Load a training dataset

[5]:
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True, as_frame=True)

Train an instance of the model the schema represents

[6]:
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(
    ccp_alpha=5e-3,
    criterion="log_loss",
    max_features="log2",
    n_estimators=24,
    oob_score=True,
    random_state=121,
)
rfc.fit(X, y)

print(rfc)
RandomForestClassifier(ccp_alpha=0.005, criterion='log_loss',
                       max_features='log2', n_estimators=24, oob_score=True,
                       random_state=121)

Use project.log_with_schema to log the metadata defined in the base RandomForestClassifier schema, plus the additional parameter from the environment, to a new experiment in project

[7]:
experiment = project.log_with_schema(
    rfc,
    experiment_kwargs={
        "name": "log with extended schema",
        "model_name": "RandomForestClassifier",
        "description": "logged with an extended `rubicon_schema`",
    },
)
experiment
[7]:
<rubicon_ml.client.experiment.Experiment at 0x169f92b10>
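Conceptually, a parameter entry with a value_env key is filled in from the environment variable it names. The sketch below shows that idea in isolation. `resolve_env_parameters` is a hypothetical helper written for this illustration; it is not part of rubicon_ml and does not claim to mirror how the library resolves schemas internally.

```python
import os


def resolve_env_parameters(schema, environ=None):
    """Map each parameter name to the value of the variable named by `value_env`.

    Hypothetical helper for illustration only; not part of rubicon_ml.
    """
    environ = os.environ if environ is None else environ
    return {
        entry["name"]: environ.get(entry["value_env"])
        for entry in schema.get("parameters", [])
        if "value_env" in entry
    }


extended_schema = {
    "name": "sklearn__RandomForestClassifier__ext",
    "extends": "sklearn__RandomForestClassifier",
    "parameters": [
        {"name": "runtime_environment", "value_env": "RUNTIME_ENV"},
    ],
}

# Pass an explicit mapping so the sketch does not depend on the real environment.
resolved = resolve_env_parameters(extended_schema, environ={"RUNTIME_ENV": "AWS"})
print(resolved)  # {'runtime_environment': 'AWS'}
```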

View the experiment’s logged metadata

The experiment contains all the data represented in the base RandomForestClassifier schema plus the additional parameter from the environment

[8]:
for parameter in experiment.parameters():
    print(f"{parameter.name}: {parameter.value}")
bootstrap: True
ccp_alpha: 0.005
class_weight: None
criterion: log_loss
max_depth: None
max_features: log2
min_impurity_decrease: 0.0
max_leaf_nodes: None
max_samples: None
min_samples_split: 2
min_samples_leaf: 1
min_weight_fraction_leaf: 0.0
n_estimators: 24
oob_score: True
random_state: 121
runtime_environment: AWS
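For programmatic checks, it can help to collect the parameters into a dictionary instead of printing them. The sketch below uses stand-in objects so that it runs without rubicon_ml. The `Parameter` namedtuple is a hypothetical substitute that models only the `name` and `value` attributes used in the loop above.

```python
from collections import namedtuple

# Stand-in for the objects returned by `experiment.parameters()`; only the
# `name` and `value` attributes used in the loop above are modeled.
Parameter = namedtuple("Parameter", ["name", "value"])
parameters = [
    Parameter("n_estimators", 24),
    Parameter("runtime_environment", "AWS"),
]

# Build a name -> value lookup and confirm the extra parameter was recorded.
logged = {parameter.name: parameter.value for parameter in parameters}
assert logged["runtime_environment"] == "AWS"
print(logged)
```

With a real experiment, the `parameters` list would be replaced by the result of `experiment.parameters()`.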

Don’t forget to clean up the environment variable set earlier

[9]:
del os.environ["RUNTIME_ENV"]
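Note that `del os.environ["RUNTIME_ENV"]` raises a `KeyError` if the variable is already gone, for example when the cleanup cell is run twice. A minimal alternative sketch uses `pop` with a default, which is safe to repeat:

```python
import os

# Set the variable only so this snippet runs on its own.
os.environ["RUNTIME_ENV"] = "AWS"

# pop() with a default removes the variable if present and does nothing otherwise.
os.environ.pop("RUNTIME_ENV", None)
os.environ.pop("RUNTIME_ENV", None)  # safe to repeat

print("RUNTIME_ENV" in os.environ)
```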

Persisting and sharing a custom schema

To share a custom schema with all rubicon_schema users, see the “Contribute a rubicon_schema” section