Register a custom schema
A rubicon_schema can be constructed within a Python session, in addition to being read from the registry’s YAML files.
Define additional metadata to log
Add a variable to the environment for our rubicon_schema to record.
[1]:
import os
os.environ["RUNTIME_ENV"] = "AWS"
! echo $RUNTIME_ENV
AWS
Construct a custom schema
Create a dictionary representation of the new custom schema. This schema extends the existing RandomForestClassifier schema with an additional parameter that logs the new environment variable.
Note: The extends key is not required; custom schemas do not need to extend an existing schema.
[2]:
import pprint
extended_schema = {
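    "name": "sklearn__RandomForestClassifier__ext",
    # Inherit every parameter defined in the base schema
    "extends": "sklearn__RandomForestClassifier",
    "parameters": [
        # Log the value of the RUNTIME_ENV environment variable as "runtime_environment"
        {"name": "runtime_environment", "value_env": "RUNTIME_ENV"},
    ],
}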
pprint.pprint(extended_schema)
{'extends': 'sklearn__RandomForestClassifier',
 'name': 'sklearn__RandomForestClassifier__ext',
 'parameters': [{'name': 'runtime_environment', 'value_env': 'RUNTIME_ENV'}]}
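The extends key is what lets the custom schema inherit every parameter from the base RandomForestClassifier schema. One way such a merge could work is sketched here, assuming both schemas are plain dicts with a parameters list and that base entries name a model attribute with a value_attr key (only value_env appears in this notebook). The merge_schema helper and the two-parameter base schema are hypothetical illustrations, not rubicon_ml's own resolution logic.

```python
import copy


def merge_schema(base: dict, extension: dict) -> dict:
    """Hypothetical helper: combine a base schema with an extending schema.

    Assumptions (not confirmed by rubicon_ml): schemas are plain dicts with a
    "parameters" list, base parameters keep their order, new parameters are
    appended, and an extension entry with the same name replaces the base entry.
    """
    merged = copy.deepcopy(base)
    merged["name"] = extension["name"]

    # Index the extension's parameters by name (dicts preserve insertion order).
    new_params = {p["name"]: p for p in extension.get("parameters", [])}

    # Walk the base parameters, swapping in any overridden entries...
    params = [new_params.pop(p["name"], p) for p in merged.get("parameters", [])]
    # ...then append the parameters that only the extension defines.
    params.extend(new_params.values())

    merged["parameters"] = params
    return merged


# Hypothetical, shortened base schema; "value_attr" is an assumed key name.
base_schema = {
    "name": "sklearn__RandomForestClassifier",
    "parameters": [
        {"name": "n_estimators", "value_attr": "n_estimators"},
        {"name": "oob_score", "value_attr": "oob_score"},
    ],
}
extended_schema = {
    "name": "sklearn__RandomForestClassifier__ext",
    "extends": "sklearn__RandomForestClassifier",
    "parameters": [
        {"name": "runtime_environment", "value_env": "RUNTIME_ENV"},
    ],
}

merged = merge_schema(base_schema, extended_schema)
print([p["name"] for p in merged["parameters"]])
# ['n_estimators', 'oob_score', 'runtime_environment']
```

Letting an extension entry replace a base entry of the same name is a design choice of this sketch only; the library may treat duplicate names differently.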
Apply a custom schema to a project
Create a rubicon_ml project.
[3]:
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="memory", auto_git_enabled=True)
project = rubicon.create_project(name="apply schema")
project
[3]:
<rubicon_ml.client.project.Project at 0x11251af90>
Apply the custom schema to the project.
[4]:
project.set_schema(extended_schema)
Log model metadata with a custom schema
Load a training dataset.
[5]:
from sklearn.datasets import load_wine
X, y = load_wine(return_X_y=True, as_frame=True)
Train an instance of the model the schema represents.
[6]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(
    ccp_alpha=5e-3,
    criterion="log_loss",
    max_features="log2",
    n_estimators=24,
    oob_score=True,
    random_state=121,
)
rfc.fit(X, y)
print(rfc)
RandomForestClassifier(ccp_alpha=0.005, criterion='log_loss',
                       max_features='log2', n_estimators=24, oob_score=True,
                       random_state=121)
Log the model metadata defined in the base RandomForestClassifier schema, plus the additional parameter from the environment, to a new experiment in the project with project.log_with_schema.
[7]:
experiment = project.log_with_schema(
    rfc,
    experiment_kwargs={
        "name": "log with extended schema",
        "model_name": "RandomForestClassifier",
        "description": "logged with an extended `rubicon_schema`",
    },
)
experiment
[7]:
<rubicon_ml.client.experiment.Experiment at 0x169f92b10>
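Conceptually, log_with_schema walks the schema's parameters and looks up each value, either on the model or in the environment, before logging it to the new experiment. A minimal sketch of just the lookup step follows. It assumes a value_attr key names a model attribute (only value_env appears in this notebook), uses a SimpleNamespace as a stand-in for the fitted model, and collect_parameters is a hypothetical helper, not rubicon_ml's implementation.

```python
import os
from types import SimpleNamespace


def collect_parameters(obj, schema: dict) -> dict:
    """Hypothetical sketch: gather the values a schema says to log for obj.

    Assumes "value_attr" names an attribute of obj and "value_env" names an
    environment variable. Entries with neither key are skipped here.
    """
    values = {}
    for param in schema.get("parameters", []):
        if "value_attr" in param:
            values[param["name"]] = getattr(obj, param["value_attr"])
        elif "value_env" in param:
            # An unset variable becomes None in this sketch (an assumption).
            values[param["name"]] = os.environ.get(param["value_env"])
    return values


# Stand-in for the fitted RandomForestClassifier, with just two attributes.
model = SimpleNamespace(n_estimators=24, oob_score=True)
# Hypothetical, already-merged schema (base entries plus the extension).
schema = {
    "name": "sklearn__RandomForestClassifier__ext",
    "parameters": [
        {"name": "n_estimators", "value_attr": "n_estimators"},
        {"name": "oob_score", "value_attr": "oob_score"},
        {"name": "runtime_environment", "value_env": "RUNTIME_ENV"},
    ],
}

os.environ["RUNTIME_ENV"] = "AWS"  # mirrors cell [1]
for name, value in collect_parameters(model, schema).items():
    print(f"{name}: {value}")
# n_estimators: 24
# oob_score: True
# runtime_environment: AWS
```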
View the experiment’s logged metadata
Each experiment logged with the extended schema contains all the data represented in the base RandomForestClassifier schema, plus the additional parameter from the environment.
[8]:
for parameter in experiment.parameters():
    print(f"{parameter.name}: {parameter.value}")
bootstrap: True
ccp_alpha: 0.005
class_weight: None
criterion: log_loss
max_depth: None
max_features: log2
min_impurity_decrease: 0.0
max_leaf_nodes: None
max_samples: None
min_samples_split: 2
min_samples_leaf: 1
min_weight_fraction_leaf: 0.0
n_estimators: 24
oob_score: True
random_state: 121
runtime_environment: AWS
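Since each logged parameter exposes name and value, it is straightforward to isolate the values contributed by the extension. The sketch below uses a namedtuple as a stand-in for the objects returned by experiment.parameters() and a shortened, hypothetical set of base-schema names; extension_only is likewise a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the parameter objects from experiment.parameters(),
# which expose .name and .value as printed above.
Parameter = namedtuple("Parameter", ["name", "value"])


def extension_only(parameters, base_names):
    """Return {name: value} for parameters not defined by the base schema."""
    return {p.name: p.value for p in parameters if p.name not in base_names}


logged = [
    Parameter("n_estimators", 24),
    Parameter("oob_score", True),
    Parameter("runtime_environment", "AWS"),
]
# Shortened, hypothetical list of base RandomForestClassifier schema names.
base_names = {"n_estimators", "oob_score"}

print(extension_only(logged, base_names))  # {'runtime_environment': 'AWS'}
```

In the notebook, passing experiment.parameters() and the full list of base parameter names printed above would isolate runtime_environment in the same way.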
Don’t forget to clean up the environment variable.
[9]:
del os.environ["RUNTIME_ENV"]
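As an alternative to setting the variable in cell [1] and deleting it here, the standard library's unittest.mock.patch.dict can scope the variable to a block and restore os.environ automatically. This is a suggestion, not what the original notebook does:

```python
import os
from unittest import mock

# mock.patch.dict sets RUNTIME_ENV only for the duration of the with block,
# then restores os.environ to its previous contents, even if an error occurs.
with mock.patch.dict(os.environ, {"RUNTIME_ENV": "AWS"}):
    print(os.environ["RUNTIME_ENV"])  # AWS
    # The project.log_with_schema(...) call from above would go here.

# Prints False if RUNTIME_ENV was not set before the with block.
print("RUNTIME_ENV" in os.environ)
```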
Persisting and sharing a custom schema
To share a custom schema with all rubicon_schema users, check out the “Contribute a rubicon_schema” section.