View this notebook on GitHub or run it yourself on Binder!


Set a schema on a project#

“Log a rubicon_ml experiment with a rubicon_schema” showed how rubicon_schema can infer schema from the object to log - sometimes, this may not be possible and a schema may need to be set manually

Select a schema#

View all available schema

[1]:
from rubicon_ml.schema import registry

available_schema = registry.available_schema()
available_schema
[1]:
['sklearn__RandomForestClassifier',
 'xgboost__XGBClassifier',
 'xgboost__DaskXGBClassifier']

Load a schema

[2]:
import pprint

rfc_schema = registry.get_schema("sklearn__RandomForestClassifier")
pprint.pprint(rfc_schema)
{'artifacts': ['self'],
 'compatibility': {'scikit-learn': {'max_version': None,
                                    'min_version': '1.0.2'}},
 'docs_url': 'https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html',
 'features': [{'importances_attr': 'feature_importances_',
               'names_attr': 'feature_names_in_',
               'optional': True}],
 'metrics': [{'name': 'classes', 'value_attr': 'classes_'},
             {'name': 'n_classes', 'value_attr': 'n_classes_'},
             {'name': 'n_features_in', 'value_attr': 'n_features_in_'},
             {'name': 'n_outputs', 'value_attr': 'n_outputs_'},
             {'name': 'oob_decision_function',
              'optional': True,
              'value_attr': 'oob_decision_function_'},
             {'name': 'oob_score',
              'optional': True,
              'value_attr': 'oob_score_'}],
 'name': 'sklearn__RandomForestClassifier',
 'parameters': [{'name': 'bootstrap', 'value_attr': 'bootstrap'},
                {'name': 'ccp_alpha', 'value_attr': 'ccp_alpha'},
                {'name': 'class_weight', 'value_attr': 'class_weight'},
                {'name': 'criterion', 'value_attr': 'criterion'},
                {'name': 'max_depth', 'value_attr': 'max_depth'},
                {'name': 'max_features', 'value_attr': 'max_features'},
                {'name': 'min_impurity_decrease',
                 'value_attr': 'min_impurity_decrease'},
                {'name': 'max_leaf_nodes', 'value_attr': 'max_leaf_nodes'},
                {'name': 'max_samples', 'value_attr': 'max_samples'},
                {'name': 'min_samples_split',
                 'value_attr': 'min_samples_split'},
                {'name': 'min_samples_leaf', 'value_attr': 'min_samples_leaf'},
                {'name': 'min_weight_fraction_leaf',
                 'value_attr': 'min_weight_fraction_leaf'},
                {'name': 'n_estimators', 'value_attr': 'n_estimators'},
                {'name': 'oob_score', 'value_attr': 'oob_score'},
                {'name': 'random_state', 'value_attr': 'random_state'}],
 'verison': '1.0.0'}

Apply the schema to a project#

Create a rubicon_ml project

[3]:
from rubicon_ml import Rubicon

rubicon = Rubicon(persistence="memory")
project = rubicon.create_project(name="apply schema")
project
[3]:
<rubicon_ml.client.project.Project at 0x134d4fd50>

Set the schema on the project

[4]:
project.set_schema(rfc_schema)

Now, log_with_schema will leverage the schema rfc_schema instead of trying to infer one