

Query projects & experiments with RubiconJSON

The RubiconJSON class lets users query rubicon-ml logs with JSONPath-like expressions.

RubiconJSON takes top-level Rubicon objects, Projects, and/or Experiments and composes a JSON representation of them. Users can then query that logged data with JSONPath syntax through its search method.

RubiconJSON relies on the jsonpath_ng library (https://github.com/h2non/jsonpath-ng) for query parsing. See the jsonpath_ng documentation for details on the supported syntax.
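The $.. prefix used in the queries in this notebook is JSONPath's recursive-descent operator: it matches a key at any depth of the document. As a rough illustration of those semantics (not of how jsonpath_ng or RubiconJSON actually implement them), here is a stdlib-only sketch that collects every value stored under a given key. The helper name and the toy document are hypothetical.

```python
from typing import Any, Iterator


def recursive_descent(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth, like JSONPath $..key."""
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                yield value
            # keep descending into the value, whether or not the key matched
            yield from recursive_descent(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from recursive_descent(item, key)


# Hypothetical toy document with experiments nested one level below the root
doc = {
    "project": [
        {
            "name": "jsonpath querying",
            "experiment": [{"id": "a", "tags": ["large"]}, {"id": "b", "tags": []}],
        }
    ]
}

matches = list(recursive_descent(doc, "experiment"))
print(len(matches))                    # one "experiment" list was found
print([e["id"] for e in matches[0]])   # ['a', 'b']
```

Because the search does not care how deeply a key is nested, a query starting with $..experiment finds experiments whether they sit at the top of the document or underneath a project.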

[1]:
from rubicon_ml import Rubicon

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import ParameterGrid, train_test_split

Train some models, log some experiments

We’ll start off by loading a dataset and creating our rubicon-ml project.

[2]:
X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
[3]:
rubicon = Rubicon(persistence="memory", auto_git_enabled=True)
project = rubicon.get_or_create_project(name="jsonpath querying")

Now, let’s train and evaluate some models and log their metadata to rubicon-ml.

[4]:
for parameters in ParameterGrid({
    "n_estimators": [5, 50, 500],
    "min_samples_leaf": [1, 10, 100],
}):
    rfc = RandomForestClassifier(random_state=0, **parameters)

    # tag experiments with more than 10 trees as "large" so they can be queried later
    tags = ["large"] if parameters["n_estimators"] > 10 else []
    experiment = project.log_experiment(model_name=rfc.__class__.__name__, tags=tags)

    # record the hyperparameters and the names of the input features
    for name, value in parameters.items():
        experiment.log_parameter(name=name, value=value)
    for name in X_train.columns:
        experiment.log_feature(name=name)

    rfc.fit(X_train, y_train)

    # weighted averages across the three wine classes; zero_division=0.0 scores
    # classes that are never predicted as 0 instead of emitting a warning
    precision_scorer = make_scorer(precision_score, average="weighted", zero_division=0.0)
    precision = precision_scorer(rfc, X_test, y_test)
    recall_scorer = make_scorer(recall_score, average="weighted")
    recall = recall_scorer(rfc, X_test, y_test)

    # log the scores and the fitted model itself
    experiment.log_metric(name="precision", value=precision)
    experiment.log_metric(name="recall", value=recall)
    experiment.log_artifact(data_object=rfc, name=rfc.__class__.__name__, tags=["trained"])
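ParameterGrid expands the grid above into every combination of its values, so the loop logs 9 experiments, and the tagging rule marks the 6 that use more than 10 trees as "large". The sketch below reproduces that expansion with itertools.product. Iterating the keys in sorted order is an assumption chosen to mirror ParameterGrid; it is consistent with the parameter order (min_samples_leaf, then n_estimators) in the logged experiment shown later.

```python
from itertools import product

param_grid = {
    "n_estimators": [5, 50, 500],
    "min_samples_leaf": [1, 10, 100],
}

# Walk the keys in sorted order and build one dict per combination of values
keys = sorted(param_grid)
combinations = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]

# Same tagging rule as the training loop: more than 10 trees counts as "large"
tagged = [(params, ["large"] if params["n_estimators"] > 10 else []) for params in combinations]

print(len(combinations))                     # 9
print(sum(1 for _, tags in tagged if tags))  # 6
```

Knowing these counts up front makes it easy to sanity-check the query results below: the "large" tag filter should return six experiments.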

Load experiments into the RubiconJSON class

The RubiconJSON class accepts Projects, Experiments, and top-level Rubicon objects as input. Once instantiated, it exposes a json property holding a dictionary representation of each project and experiment, including their features, parameters, metrics, artifacts, and dataframes. Let’s take a look at the representation of one of our experiments:

[5]:
from rubicon_ml import RubiconJSON

rubicon_json = RubiconJSON(experiments=project.experiments())
rubicon_json.json["experiment"][0]
[5]:
{'project_name': 'jsonpath querying',
 'id': '560c116e-0522-4ca9-acf9-d5fd6e5c9b44',
 'name': None,
 'description': None,
 'model_name': 'RandomForestClassifier',
 'branch_name': 'jsonpath',
 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531',
 'training_metadata': None,
 'tags': [],
 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 164301),
 'feature': [{'name': 'alcohol',
   'id': 'bed69ee3-3af4-45ed-8955-bb3dca3693c7',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 164974)},
  {'name': 'malic_acid',
   'id': '7714ebf1-7679-47c0-9fe9-e8cbe330c8b0',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165062)},
  {'name': 'ash',
   'id': '9e63133e-4c35-4bb5-9173-78f17c9c92d3',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165121)},
  {'name': 'alcalinity_of_ash',
   'id': '9a8a7400-25a3-4651-af11-cc35846d1c04',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165178)},
  {'name': 'magnesium',
   'id': '98e42e9d-88fc-4fa4-a9bc-b614174cd45e',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165243)},
  {'name': 'total_phenols',
   'id': 'a59f41a1-b610-479a-bc36-c855393c3400',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165301)},
  {'name': 'flavanoids',
   'id': '2f80679b-7797-4adc-a856-e8e249baf96c',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165354)},
  {'name': 'nonflavanoid_phenols',
   'id': '0d1013ac-3b67-4997-840c-d83d961ddec7',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165406)},
  {'name': 'proanthocyanins',
   'id': '32288a47-f832-4757-ae22-2df712d05a1d',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165458)},
  {'name': 'color_intensity',
   'id': '1cdf37db-d8de-427f-bf50-ff55c5ad5a33',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165506)},
  {'name': 'hue',
   'id': '3fcabb4b-53f9-4ac9-8583-f49e1dfb18c3',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165557)},
  {'name': 'od280/od315_of_diluted_wines',
   'id': '24e7245c-c637-423c-ad38-1994d04421d1',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165605)},
  {'name': 'proline',
   'id': '4091d490-d4b8-425b-b2c7-b6001f017733',
   'description': None,
   'importance': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 165654)}],
 'parameter': [{'name': 'min_samples_leaf',
   'id': 'c0fef473-d83e-4e84-9c3f-0fe33a28c04b',
   'value': 1,
   'description': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 164750)},
  {'name': 'n_estimators',
   'id': '3ce88e6b-ddd1-4db8-9053-289b16663720',
   'value': 5,
   'description': None,
   'tags': [],
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 164862)}],
 'metric': [{'name': 'precision',
   'value': 0.9513333333333333,
   'id': '2c87e3c3-56cf-4dc7-b049-d497faab79e0',
   'description': None,
   'directionality': 'score',
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 172842),
   'tags': []},
  {'name': 'recall',
   'value': 0.95,
   'id': 'efd2e6c8-3cfc-4cca-943e-6bbad7a0c777',
   'description': None,
   'directionality': 'score',
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 172955),
   'tags': []}],
 'artifact': [{'name': 'RandomForestClassifier',
   'id': 'b94df717-0628-40c9-8042-d1341a6c7185',
   'description': None,
   'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 173251),
   'tags': ['trained'],
   'parent_id': '560c116e-0522-4ca9-acf9-d5fd6e5c9b44'}],
 'dataframe': []}
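Each experiment stores its parameters and metrics as lists of name/value records, so comparing experiments side by side usually means flattening those lists first. Here is a minimal sketch, assuming only the "id", "tags", "parameter", and "metric" keys shown in the output above. The helper flatten_experiment is hypothetical, and its input is a trimmed copy of that output.

```python
from typing import Any


def flatten_experiment(experiment: dict[str, Any]) -> dict[str, Any]:
    """Collapse one experiment's parameter and metric lists into a single flat row."""
    row = {"id": experiment["id"], "tags": experiment["tags"]}
    # each parameter/metric record carries at least a "name" and a "value"
    for entry in experiment.get("parameter", []) + experiment.get("metric", []):
        row[entry["name"]] = entry["value"]
    return row


# Trimmed copy of the experiment printed above (only the fields used here)
experiment = {
    "id": "560c116e-0522-4ca9-acf9-d5fd6e5c9b44",
    "tags": [],
    "parameter": [
        {"name": "min_samples_leaf", "value": 1},
        {"name": "n_estimators", "value": 5},
    ],
    "metric": [
        {"name": "precision", "value": 0.9513333333333333},
        {"name": "recall", "value": 0.95},
    ],
}

print(flatten_experiment(experiment))
```

With the real object, [flatten_experiment(e) for e in rubicon_json.json["experiment"]] would produce one row per experiment, which could then be handed to pandas.DataFrame. Note that in this sketch a metric silently overwrites a parameter if the two ever share a name.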

Query experiments with RubiconJSON.search

Once created, the RubiconJSON object can be queried with its search method. Each result it returns is a match whose value attribute holds the matched data. We’ll start by getting every experiment that was tagged “large” when it was logged.

[6]:
experiment_query = "$..experiment[?(@.tags[*]=='large')]"

for match in rubicon_json.search(experiment_query):
    print(match.value)
{'project_name': 'jsonpath querying', 'id': 'e818e604-5b6c-455b-951f-68d87db287d2', 'name': None, 'description': None, 'model_name': 'RandomForestClassifier', 'branch_name': 'jsonpath', 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531', 'training_metadata': None, 'tags': ['large'], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236186), 'feature': [{'name': 'alcohol', 'id': '010d78c9-648b-41c0-9bbb-6ea4c0098c9a', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236763)}, {'name': 'malic_acid', 'id': '42f821b3-b4c9-459a-ad19-e31dcd861df2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236847)}, {'name': 'ash', 'id': '79111980-9804-4ae5-9819-0c7a0e30854d', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236908)}, {'name': 'alcalinity_of_ash', 'id': '11579ad0-bd9b-4a89-a07f-98ee99d77c7e', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236963)}, {'name': 'magnesium', 'id': 'a148b56a-9b6b-47fe-9c55-81a9ced61558', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237015)}, {'name': 'total_phenols', 'id': 'e599909f-3a4e-4b07-a134-0d271b6f54d7', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237068)}, {'name': 'flavanoids', 'id': 'bb26af85-8a8b-4fd0-81dc-c55408644055', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237119)}, {'name': 'nonflavanoid_phenols', 'id': '9a64ea5d-519e-4ec4-bdd1-24ec8e756c56', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237170)}, {'name': 'proanthocyanins', 'id': '33f3e24c-ca3e-49a8-a765-7131b9429e98', 'description': None, 'importance': 
None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237220)}, {'name': 'color_intensity', 'id': '793acaa7-a443-4cae-b5b5-3555628d095d', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237269)}, {'name': 'hue', 'id': '90ff3bb2-4c94-4ca4-a75f-b27ad1e31898', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237323)}, {'name': 'od280/od315_of_diluted_wines', 'id': 'e1ab9849-8c1a-4d86-8b09-f97cbaab1a34', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237374)}, {'name': 'proline', 'id': 'b6f55de3-4839-40e5-8cfc-9f88586961b3', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 237427)}], 'parameter': [{'name': 'min_samples_leaf', 'id': '7c1a20ff-42b1-4100-9515-c17d72f7e602', 'value': 1, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236557)}, {'name': 'n_estimators', 'id': 'da82e2a1-fded-40b7-916b-0d3c06734b56', 'value': 50, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 236664)}], 'metric': [{'name': 'precision', 'value': 0.9684407096171803, 'id': 'dfe82079-3a29-4c7a-bc35-4d7d13f05b0b', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 264896), 'tags': []}, {'name': 'recall', 'value': 0.9666666666666667, 'id': '0886983e-74c9-44b1-ae98-7437fae22057', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 265031), 'tags': []}], 'artifact': [{'name': 'RandomForestClassifier', 'id': 'dbeea5f9-d2bc-41c0-aa20-21a737f3d4ba', 'description': None, 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 265855), 'tags': ['trained'], 'parent_id': 'e818e604-5b6c-455b-951f-68d87db287d2'}], 'dataframe': []}
{'project_name': 'jsonpath querying', 'id': '520c3905-b77b-4fe0-ac65-0fd639fb9e49', 'name': None, 'description': None, 'model_name': 'RandomForestClassifier', 'branch_name': 'jsonpath', 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531', 'training_metadata': None, 'tags': ['large'], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 344533), 'feature': [{'name': 'alcohol', 'id': '15c197be-42e4-4f22-8f55-22491dc70b25', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345176)}, {'name': 'malic_acid', 'id': '8defd5b7-6b0c-490c-8f22-17885b004a11', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345270)}, {'name': 'ash', 'id': '9457c126-a9f4-470d-8b28-c96ae4728ea5', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345337)}, {'name': 'alcalinity_of_ash', 'id': '6dfbaeb2-3d26-4d4a-8606-a1ad157505bb', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345400)}, {'name': 'magnesium', 'id': '22324317-476f-4168-bd7b-18ba74ef3a97', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345462)}, {'name': 'total_phenols', 'id': '2020b6b1-1405-4dd2-9dda-b0df5d2d1ffe', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345524)}, {'name': 'flavanoids', 'id': '8fa210bd-266b-49be-bd3c-5a98382c555b', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345584)}, {'name': 'nonflavanoid_phenols', 'id': 'b4e6b701-ad61-420d-a7ec-0eeeece44664', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345643)}, {'name': 'proanthocyanins', 'id': '71e17e4f-8ca9-4fcd-8e19-4b06110701b1', 'description': None, 'importance': 
None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345703)}, {'name': 'color_intensity', 'id': '7d274784-08bc-46bb-a62d-3b18e40d88b8', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345763)}, {'name': 'hue', 'id': 'bc75d745-7568-47fd-9ab5-fd49d8a282bd', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345826)}, {'name': 'od280/od315_of_diluted_wines', 'id': '425705a9-8612-4857-8684-7f5617a7b768', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345889)}, {'name': 'proline', 'id': 'c0ac20d1-3513-4b1d-b863-8e35d644e2ad', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345954)}], 'parameter': [{'name': 'min_samples_leaf', 'id': 'ae114625-7626-4927-970d-24f2c69712a0', 'value': 1, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 344941)}, {'name': 'n_estimators', 'id': 'e5a2f37c-7be4-408c-8d6f-246e88a33945', 'value': 500, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 345069)}], 'metric': [{'name': 'precision', 'value': 0.9843137254901961, 'id': '08d9600b-8135-4073-9991-2fb121d90f16', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 579588), 'tags': []}, {'name': 'recall', 'value': 0.9833333333333333, 'id': '67edf163-b6f9-4f5c-b36f-1fdccd8af853', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 579752), 'tags': []}], 'artifact': [{'name': 'RandomForestClassifier', 'id': 'f9299a02-1108-4ffe-b1cb-1cf317736bba', 'description': None, 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 585425), 'tags': ['trained'], 'parent_id': '520c3905-b77b-4fe0-ac65-0fd639fb9e49'}], 'dataframe': []}
{'project_name': 'jsonpath querying', 'id': 'ef503b29-92d6-4e57-a79b-4a8d1894f72d', 'name': None, 'description': None, 'model_name': 'RandomForestClassifier', 'branch_name': 'jsonpath', 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531', 'training_metadata': None, 'tags': ['large'], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 750197), 'feature': [{'name': 'alcohol', 'id': 'faa56976-7722-45a7-8de2-ec6c415c4fd4', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 750902)}, {'name': 'malic_acid', 'id': '619bad0e-9ca2-4475-a945-4df7129949de', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751025)}, {'name': 'ash', 'id': '1e3d1c34-d3d5-4e4d-a278-4a81260529ef', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751107)}, {'name': 'alcalinity_of_ash', 'id': '53b7526b-5f9e-42aa-bcff-a29b92adb63a', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751190)}, {'name': 'magnesium', 'id': '3246dc94-ae25-4a4b-8ded-2c98b7cdbc85', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751270)}, {'name': 'total_phenols', 'id': '065d3b37-155a-4987-b744-5e2d4fa4f738', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751350)}, {'name': 'flavanoids', 'id': 'b7a4f44a-05a1-4089-bb24-ff87c3a9dca1', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751431)}, {'name': 'nonflavanoid_phenols', 'id': 'e029320c-5b4b-4b12-86a8-73a5942f2dec', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751512)}, {'name': 'proanthocyanins', 'id': 'c55f8859-f192-4a01-b869-aa040c2da72a', 'description': None, 'importance': 
None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751592)}, {'name': 'color_intensity', 'id': 'c54dfe8f-112a-43c5-8f50-f2016cddc220', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751671)}, {'name': 'hue', 'id': '0ddc7fed-83cc-4ffa-a8df-936f9d62c1b2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751752)}, {'name': 'od280/od315_of_diluted_wines', 'id': '7edd6019-d926-41ed-8762-d5e4f50a8089', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751830)}, {'name': 'proline', 'id': '5f5f36be-4b84-46b1-be75-62c35bdd8ba6', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 751912)}], 'parameter': [{'name': 'min_samples_leaf', 'id': '8e25faee-f2a3-4a46-a77b-46d7eeb2a207', 'value': 10, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 750614)}, {'name': 'n_estimators', 'id': 'c8f45afa-7c21-4d1a-a9f5-06256ea2dc0d', 'value': 50, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 750783)}], 'metric': [{'name': 'precision', 'value': 0.9502557544757034, 'id': 'a5d28065-9677-4e8a-9151-902ef22038a2', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 779443), 'tags': []}, {'name': 'recall', 'value': 0.95, 'id': '63b2cb5e-69cc-406b-a4e8-5fd52b0cd14f', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 779599), 'tags': []}], 'artifact': [{'name': 'RandomForestClassifier', 'id': '5a838ff1-edb7-450c-b0e9-6ae0609291d0', 'description': None, 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 780331), 'tags': ['trained'], 'parent_id': 'ef503b29-92d6-4e57-a79b-4a8d1894f72d'}], 'dataframe': []}
{'project_name': 'jsonpath querying', 'id': '2f7e7339-1a46-4b37-a18d-5de043ac28f3', 'name': None, 'description': None, 'model_name': 'RandomForestClassifier', 'branch_name': 'jsonpath', 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531', 'training_metadata': None, 'tags': ['large'], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 870483), 'feature': [{'name': 'alcohol', 'id': '0c9f75c4-fb2f-414f-a782-00548b2931c2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871190)}, {'name': 'malic_acid', 'id': '922b6346-59f5-40f4-b210-081829dc69bd', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871329)}, {'name': 'ash', 'id': 'dd67a63d-e597-4b30-a074-095694853d48', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871422)}, {'name': 'alcalinity_of_ash', 'id': '7d88be0a-bc96-4ee7-8f47-13021047c6fa', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871511)}, {'name': 'magnesium', 'id': 'cdd4b5f6-2272-403c-b0d8-24243e744c6c', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871595)}, {'name': 'total_phenols', 'id': 'efe47186-742e-43f7-836c-b3e0febf8bf2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871687)}, {'name': 'flavanoids', 'id': '65392f8b-04bb-4737-9046-7a4709ed746c', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871770)}, {'name': 'nonflavanoid_phenols', 'id': '7d4194da-9ad6-4388-82b3-224622258fae', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871860)}, {'name': 'proanthocyanins', 'id': 'd7aab75d-afe3-4ff5-830a-1f3af96d9fa5', 'description': None, 'importance': 
None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871947)}, {'name': 'color_intensity', 'id': '438aefa7-731e-4001-9fe1-94edf2a6b093', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 872032)}, {'name': 'hue', 'id': '5d1ef085-977c-439e-b603-50c9c91ef8ed', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 872120)}, {'name': 'od280/od315_of_diluted_wines', 'id': 'dba520a8-353e-4193-956e-0f28b12098d2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 872205)}, {'name': 'proline', 'id': '5204eba4-cfb2-4cc1-b8cc-fb73c829358d', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 872296)}], 'parameter': [{'name': 'min_samples_leaf', 'id': '5788d0a5-6b8d-487b-a743-26a2c2282885', 'value': 10, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 870888)}, {'name': 'n_estimators', 'id': '57aca1e1-5569-4745-b679-97098c43dc0a', 'value': 500, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 871062)}], 'metric': [{'name': 'precision', 'value': 0.9684407096171803, 'id': '5c0dd47d-ab4f-4990-a5b4-b30763f0b9df', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 96576), 'tags': []}, {'name': 'recall', 'value': 0.9666666666666667, 'id': 'be4cf2c9-5c03-4894-95e2-945ee1351f90', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 96747), 'tags': []}], 'artifact': [{'name': 'RandomForestClassifier', 'id': 'd931f828-e196-4745-a092-ea619560884e', 'description': None, 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 131815), 'tags': ['trained'], 'parent_id': '2f7e7339-1a46-4b37-a18d-5de043ac28f3'}], 'dataframe': []}
{'project_name': 'jsonpath querying', 'id': 'effc91cd-ad6f-420e-9654-9ba691d744ff', 'name': None, 'description': None, 'model_name': 'RandomForestClassifier', 'branch_name': 'jsonpath', 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531', 'training_metadata': None, 'tags': ['large'], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 306637), 'feature': [{'name': 'alcohol', 'id': 'f62bd7cd-ec9c-446d-b6e4-ae30d5210d70', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307470)}, {'name': 'malic_acid', 'id': '7aa60cd4-4e18-4fa3-90ea-8c4688fbcd9e', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307640)}, {'name': 'ash', 'id': '45652ac4-3461-48fb-b2dd-2136fd778ee2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307744)}, {'name': 'alcalinity_of_ash', 'id': '3f293418-9b47-4aec-a494-16db9d185785', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307851)}, {'name': 'magnesium', 'id': '7ccd8fae-de4b-468e-b821-21f897239b40', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307952)}, {'name': 'total_phenols', 'id': '4b036fa2-266c-4f44-90c7-b2e1b6b25269', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308052)}, {'name': 'flavanoids', 'id': '1aafb3c1-a117-4925-a38e-4b91c2c9bf9c', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308151)}, {'name': 'nonflavanoid_phenols', 'id': '980ae477-bc9f-4af6-8b9a-ab07da862551', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308249)}, {'name': 'proanthocyanins', 'id': '1f2f977a-6d66-4b39-9b88-f5a2b539fe25', 'description': None, 'importance': 
None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308346)}, {'name': 'color_intensity', 'id': '78c9eaae-6e87-4d90-ad14-b14a8d9c5621', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308442)}, {'name': 'hue', 'id': '9dbd7c25-c927-4194-aacf-d64b92c5fb12', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308539)}, {'name': 'od280/od315_of_diluted_wines', 'id': 'e8743821-d126-47fa-aead-0d768df493c8', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308635)}, {'name': 'proline', 'id': 'e900df94-2c39-49f0-8f35-9d1921937ec2', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 308738)}], 'parameter': [{'name': 'min_samples_leaf', 'id': 'a7295ff0-2af4-43c2-b169-3bb711b29c57', 'value': 100, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307126)}, {'name': 'n_estimators', 'id': '03cc53a9-34e1-4a4c-b483-49f0ff59328d', 'value': 50, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 307321)}], 'metric': [{'name': 'precision', 'value': 0.16000000000000003, 'id': 'cdaa4982-df37-48d3-adc9-9bc2af70b910', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 334829), 'tags': []}, {'name': 'recall', 'value': 0.4, 'id': 'c468eb77-6abd-41e3-acd1-e4aa78916c4e', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 334998), 'tags': []}], 'artifact': [{'name': 'RandomForestClassifier', 'id': 'dd5e09e7-a82b-4873-9509-65060e725d58', 'description': None, 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 335750), 'tags': ['trained'], 'parent_id': 'effc91cd-ad6f-420e-9654-9ba691d744ff'}], 'dataframe': []}
{'project_name': 'jsonpath querying', 'id': '0eb7ec0c-61d0-4e8e-a662-e03eedad959d', 'name': None, 'description': None, 'model_name': 'RandomForestClassifier', 'branch_name': 'jsonpath', 'commit_hash': 'c60285762eb792f76a8d60bfa1ce6e824cb94531', 'training_metadata': None, 'tags': ['large'], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 416993), 'feature': [{'name': 'alcohol', 'id': '84fe1647-4e00-4fd6-bc22-6f8ade193824', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 417745)}, {'name': 'malic_acid', 'id': '7a1611d4-a515-4cc2-9121-51fec8a4ab18', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 417905)}, {'name': 'ash', 'id': '6b6d5fa3-e7f0-43a5-a66e-551d5b217763', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418006)}, {'name': 'alcalinity_of_ash', 'id': 'cd111c59-a5e2-4524-83ed-76fc2fd41f94', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418106)}, {'name': 'magnesium', 'id': '78fdae70-81f0-4c7a-b2d7-e4ebfcf4aafc', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418203)}, {'name': 'total_phenols', 'id': 'dd8e57c8-aea1-49ca-a8d7-0d993e730388', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418299)}, {'name': 'flavanoids', 'id': 'c0e4ff3b-0e99-4c9f-9d1a-422e0a3be042', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418395)}, {'name': 'nonflavanoid_phenols', 'id': '56e107de-f1a2-4802-8436-cb7a50843297', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418491)}, {'name': 'proanthocyanins', 'id': '1a3470ca-b181-4e54-acaf-2618154a76eb', 'description': None, 'importance': 
None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418587)}, {'name': 'color_intensity', 'id': '18acd547-3bc7-4e32-b8a2-aad3e087c151', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418680)}, {'name': 'hue', 'id': '4c2726e4-5606-454c-b9ae-f0409f0b58b5', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418787)}, {'name': 'od280/od315_of_diluted_wines', 'id': 'aeec0f25-4cfb-4bf5-a129-486b95cf8395', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418881)}, {'name': 'proline', 'id': '191c8e1d-7ddd-4ef4-b109-20dbeab296c9', 'description': None, 'importance': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 418982)}], 'parameter': [{'name': 'min_samples_leaf', 'id': 'd03ede0e-4062-4da8-99b8-2888f59f2ecd', 'value': 100, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 417419)}, {'name': 'n_estimators', 'id': '6a10da8d-78e2-42b7-a0cc-b3d0b11ebfbe', 'value': 500, 'description': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 417606)}], 'metric': [{'name': 'precision', 'value': 0.16000000000000003, 'id': 'cf4db64f-f391-4015-a647-0ee6f71c52e6', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 627515), 'tags': []}, {'name': 'recall', 'value': 0.4, 'id': '60a8b8a1-e53c-46ea-aa0b-dd2efdb6263e', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 627702), 'tags': []}], 'artifact': [{'name': 'RandomForestClassifier', 'id': 'f336587d-2ed0-49c1-a354-2edfde3f985c', 'description': None, 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 632643), 'tags': ['trained'], 'parent_id': '0eb7ec0c-61d0-4e8e-a662-e03eedad959d'}], 'dataframe': []}
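Read piece by piece, the query descends to any experiment list ($..experiment), and the filter [?(...)] keeps an element when some entry of the current element's (@) tags list (tags[*]) equals 'large'. Below is a rough plain-Python equivalent over hypothetical toy records. It assumes the filter passes when any wildcard match satisfies the comparison, which is consistent with the results above, where each returned experiment has "large" among its tags.

```python
# Hypothetical toy records containing only the field the filter inspects
experiments = [
    {"id": "exp-1", "tags": []},
    {"id": "exp-2", "tags": ["large"]},
    {"id": "exp-3", "tags": ["trained", "large"]},
]

# [?(@.tags[*]=='large')]: keep an experiment if any of its tags equals "large"
large_experiments = [
    experiment
    for experiment in experiments
    if any(tag == "large" for tag in experiment["tags"])
]

print(len(large_experiments))  # 2
```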

We can also select attributes of the matched objects within the query itself. Let’s get just the IDs of the experiments from the previous cell.

[7]:
experiment_query += ".id"

for match in rubicon_json.search(experiment_query):
    print(match.value)
e818e604-5b6c-455b-951f-68d87db287d2
520c3905-b77b-4fe0-ac65-0fd639fb9e49
ef503b29-92d6-4e57-a79b-4a8d1894f72d
2f7e7339-1a46-4b37-a18d-5de043ac28f3
effc91cd-ad6f-420e-9654-9ba691d744ff
0eb7ec0c-61d0-4e8e-a662-e03eedad959d
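Filtering is not limited to flat lists like tags. Selecting the IDs of experiments trained with 500 trees, for example, means matching a parameter record by both its name and its value before projecting down to the id. The plain-Python sketch below shows that filter-then-project pattern on hypothetical toy records; the helper param_value is illustrative only.

```python
# Hypothetical toy records mirroring the "parameter" lists in the logged experiments
experiments = [
    {"id": "exp-1", "parameter": [{"name": "n_estimators", "value": 5}]},
    {"id": "exp-2", "parameter": [{"name": "n_estimators", "value": 500}]},
    {"id": "exp-3", "parameter": [{"name": "n_estimators", "value": 50}]},
]


def param_value(experiment, name):
    """Return the value of the named parameter, or None if it was not logged."""
    return next((p["value"] for p in experiment["parameter"] if p["name"] == name), None)


# Filter on a nested name/value record, then project each match down to its "id"
ids = [exp["id"] for exp in experiments if param_value(exp, "n_estimators") == 500]
print(ids)  # ['exp-2']
```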

Now, let’s get all the metrics from every experiment. Each match is the list of metrics logged to one experiment:

[8]:
metric_query = "$..experiment[*].metric"

for match in rubicon_json.search(metric_query):
    print(match.value)
[{'name': 'precision', 'value': 0.9513333333333333, 'id': '2c87e3c3-56cf-4dc7-b049-d497faab79e0', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 172842), 'tags': []}, {'name': 'recall', 'value': 0.95, 'id': 'efd2e6c8-3cfc-4cca-943e-6bbad7a0c777', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 172955), 'tags': []}]
[{'name': 'precision', 'value': 0.9684407096171803, 'id': 'dfe82079-3a29-4c7a-bc35-4d7d13f05b0b', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 264896), 'tags': []}, {'name': 'recall', 'value': 0.9666666666666667, 'id': '0886983e-74c9-44b1-ae98-7437fae22057', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 265031), 'tags': []}]
[{'name': 'precision', 'value': 0.9843137254901961, 'id': '08d9600b-8135-4073-9991-2fb121d90f16', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 579588), 'tags': []}, {'name': 'recall', 'value': 0.9833333333333333, 'id': '67edf163-b6f9-4f5c-b36f-1fdccd8af853', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 579752), 'tags': []}]
[{'name': 'precision', 'value': 0.9544973544973545, 'id': 'fc38af63-597b-4021-9e98-7ef4adfb0027', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 679004), 'tags': []}, {'name': 'recall', 'value': 0.95, 'id': 'ddd027e5-289a-4f1e-9639-c7f91d8ce387', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 679151), 'tags': []}]
[{'name': 'precision', 'value': 0.9502557544757034, 'id': 'a5d28065-9677-4e8a-9151-902ef22038a2', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 779443), 'tags': []}, {'name': 'recall', 'value': 0.95, 'id': '63b2cb5e-69cc-406b-a4e8-5fd52b0cd14f', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 779599), 'tags': []}]
[{'name': 'precision', 'value': 0.9684407096171803, 'id': '5c0dd47d-ab4f-4990-a5b4-b30763f0b9df', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 96576), 'tags': []}, {'name': 'recall', 'value': 0.9666666666666667, 'id': 'be4cf2c9-5c03-4894-95e2-945ee1351f90', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 96747), 'tags': []}]
[{'name': 'precision', 'value': 0.16000000000000003, 'id': '28c61d9f-d387-410c-9020-51f82ed09ea8', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 222257), 'tags': []}, {'name': 'recall', 'value': 0.4, 'id': 'a7eb39c8-4b54-4745-895b-d16b1ebe0a27', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 222430), 'tags': []}]
[{'name': 'precision', 'value': 0.16000000000000003, 'id': 'cdaa4982-df37-48d3-adc9-9bc2af70b910', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 334829), 'tags': []}, {'name': 'recall', 'value': 0.4, 'id': 'c468eb77-6abd-41e3-acd1-e4aa78916c4e', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 334998), 'tags': []}]
[{'name': 'precision', 'value': 0.16000000000000003, 'id': 'cf4db64f-f391-4015-a647-0ee6f71c52e6', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 627515), 'tags': []}, {'name': 'recall', 'value': 0.4, 'id': '60a8b8a1-e53c-46ea-aa0b-dd2efdb6263e', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 627702), 'tags': []}]
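The `..` in `$..experiment[*].metric` is JSONPath's recursive descent: it finds every `experiment` key at any depth, `[*]` iterates the items in each of those lists, and `.metric` takes each item’s metrics. Here is a minimal plain-Python sketch of those three steps. The nested `toy_json` layout (a project holding experiments holding metrics) is a simplified assumption for illustration, not necessarily the exact shape RubiconJSON builds, and `descend` is a hypothetical helper.

```python
def descend(node, key):
    """Mimic JSONPath `..key`: yield every value stored under `key` at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep searching below this value too, as recursive descent does.
            yield from descend(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from descend(item, key)


# Simplified, assumed layout with made-up IDs and values.
toy_json = {
    "project": [
        {
            "name": "jsonpath querying",
            "experiment": [
                {"id": "exp-a", "metric": [{"name": "precision", "value": 0.95}]},
                {"id": "exp-b", "metric": [{"name": "precision", "value": 0.16}]},
            ],
        }
    ]
}

# `$..experiment` -> each experiment list; `[*]` -> each experiment; `.metric` -> its metrics.
metric_lists = [
    experiment["metric"]
    for experiments in descend(toy_json, "experiment")
    for experiment in experiments
    if "metric" in experiment
]
print(metric_lists)
```

Because of the recursive descent, the query does not need to know whether experiments sit at the top level or under a project, which is why the same query works no matter which objects were passed to RubiconJSON.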

Some of those precision scores are much better than others. Let’s keep only the really high ones: precision scores of at least 0.96.

[9]:
best_metric_query = "$..experiment[*].metric[?(@.name=='precision' & @.value>=0.96)]"

for match in rubicon_json.search(best_metric_query):
    print(match.value)
{'name': 'precision', 'value': 0.9684407096171803, 'id': 'dfe82079-3a29-4c7a-bc35-4d7d13f05b0b', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 264896), 'tags': []}
{'name': 'precision', 'value': 0.9843137254901961, 'id': '08d9600b-8135-4073-9991-2fb121d90f16', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 33, 579588), 'tags': []}
{'name': 'precision', 'value': 0.9684407096171803, 'id': '5c0dd47d-ab4f-4990-a5b4-b30763f0b9df', 'description': None, 'directionality': 'score', 'created_at': datetime.datetime(2023, 9, 23, 21, 52, 34, 96576), 'tags': []}
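The filter `[?(@.name=='precision' & @.value>=0.96)]` keeps a metric only when both conditions hold for that metric (`@`). As a rough plain-Python equivalent, the sketch below applies the same predicate to toy metric lists whose values are copied from the first three experiments printed above (IDs and other fields omitted). `predicate` is a hypothetical helper, not a rubicon-ml API.

```python
# Toy metric lists; values copied from the first three experiments above.
metric_lists = [
    [{"name": "precision", "value": 0.9513333333333333}, {"name": "recall", "value": 0.95}],
    [{"name": "precision", "value": 0.9684407096171803}, {"name": "recall", "value": 0.9666666666666667}],
    [{"name": "precision", "value": 0.9843137254901961}, {"name": "recall", "value": 0.9833333333333333}],
]


def predicate(metric):
    # `@.name=='precision' & @.value>=0.96`: both conditions must be true.
    return metric.get("name") == "precision" and metric.get("value", float("-inf")) >= 0.96


best_metrics = [m for metrics in metric_lists for m in metrics if predicate(m)]
for m in best_metrics:
    print(m)
```

Note that the recall of 0.9833333333333333 also exceeds 0.96 but is dropped, because the name condition fails.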

We can also retrieve the IDs of the experiments those metrics belong to for further exploration.

[10]:
best_experiment_query = "$..experiment[?(@.metric[?(@.name=='precision' & @.value>=0.96)])].id"

for match in rubicon_json.search(best_experiment_query):
    print(match.value)
e818e604-5b6c-455b-951f-68d87db287d2
520c3905-b77b-4fe0-ac65-0fd639fb9e49
2f7e7339-1a46-4b37-a18d-5de043ac28f3
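Judging by the output above, the nested filter keeps an experiment when the inner metric filter matches at least one of its metrics, and `.id` then projects each kept experiment’s ID. A plain-Python sketch of that behavior, using hypothetical toy experiments with made-up IDs, might look like this:

```python
# Hypothetical toy experiments (IDs and values are made up for illustration).
experiments = [
    {"id": "exp-a", "metric": [{"name": "precision", "value": 0.9513333333333333}]},
    {"id": "exp-b", "metric": [{"name": "precision", "value": 0.9684407096171803}]},
    {"id": "exp-c", "metric": [{"name": "recall", "value": 0.99}]},
]


def is_high_precision(metric, threshold=0.96):
    # Inner filter: `@.name=='precision' & @.value>=0.96`.
    return metric.get("name") == "precision" and metric.get("value", float("-inf")) >= threshold


# Outer filter: keep an experiment if the inner filter matches any of its metrics.
best_ids = [
    experiment["id"]
    for experiment in experiments
    if any(is_high_precision(m) for m in experiment.get("metric", []))
]
print(best_ids)  # ['exp-b']
```

`exp-c` is excluded even though its recall is high, since the inner filter only accepts precision metrics.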

We can use the IDs to retrieve rubicon-ml experiments and dig deeper into the metadata.

[11]:
for match in rubicon_json.search(best_experiment_query):
    experiment = project.experiment(id=match.value)

    print(experiment.artifact(name="RandomForestClassifier").get_data(unpickle=True))
RandomForestClassifier(n_estimators=50, random_state=0)
RandomForestClassifier(n_estimators=500, random_state=0)
RandomForestClassifier(min_samples_leaf=10, n_estimators=500, random_state=0)