Log with Multiple Backends

rubicon-ml lets you instantiate a Rubicon object with multiple backends and write to or read from all of them at once. Supported backends include local filesystem, in-memory, and S3 repositories. This walkthrough shows how to instantiate and use a Rubicon object with multiple backends.

[1]:
from rubicon_ml import Rubicon

Let’s say we want to log to two separate locations on our local filesystem. This example is a bit contrived, but you could imagine writing to both a local filesystem for quick, ad-hoc exploration and an S3 bucket for persistent storage.

[2]:
rubicon_composite = Rubicon(composite_config=[
    {"persistence": "filesystem", "root_dir": "./rubicon-root/root_a"},
    {"persistence": "filesystem", "root_dir": "./rubicon-root/root_b"},
])
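The mixed local-plus-S3 setup mentioned above might be configured along the following lines. This is a minimal sketch, not a documented recipe: the helper names build_composite_config and make_composite_rubicon and the bucket example-bucket are hypothetical, and it assumes an S3 root is addressed with "filesystem" persistence and an s3:// root_dir, which requires S3 credentials and support to be available in your environment.

```python
def build_composite_config(root_dirs):
    """Return one backend entry per root directory, in the given order."""
    return [
        {"persistence": "filesystem", "root_dir": root_dir}
        for root_dir in root_dirs
    ]


def make_composite_rubicon(root_dirs):
    """Create a Rubicon object that uses every root in `root_dirs`."""
    # Imported here so the config helper above works without rubicon-ml installed.
    from rubicon_ml import Rubicon

    return Rubicon(composite_config=build_composite_config(root_dirs))


# Hypothetical locations: a local root and a placeholder S3 bucket.
config = build_composite_config(
    ["./rubicon-root/local", "s3://example-bucket/rubicon-root"]
)
print(config)
```

Building the list separately keeps the backend order explicit, which matters because reads try the backends in order.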

Writing

All of rubicon-ml’s logging functions will now log to both locations in the filesystem with a single function call.

[3]:
import pandas as pd

project_composite = rubicon_composite.create_project(name="multiple backends")
experiment_composite = project_composite.log_experiment()

feature = experiment_composite.log_feature(name="year")
metric = experiment_composite.log_metric(name="accuracy", value=1.0)
parameter = experiment_composite.log_parameter(name="n_estimators", value=100)
artifact = experiment_composite.log_artifact(
    data_bytes=b"bytes", name="example artifact"
)
dataframe = experiment_composite.log_dataframe(
    pd.DataFrame([[5, 0, 0], [0, 5, 1], [0, 0, 4]], columns=["x", "y", "z"]),
    name="example dataframe",
)

experiment_composite.id
[3]:
'8abfbff9-a9a1-46de-b782-3bb4ad1c41a0'

Let’s verify both of our backends have been written to by retrieving the data one location at a time.

[4]:
rubicon_a = Rubicon(persistence="filesystem", root_dir="./rubicon-root/root_a")
project_a = rubicon_a.get_project(name="multiple backends")

project_a.experiments()[0].id
[4]:
'8abfbff9-a9a1-46de-b782-3bb4ad1c41a0'

The ID matches the one returned when the experiment was logged, so root_a holds the same experiment. Now the same check against root_b.

[5]:
rubicon_b = Rubicon(persistence="filesystem", root_dir="./rubicon-root/root_b")
project_b = rubicon_b.get_project(name="multiple backends")

project_b.experiments()[0].id
[5]:
'8abfbff9-a9a1-46de-b782-3bb4ad1c41a0'
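Comparing IDs one backend at a time works, but the two roots can also be compared directly on disk. The sketch below uses only the standard library and assumes both local roots end up with the same relative file layout after composite logging, which this walkthrough does not state explicitly. The helpers relative_files and roots_match are hypothetical, and the check compares file paths only, not file contents.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    root = Path(root)
    if not root.is_dir():
        # A missing root simply has no files.
        return set()
    return {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    }


def roots_match(root_a, root_b):
    """True if both roots are non-empty and contain the same relative paths."""
    files_a = relative_files(root_a)
    files_b = relative_files(root_b)
    return bool(files_a) and files_a == files_b


# Paths taken from the composite configuration used in this notebook.
print(roots_match("./rubicon-root/root_a", "./rubicon-root/root_b"))
```

Requiring a non-empty file set avoids reporting a match when neither root has been written to yet.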

Reading

rubicon-ml’s reading functions iterate over the backend repositories in order and return from the first one they can read from. A RubiconException is raised if none of the backend repositories can provide the requested item(s).

[6]:
project_read = rubicon_composite.get_project(name="multiple backends")
project_read
[6]:
<rubicon_ml.client.project.Project at 0x16aeb83e0>
[7]:
for experiment in project_read.experiments():
    print(f"features: {[f.name for f in experiment.features()]}")
    print(f"metrics: {[m.name for m in experiment.metrics()]}")
    print(f"parameters: {[p.name for p in experiment.parameters()]}")
    print(f"artifact data: {experiment.artifact(name='example artifact').get_data()}")
    print(f"dataframe data:\n{experiment.dataframe(name='example dataframe').get_data()}")
features: ['year']
metrics: ['accuracy']
parameters: ['n_estimators']
artifact data: b'bytes'
dataframe data:
   x  y  z
0  5  0  0
1  0  5  1
2  0  0  4
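The write and read behavior shown in this notebook (every backend is written to, and reads return from the first backend that can serve them) can be modeled with a small toy. This is an illustration of the semantics under simplifying assumptions, not rubicon-ml's implementation: ToyCompositeStore and ReadError are hypothetical names, plain dictionaries stand in for backend repositories, and ReadError stands in for RubiconException.

```python
class ReadError(Exception):
    """Stand-in for RubiconException in this toy model."""


class ToyCompositeStore:
    """Toy composite store backed by a list of dictionaries."""

    def __init__(self, backends):
        self.backends = backends

    def write(self, key, value):
        # Fan out: a single call writes to every backend.
        for backend in self.backends:
            backend[key] = value

    def read(self, key):
        # Try each backend in order and return the first successful read.
        for backend in self.backends:
            try:
                return backend[key]
            except KeyError:
                continue
        # No backend could serve the request.
        raise ReadError(f"no backend could read {key!r}")


root_a, root_b = {}, {}
store = ToyCompositeStore([root_a, root_b])
store.write("accuracy", 1.0)

# Simulate the first backend losing the item; the read falls back to root_b.
del root_a["accuracy"]
print(store.read("accuracy"))  # prints 1.0
```

In this model the order of the backend list decides which copy a read returns, so a faster or more reliable backend would go first.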