

Tagging

Tags can be used to group and identify specific rubicon-ml entities by shared characteristics. Any rubicon-ml entity can be given any number of tags when it is logged. Later, those tags can be used to filter rubicon-ml logs during retrieval.

In general, a tag is any arbitrary string. rubicon-ml provides additional functionality for tags that follow a <key>:<value> format.

Logging with tags

First, create a Rubicon entrypoint.

[1]:
from rubicon_ml import Rubicon

rubicon = Rubicon(persistence="memory")
project = rubicon.create_project("tagging")

Now we’ll log three experiments tagged with different combinations of “tag_a”, “other_tag_a”, and “tag_b”.

[2]:
experiment_a = project.log_experiment(tags=["tag_a"])
experiment_b = project.log_experiment(tags=["other_tag_a", "tag_b"])
experiment_c = project.log_experiment(tags=["tag_a", "tag_b"])

print(f"`experiment_a` ID: {experiment_a.id}, tags: {experiment_a.tags}")
print(f"`experiment_b` ID: {experiment_b.id}, tags: {experiment_b.tags}")
print(f"`experiment_c` ID: {experiment_c.id}, tags: {experiment_c.tags}")
`experiment_a` ID: 7e08c0bf-7f88-46d1-89de-4e0da7e2a448, tags: ['tag_a']
`experiment_b` ID: 75e54061-5910-43a0-a036-c5a6bdd77ca1, tags: ['other_tag_a', 'tag_b']
`experiment_c` ID: cc09ea5c-18df-48b1-888e-e692f5d9e71a, tags: ['tag_a', 'tag_b']

Any other entity logged to an experiment can also be tagged.

[3]:
import pandas as pd

artifact = experiment_a.log_artifact(
    data_bytes=b"artifact", name="artifact", tags=["tag_c"]
)
dataframe = experiment_a.log_dataframe(
    df=pd.DataFrame([[0], [1]]), tags=["tag_d"]
)
feature = experiment_a.log_feature(name="var_0", tags=["tag_e"])
parameter = experiment_a.log_parameter(name="input", value=0, tags=["tag_f"])
metric = experiment_a.log_metric(name="output", value=1, tags=["tag_g"])

Retrieving with tags

Each of the retrieval functions on a project or experiment (experiments, metrics, etc.) accepts the tags and qtype arguments to filter retrieval. qtype is either “or” or “and”, and defaults to “or”.

First, grab all the experiments with tag “a”.

[4]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["tag_a"])]
[4]:
["7e08c0bf-7f88-46d1-89de-4e0da7e2a448: ['tag_a']",
 "cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']"]

Next, get each experiment with tag “b”. Note that the final experiment also appeared in the previous output, since it has both tags “a” and “b”.

[5]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["tag_b"])]
[5]:
["75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']",
 "cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']"]

Querying with multiple tags uses a logical or to return results by default.

[6]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["tag_a", "tag_b"])]
[6]:
["7e08c0bf-7f88-46d1-89de-4e0da7e2a448: ['tag_a']",
 "75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']",
 "cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']"]

This can be switched to a logical and with the qtype argument.

[7]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["tag_a", "tag_b"], qtype="and")]
[7]:
["cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']"]
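The difference between the two qtype values can be sketched in plain Python. This is only an illustration of the “or” and “and” semantics described above, not rubicon-ml's implementation; the `matches` and `query` helpers and the `experiments` dictionary are hypothetical stand-ins for the logged experiments.

```python
def matches(entity_tags, query_tags, qtype="or"):
    """Return True if the entity's tags satisfy the query."""
    hits = [tag in entity_tags for tag in query_tags]
    if qtype == "or":
        return any(hits)  # at least one queried tag is present
    if qtype == "and":
        return all(hits)  # every queried tag is present
    raise ValueError(f"unknown qtype: {qtype!r}")


# Hypothetical stand-in for the three experiments logged above.
experiments = {
    "experiment_a": ["tag_a"],
    "experiment_b": ["other_tag_a", "tag_b"],
    "experiment_c": ["tag_a", "tag_b"],
}


def query(tags, qtype="or"):
    """Return the names of the experiments whose tags satisfy the query."""
    return [
        name
        for name, entity_tags in experiments.items()
        if matches(entity_tags, tags, qtype)
    ]


print(query(["tag_a", "tag_b"]))               # all three experiments
print(query(["tag_a", "tag_b"], qtype="and"))  # only experiment_c
```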

Wildcards

Retrieval by tags also supports wildcards (*) while querying.

[8]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["*_a"])]
[8]:
["7e08c0bf-7f88-46d1-89de-4e0da7e2a448: ['tag_a']",
 "75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']",
 "cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']"]

Multiple wildcards can be used in a single query. A single wildcard character will match any number of characters in the tag.

[9]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["*_*_*"])]
[9]:
["75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']"]
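To see why these patterns select the experiments they do, the matching can be approximated with glob-style patterns from Python's standard library. This sketch assumes the wildcard behaves like the `*` in `fnmatch`; rubicon-ml's own matching may be implemented differently, and `fnmatch` also gives special meaning to `?` and `[...]`, which this page does not discuss. The `has_match` helper and `experiments` dictionary are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the three experiments logged above.
experiments = {
    "experiment_a": ["tag_a"],
    "experiment_b": ["other_tag_a", "tag_b"],
    "experiment_c": ["tag_a", "tag_b"],
}


def has_match(entity_tags, pattern):
    """Return True if any tag matches the glob-style pattern."""
    return any(fnmatchcase(tag, pattern) for tag in entity_tags)


def query(pattern):
    """Return the names of the experiments with at least one matching tag."""
    return [name for name, tags in experiments.items() if has_match(tags, pattern)]


print(query("*_a"))    # every experiment has a tag ending in "_a"
print(query("*_*_*"))  # only "other_tag_a" contains two underscores
```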

Updating tags

Tags can also be updated after an entity has been logged.

[10]:
experiment_c.tags
[10]:
['tag_a', 'tag_b']

add_tags adds any number of new tags to an existing entity. Every entity that supports tagging has both the add_tags and remove_tags functions.

[11]:
experiment_c.add_tags(["tag_h", "tag_i"])
experiment_c.tags
[11]:
['tag_i', 'tag_h', 'tag_b', 'tag_a']

Removal works similarly.

[12]:
experiment_c.remove_tags(["tag_a", "tag_b"])
experiment_c.tags
[12]:
['tag_i', 'tag_h']

Now, the same query from above for an experiment with tags “a” and “b” returns no results.

[13]:
[f"{e.id}: {e.tags}" for e in project.experiments(tags=["tag_a", "tag_b"], qtype="and")]
[13]:
[]

Key-value tags

rubicon-ml provides extended support for tags that follow the <key>:<value> format.

[14]:
experiment_d = project.log_experiment(tags=["tag_j:k"])
experiment_e = project.log_experiment(tags=["tag_j:l", "tag_m:n", "tag_m:o"])

The list returned by the tags property of any entity can be indexed by position, like a regular list, to retrieve a full tag. This works the same way for key-value tags as it does for normal tags.

[15]:
experiment_d.tags[0]
[15]:
'tag_j:k'

But it also supports string indexing, like a dictionary. To retrieve the value of a key-value tag, just index into the tags property with its key.

[16]:
experiment_d.tags["tag_j"]
[16]:
'k'

If multiple tags share the same key, a list containing each value will be returned.

[17]:
experiment_e.tags["tag_m"]
[17]:
['n', 'o']
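The dual indexing behavior shown above can be imitated with a small list subclass. This is a minimal sketch, not rubicon-ml's actual class: `TagList` is a hypothetical name, and raising `KeyError` for a missing key is an assumption, since this page does not show what happens in that case.

```python
class TagList(list):
    """Hypothetical list of tags that also supports lookup by key."""

    def __getitem__(self, index):
        # Integers and slices behave exactly like a regular list.
        if not isinstance(index, str):
            return super().__getitem__(index)

        # Strings are treated as the key of one or more `<key>:<value>` tags.
        prefix = f"{index}:"
        values = [tag[len(prefix):] for tag in self if tag.startswith(prefix)]

        if not values:
            raise KeyError(index)  # assumption: a missing key is an error
        return values[0] if len(values) == 1 else values


tags = TagList(["tag_j:l", "tag_m:n", "tag_m:o"])

print(tags[0])        # positional indexing returns the full tag
print(tags["tag_j"])  # a single match returns the bare value
print(tags["tag_m"])  # multiple matches return a list of values
```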

Combine key-value tags and wildcards to examine the value of “tag_j” on every experiment that has one.

[18]:
[f"{e.id}: {e.tags['tag_j']}" for e in project.experiments(tags=["tag_j:*"])]
[18]:
['0962a69f-1db5-4f42-884c-60ad179bdb5c: k',
 'd5c9d775-e546-4b02-93ac-f5cee6d17aea: l']

Managing experiment relationships

A common use for key-value tags is managing relationships between experiments. rubicon-ml has built-in support for managing such relationships in this manner.

[19]:
experiment_a.add_child_experiment(experiment_d)
experiment_a.add_child_experiment(experiment_e)

experiment_a.tags
[19]:
['child:0962a69f-1db5-4f42-884c-60ad179bdb5c',
 'child:d5c9d775-e546-4b02-93ac-f5cee6d17aea',
 'tag_a']

Now let’s say we’ve only been given experiment_a and we don’t know anything about its children or how they were logged.

The child experiment IDs themselves can be retrieved by indexing into the tags with the “child” key.

[20]:
experiment_a.tags["child"]
[20]:
['0962a69f-1db5-4f42-884c-60ad179bdb5c',
 'd5c9d775-e546-4b02-93ac-f5cee6d17aea']
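The relationship is stored as nothing more than `child:<id>` tags on the parent, so the bookkeeping can be sketched with plain strings. The `add_child` and `child_ids` helpers below are hypothetical illustrations of that idea and are not part of rubicon-ml; only the `child` key and the two IDs come from the output above.

```python
CHILD_KEY = "child"  # key used by the relationship tags shown above


def add_child(parent_tags, child_id):
    """Record a relationship by appending a `child:<id>` tag (hypothetical helper)."""
    parent_tags.append(f"{CHILD_KEY}:{child_id}")


def child_ids(parent_tags):
    """Return the IDs stored in `child:<id>` tags, in the order they appear."""
    prefix = f"{CHILD_KEY}:"
    return [tag[len(prefix):] for tag in parent_tags if tag.startswith(prefix)]


parent_tags = ["tag_a"]
add_child(parent_tags, "0962a69f-1db5-4f42-884c-60ad179bdb5c")
add_child(parent_tags, "d5c9d775-e546-4b02-93ac-f5cee6d17aea")

print(child_ids(parent_tags))  # the two child IDs, without the "child:" prefix
```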

From there, we can use the IDs to grab the complete child experiments from the original project.

[21]:
[project.experiment(id=exp_id) for exp_id in experiment_a.tags["child"]]
[21]:
[<rubicon_ml.client.experiment.Experiment at 0x163cad2a0>,
 <rubicon_ml.client.experiment.Experiment at 0x163cad030>]