{ "cells": [ { "cell_type": "markdown", "id": "0f32303d-101b-48cc-9e18-3c7466b3ea2f", "metadata": {}, "source": [ "# Tagging\n", "\n", "Tags can be used to group and indentify specific rubicon-ml entities by shared characteristics.\n", "Any rubicon-ml entity can be tagged when logged with any number of tags. Later, tags can be leveraged\n", "to query rubicon-ml logs during retrieval.\n", "\n", "In general, a tag is any arbitrary string. rubicon-ml provides additonal functionality for tags that\n", "follow a ``:`` format.\n", "\n", "## Logging with tags\n", "\n", "First, create a ``Rubicon`` entrypoint." ] }, { "cell_type": "code", "execution_count": 1, "id": "10393959-57dd-4a3d-8b1e-d7fffb24b421", "metadata": {}, "outputs": [], "source": [ "from rubicon_ml import Rubicon\n", "\n", "rubicon = Rubicon(persistence=\"memory\")\n", "project = rubicon.create_project(\"tagging\")" ] }, { "cell_type": "markdown", "id": "67307879-6526-4382-a503-ee94c49e9e74", "metadata": {}, "source": [ "Now we'll log three experiments with tags \"a\" and \"b\"." ] }, { "cell_type": "code", "execution_count": 2, "id": "ae4c1a0b-6172-4250-ba55-c745d84c0813", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "`experiment_a` ID: 7e08c0bf-7f88-46d1-89de-4e0da7e2a448, tags: ['tag_a']\n", "`experiment_b` ID: 75e54061-5910-43a0-a036-c5a6bdd77ca1, tags: ['other_tag_a', 'tag_b']\n", "`experiment_c` ID: cc09ea5c-18df-48b1-888e-e692f5d9e71a, tags: ['tag_a', 'tag_b']\n" ] } ], "source": [ "experiment_a = project.log_experiment(tags=[\"tag_a\"])\n", "experiment_b = project.log_experiment(tags=[\"other_tag_a\", \"tag_b\"])\n", "experiment_c = project.log_experiment(tags=[\"tag_a\", \"tag_b\"])\n", "\n", "print(f\"`experiment_a` ID: {experiment_a.id}, tags: {experiment_a.tags}\")\n", "print(f\"`experiment_b` ID: {experiment_b.id}, tags: {experiment_b.tags}\")\n", "print(f\"`experiment_c` ID: {experiment_c.id}, tags: {experiment_c.tags}\")" ] }, { "cell_type": "markdown", "id": "df6289a5-4681-44df-953a-f1c1f0d39b87", "metadata": {}, "source": [ "Any other entity logged to an experiment can also be tagged." ] }, { "cell_type": "code", "execution_count": 3, "id": "71913363-5b98-4e95-9cae-eafa05793a2c", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "artifact = experiment_a.log_artifact(\n", " data_bytes=b\"artifact\", name=\"artifact\", tags=[\"tag_c\"]\n", ")\n", "dataframe = experiment_a.log_dataframe(\n", " df=pd.DataFrame([[0], [1]]), tags=[\"tag_d\"]\n", ")\n", "feature = experiment_a.log_feature(name=\"var_0\", tags=[\"tag_e\"])\n", "parameter = experiment_a.log_parameter(name=\"input\", value=0, tags=[\"tag_f\"])\n", "metric = experiment_a.log_metric(name=\"output\", value=1, tags=[\"tag_g\"])" ] }, { "cell_type": "markdown", "id": "179f8230-414f-4881-9dc9-7b42f3de23b1", "metadata": {}, "source": [ "## Retrieving with tags\n", "\n", "Each of the retrieval functions on a project or experiment (``experiments``, ``metrics``, etc.)\n", "accept the ``tags`` and ``qtype`` (\"or\" or \"and\", default \"or\") arguments to filter retrieval.\n", "\n", "First, grab all the experiments with tag \"a\"." ] }, { "cell_type": "code", "execution_count": 4, "id": "8db675d1-b5c1-4471-8f27-ad4741f8f130", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"7e08c0bf-7f88-46d1-89de-4e0da7e2a448: ['tag_a']\",\n", " \"cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']\"]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"tag_a\"])]" ] }, { "cell_type": "markdown", "id": "59118c51-be5f-4d8a-89be-c18dcbf411eb", "metadata": {}, "source": [ "Next, get each experiment with tag \"b\". Note that the final experiment is the same as the last\n", "output since it has both tags \"a\" and \"b\"." ] }, { "cell_type": "code", "execution_count": 5, "id": "e3e93248-30f5-4c60-bdea-752b25cc9057", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']\",\n", " \"cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']\"]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"tag_b\"])]" ] }, { "cell_type": "markdown", "id": "ae7007b0-425f-45f0-8fa2-675ecafedb80", "metadata": {}, "source": [ "Querying with multiple tags uses a logical _or_ to return results by default." ] }, { "cell_type": "code", "execution_count": 6, "id": "e5664866-bef5-48a0-acc3-281d40a618d6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"7e08c0bf-7f88-46d1-89de-4e0da7e2a448: ['tag_a']\",\n", " \"75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']\",\n", " \"cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']\"]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"tag_a\", \"tag_b\"])]" ] }, { "cell_type": "markdown", "id": "03298720-9017-4314-8c86-855ab8be8e32", "metadata": {}, "source": [ "This can be switched to a logical _and_ with the ``qtype`` argument." ] }, { "cell_type": "code", "execution_count": 7, "id": "76dec4ef-8ced-4f5a-945c-6a0e88ea8f42", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']\"]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"tag_a\", \"tag_b\"], qtype=\"and\")]" ] }, { "cell_type": "markdown", "id": "6099dacf-51f1-4f08-98e1-a62d6e19d7f9", "metadata": {}, "source": [ "### Wildcards\n", "\n", "Retrieval by tags also supports wildcards (``*``) while querying." ] }, { "cell_type": "code", "execution_count": 8, "id": "8a3ee1b5-0340-481a-8d9e-151caff8e7f8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"7e08c0bf-7f88-46d1-89de-4e0da7e2a448: ['tag_a']\",\n", " \"75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']\",\n", " \"cc09ea5c-18df-48b1-888e-e692f5d9e71a: ['tag_a', 'tag_b']\"]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"*_a\"])]" ] }, { "cell_type": "markdown", "id": "a5339a13-f08e-4680-be9b-0bb2bf68607c", "metadata": {}, "source": [ "Multiple wildcards can be used in a single query. A single wildcard character will match any number of\n", "characters in the tag." ] }, { "cell_type": "code", "execution_count": 9, "id": "bb81395a-1b9a-48f9-887f-67f549813226", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"75e54061-5910-43a0-a036-c5a6bdd77ca1: ['other_tag_a', 'tag_b']\"]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"*_*_*\"])]" ] }, { "cell_type": "markdown", "id": "42e9652c-9264-4cdd-b230-0fd19345100e", "metadata": {}, "source": [ "## Updating tags\n", "\n", "Tags can be update later, after logging as well." ] }, { "cell_type": "code", "execution_count": 10, "id": "1fd16d81-e94c-408e-9275-623a35f48af8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['tag_a', 'tag_b']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_c.tags" ] }, { "cell_type": "markdown", "id": "f0564014-5aa0-40ca-861d-9ff485e95b8c", "metadata": {}, "source": [ "`add_tags` adds any number of new tags to an existing entity. Each entity that allows\n", "tagging will have both the ``add_tags`` and ``remove_tags`` functions." ] }, { "cell_type": "code", "execution_count": 11, "id": "b8c55a1d-9955-4bc2-9384-117b3444a968", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['tag_i', 'tag_h', 'tag_b', 'tag_a']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_c.add_tags([\"tag_h\", \"tag_i\"])\n", "experiment_c.tags" ] }, { "cell_type": "markdown", "id": "7f4fe05e-d4cf-41ad-b3bc-ec78e0207c6b", "metadata": {}, "source": [ "Removal works similarly." ] }, { "cell_type": "code", "execution_count": 12, "id": "4fe6eba0-d58e-4bb9-81ed-886bfc611506", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['tag_i', 'tag_h']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_c.remove_tags([\"tag_a\", \"tag_b\"])\n", "experiment_c.tags" ] }, { "cell_type": "markdown", "id": "7815fb91-f310-415c-9590-d26d0c9fbda8", "metadata": {}, "source": [ "Now, the same query from above for an experiment with tags \"a\" and \"b\" returns no results." ] }, { "cell_type": "code", "execution_count": 13, "id": "a9be6033-775c-4d11-b9d6-55364474eb87", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags}\" for e in project.experiments(tags=[\"tag_a\", \"tag_b\"], qtype=\"and\")]" ] }, { "cell_type": "markdown", "id": "1394c855-6fd4-426b-9fce-dae44caee95b", "metadata": {}, "source": [ "## Key-value tags\n", "\n", "rubicon-ml provides extended support for tags that follow the ``:`` format." ] }, { "cell_type": "code", "execution_count": 14, "id": "d68b29d3-436b-476b-9cc3-cf8d4c7e14ac", "metadata": {}, "outputs": [], "source": [ "experiment_d = project.log_experiment(tags=[\"tag_j:k\"])\n", "experiment_e = project.log_experiment(tags=[\"tag_j:l\", \"tag_m:n\", \"tag_m:o\"])" ] }, { "cell_type": "markdown", "id": "aa040516-3cbf-4dbd-8100-0f0f6f4d23b6", "metadata": {}, "source": [ "The list returned by the `tags` property of any entity can be indexed into like a\n", "regular list to retrieve the full tags, just like with normal tags." ] }, { "cell_type": "code", "execution_count": 15, "id": "3b9ff6f5-b90b-4870-8ffc-0ffe74a8c0f8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'tag_j:k'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_d.tags[0]" ] }, { "cell_type": "markdown", "id": "cc7f9837-bedf-48d8-ac18-2479a0bc2df4", "metadata": {}, "source": [ "But it also supports string indexing, like a dictionary. To retrieve the value of a\n", "key-value tag, just index into the `tags` property with its key." ] }, { "cell_type": "code", "execution_count": 16, "id": "622925d8-26fd-4961-b7d5-0625544d736c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'k'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_d.tags[\"tag_j\"]" ] }, { "cell_type": "markdown", "id": "726f8cd6-2f8c-44d4-a264-a7106bda49e0", "metadata": {}, "source": [ "If there are multiple keys, a list containing each value will be returned." ] }, { "cell_type": "code", "execution_count": 17, "id": "07328cb3-9947-4fe9-8359-244530016a7e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['n', 'o']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_e.tags[\"tag_m\"]" ] }, { "cell_type": "markdown", "id": "a85a3307-eb62-4bd1-b0eb-9367f6edf164", "metadata": {}, "source": [ "Combine key-value tags and wildcards to examine the value of _\"tag_j\"_ on every experiment that has one." ] }, { "cell_type": "code", "execution_count": 18, "id": "334e5096-b816-4981-b6c2-6e741a323a3e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['0962a69f-1db5-4f42-884c-60ad179bdb5c: k',\n", " 'd5c9d775-e546-4b02-93ac-f5cee6d17aea: l']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{e.id}: {e.tags['tag_j']}\" for e in project.experiments(tags=[\"tag_j:*\"])]" ] }, { "cell_type": "markdown", "id": "26f96aa5-cc31-4adf-86c5-d1b608741259", "metadata": {}, "source": [ "### Managing experiment relationships\n", "\n", "A common use for key-value tags is managing relationships between experiments. rubicon-ml\n", "has built-in support for managing such relationships in this manner." ] }, { "cell_type": "code", "execution_count": 19, "id": "ab2c5a29-b8ee-45e7-b4b9-6dc22c3abdd7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['child:0962a69f-1db5-4f42-884c-60ad179bdb5c',\n", " 'child:d5c9d775-e546-4b02-93ac-f5cee6d17aea',\n", " 'tag_a']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_a.add_child_experiment(experiment_d)\n", "experiment_a.add_child_experiment(experiment_e)\n", "\n", "experiment_a.tags" ] }, { "cell_type": "markdown", "id": "7c3367cc-77f3-4c20-b7ab-5029e64b8452", "metadata": {}, "source": [ "Now let's say we've only been given `experiment_a` and we don't know anything about its children or how they were logged.\n", "\n", "The child experiment IDs themselves can be retrieved by indexing into the tags with the \"child\" key." ] }, { "cell_type": "code", "execution_count": 20, "id": "b3c04d92-6f52-4c19-ab0e-6907b4f8ab17", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['0962a69f-1db5-4f42-884c-60ad179bdb5c',\n", " 'd5c9d775-e546-4b02-93ac-f5cee6d17aea']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_a.tags[\"child\"]" ] }, { "cell_type": "markdown", "id": "5aae42a9-2ebd-4444-98ff-0127f3ed737c", "metadata": {}, "source": [ "From there, we can use the IDs grab the complete child experiments from the original project." ] }, { "cell_type": "code", "execution_count": 21, "id": "876a3358-2277-47b4-910f-1a1db89bda9d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[project.experiment(id=exp_id) for exp_id in experiment_a.tags[\"child\"]]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }