{ "cells": [ { "cell_type": "markdown", "id": "d39383f7", "metadata": {}, "source": [ "# Log with Multiple Backends\n", "\n", "rubicon-ml allows users to instantiate `Rubicon` objects with multiple backends to write to/read from at once. These backends include local, memory, and S3 repositories. Here's a walk through of how one might instantiate and use a `Rubicon` object with multiple backends." ] }, { "cell_type": "code", "execution_count": 1, "id": "7e1d5aaa", "metadata": {}, "outputs": [], "source": [ "from rubicon_ml import Rubicon" ] }, { "cell_type": "markdown", "id": "b0ad7b71-0efe-4c10-8abc-8b78c8dbd6b1", "metadata": {}, "source": [ "Let's say we want to log to two separate locations on our local filesystem. This example is a bit contrived,\n", "but you could imagine writing to both a local filesystem for quick, ad-hoc exploration and an S3 bucket for\n", "persistent storage." ] }, { "cell_type": "code", "execution_count": 2, "id": "095655e5", "metadata": {}, "outputs": [], "source": [ "rubicon_composite = Rubicon(composite_config=[\n", " {\"persistence\": \"filesystem\", \"root_dir\": \"./rubicon-root/root_a\"},\n", " {\"persistence\": \"filesystem\", \"root_dir\": \"./rubicon-root/root_b\"},\n", "])" ] }, { "cell_type": "markdown", "id": "66644d33", "metadata": {}, "source": [ "### Writing\n", "\n", "All of rubicon-ml's logging functions will now log to both locations in the filesystem with a single function call." ] }, { "cell_type": "code", "execution_count": 3, "id": "b7ecf19d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'8abfbff9-a9a1-46de-b782-3bb4ad1c41a0'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "project_composite = rubicon_composite.create_project(name=\"multiple backends\")\n", "experiment_composite = project_composite.log_experiment()\n", "\n", "feature = experiment_composite.log_feature(name=\"year\")\n", "metric = experiment_composite.log_metric(name=\"accuracy\", value=1.0)\n", "parameter = experiment_composite.log_parameter(name=\"n_estimators\", value=100)\n", "artifact = experiment_composite.log_artifact(\n", " data_bytes=b\"bytes\", name=\"example artifact\"\n", ")\n", "dataframe = experiment_composite.log_dataframe(\n", " pd.DataFrame([[5, 0, 0], [0, 5, 1], [0, 0, 4]], columns=[\"x\", \"y\", \"z\"]),\n", " name=\"example dataframe\",\n", ")\n", "\n", "experiment_composite.id" ] }, { "cell_type": "markdown", "id": "10db7e8b", "metadata": {}, "source": [ "Let's verify both of our backends have been written to by retrieving the data one location at a time." ] }, { "cell_type": "code", "execution_count": 4, "id": "c9e815cf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'8abfbff9-a9a1-46de-b782-3bb4ad1c41a0'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rubicon_a = Rubicon(persistence=\"filesystem\", root_dir=\"./rubicon-root/root_a\")\n", "project_a = rubicon_a.get_project(name=\"multiple backends\")\n", "\n", "project_a.experiments()[0].id" ] }, { "cell_type": "markdown", "id": "baf58168-49ca-4659-b2c8-2315853cbad9", "metadata": {}, "source": [ "Each experiments' IDs match, confirming they are the same." ] }, { "cell_type": "code", "execution_count": 5, "id": "d95347c9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'8abfbff9-a9a1-46de-b782-3bb4ad1c41a0'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rubicon_b = Rubicon(persistence=\"filesystem\", root_dir=\"./rubicon-root/root_b\")\n", "project_b = rubicon_a.get_project(name=\"multiple backends\")\n", "\n", "project_b.experiments()[0].id" ] }, { "cell_type": "markdown", "id": "12a5c1df", "metadata": {}, "source": [ "### Reading\n", "\n", "rubicon-ml's reading functions will iterate over all backend repositories and return from the first one they are able to read from. A `RubiconException` will be raised if none of the backend repositories can be read the requested item(s)." ] }, { "cell_type": "code", "execution_count": 6, "id": "d66157e0-77f5-47d7-994d-09598d878e24", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "project_read = rubicon_composite.get_project(name=\"multiple backends\")\n", "project_read" ] }, { "cell_type": "code", "execution_count": 7, "id": "1f9b622e-84d0-465f-9110-03a3c0289e74", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "features: ['year']\n", "metrics: ['accuracy']\n", "parameters: ['n_estimators']\n", "artifact data: b'bytes'\n", "dataframe data:\n", " x y z\n", "0 5 0 0\n", "1 0 5 1\n", "2 0 0 4\n" ] } ], "source": [ "for experiment in project_read.experiments():\n", " print(f\"features: {[f.name for f in experiment.features()]}\")\n", " print(f\"metrics: {[m.name for m in experiment.metrics()]}\")\n", " print(f\"parameters: {[p.name for p in experiment.parameters()]}\")\n", " print(f\"artifact data: {experiment.artifact(name='example artifact').get_data()}\")\n", " print(f\"dataframe data:\\n{experiment.dataframe(name='example dataframe').get_data()}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.1" } }, "nbformat": 4, "nbformat_minor": 5 }