{ "cells": [ { "cell_type": "markdown", "id": "a8ec0623-c2b2-481c-b1d0-a605d19d0062", "metadata": {}, "source": [ "# Dataframe Plot\n", "\n", "The dataframe plot is used to visualize data logged across multiple experiments as\n", "dataframes on a single plot. The dataframes logged to each experiment must have\n", "matching schema in order to properly visualize them." ] }, { "cell_type": "code", "execution_count": 1, "id": "712796d5-81d6-4382-8844-b523d5812c9d", "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import plotly.express as px\n", "\n", "from rubicon_ml import Rubicon\n", "from rubicon_ml.viz import DataframePlot" ] }, { "cell_type": "markdown", "id": "1b75b715-279f-4d37-931f-2197886f1c6c", "metadata": {}, "source": [ "First, we'll create a few experiments and log a dataframe to each one." ] }, { "cell_type": "code", "execution_count": 2, "id": "59ef9397-23ae-4aec-a0a7-9532ea07d41a", "metadata": {}, "outputs": [], "source": [ "DISPLAY_DFS = False\n", "\n", "rubicon = Rubicon(persistence=\"memory\", auto_git_enabled=True)\n", "project = rubicon.get_or_create_project(\"plot comparison\")\n", "\n", "num_experiments_to_log = 6\n", "data_ranges = [\n", " (random.randint(0, 15000), random.randint(0, 15000))\n", " for _ in range(num_experiments_to_log)\n", "]\n", "dates = pd.date_range(start=\"1/1/2010\", end=\"12/1/2020\", freq=\"MS\")\n", "\n", "for start, stop in data_ranges:\n", " data = np.array([list(dates), np.linspace(start, stop, len(dates))])\n", " data_df = pd.DataFrame.from_records(\n", " data.T,\n", " columns=[\"calendar month\", \"open accounts\"],\n", " )\n", "\n", " dataframe = project.log_experiment().log_dataframe(data_df, name=\"open accounts\")\n", " \n", " if DISPLAY_DFS:\n", " print(f\"dataframe {dataframe.id}\")\n", " display(data_df.head())" ] }, { "cell_type": "markdown", "id": "8a067df5-2706-4038-b3f3-57fdfb9981af", "metadata": {}, "source": [ "Now, we can instantiate the `DataframePlot` object with the experiments we just logged. We\n", "also need to provide the name of the dataframe we're plotting. Optionally, provide a Plotly\n", "express plotting function as `plotting_func` to visualize the dataframes with any of Plotly\n", "express' available options.\n", "\n", "More on the available Plotly express visualizations can be found in [the Plotly express\n", "documentation](https://plotly.com/python/plotly-express/).\n", "\n", "We can view the plot right in the notebook with `show`. The Dash application\n", "itself will be running on http://127.0.0.1:8050/ when running locally. Use the\n", "`serve` command to launch the server directly without rendering the widget in the\n", "current Python interpreter." ] }, { "cell_type": "code", "execution_count": 3, "id": "0790c5b6-7398-4bf4-a923-8252cc8cd033", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dash is running on http://127.0.0.1:8050/\n" ] } ], "source": [ "DataframePlot(\n", " experiments=project.experiments(),\n", " dataframe_name=\"open accounts\",\n", " x=\"calendar month\",\n", " y=\"open accounts\",\n", " plotting_func=px.line,\n", ").show()" ] }, { "cell_type": "markdown", "id": "65a143a5-24d3-44c1-b8b7-85e29b6d515d", "metadata": {}, "source": [ "![dataframe-plot](dataframe-plot.png \"dataframe plot\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }