{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ffc55deb-dc28-4d36-be12-c375ca24e5d3",
   "metadata": {},
   "source": [
    "# Logging Experiments\n",
    "\n",
    "``rubicon_ml``'s core functionality is centered around logging **experiments** to explain and explore various\n",
    "model runs throughout the model development lifecycle. This example takes a quick look at how to log\n",
    "model metadata to ``rubicon_ml`` in the context of a simple classification project.\n",
    "\n",
    "We'll use the ``palmerpenguins`` dataset collected by Dr. Kristen Gorman as our training/testing data. More\n",
    "information on the dataset can be [found here](https://allisonhorst.github.io/palmerpenguins/).\n",
    "\n",
    "Our goal is to create a simple classification model that differentiates the species of penguins present in the\n",
    "dataset. We'll use ``rubicon_ml`` logging to make it easy to compare runs of our model and to preserve\n",
    "important information for reproducibility later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "934210f6-3701-47bb-9223-bd18171ea761",
   "metadata": {},
   "outputs": [
   ],
   "source": [
    "! pip install palmerpenguins"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9d17efc-9147-4e90-a2c7-c83f8a0d9a22",
   "metadata": {},
   "source": [
    "First, we'll load the dataset and perform some basic data preparation. In many projects, this kind of preparation\n",
    "happens upstream, before the training/testing data is loaded and experimentation begins."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9d58148e-69af-4664-8cf5-328a884943ba",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "target classes (species): ['Adelie' 'Gentoo' 'Chinstrap']\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>male</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  year  \n",
       "0       3750.0    male  2007  \n",
       "1       3800.0  female  2007  \n",
       "2       3250.0  female  2007  \n",
       "3          NaN     NaN  2007  \n",
       "4       3450.0  female  2007  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from palmerpenguins import load_penguins\n",
    "\n",
    "penguins_df = load_penguins()\n",
    "target_values = penguins_df['species'].unique()\n",
    "\n",
    "print(f\"target classes (species): {target_values}\")\n",
    "penguins_df.head()"
   ]
  },
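  {
   "cell_type": "markdown",
   "id": "c9e15f1a-a20b-4360-af4f-82b43a830285",
   "metadata": {},
   "source": [
    "Let's encode the string columns in our dataset (``species``, ``island`` and ``sex``) as integer labels so our\n",
    "k-nearest neighbors classifier can work with the data. Note that missing values get an integer label of their\n",
    "own, as in the ``sex`` column of row 3 below."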
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0e515256-72ae-4a27-9f5f-03a5313d6b61",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "target classes (species): [0 2 1]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>0</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>0</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>0</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0        0       2            39.1           18.7              181.0   \n",
       "1        0       2            39.5           17.4              186.0   \n",
       "2        0       2            40.3           18.0              195.0   \n",
       "3        0       2             NaN            NaN                NaN   \n",
       "4        0       2            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g  sex  year  \n",
       "0       3750.0    1  2007  \n",
       "1       3800.0    0  2007  \n",
       "2       3250.0    0  2007  \n",
       "3          NaN    2  2007  \n",
       "4       3450.0    0  2007  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "for column in [\"species\", \"island\", \"sex\"]:\n",
    "    penguins_df[column] = LabelEncoder().fit_transform(penguins_df[column])\n",
    "\n",
    "print(f\"target classes (species): {penguins_df['species'].unique()}\")\n",
    "penguins_df.head()"
   ]
  },
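  {
   "cell_type": "markdown",
   "id": "5a1c7e42-9b3d-4f60-8e21-6d0c4b7a9f13",
   "metadata": {},
   "source": [
    "As a minimal sketch of how the integer labels map back to names: ``LabelEncoder`` stores the sorted unique values it\n",
    "was fit on in ``classes_``, and each integer label is an index into that array. The ``demo_species`` list below is a\n",
    "hypothetical stand-in for the ``species`` column, not the real data, and the ``demo_*`` names are ours."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d94b6e-1f37-4a85-b0e9-7a3f5d28e4c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "# hypothetical stand-in for the `species` column, in order of first appearance\n",
    "demo_species = [\"Adelie\", \"Gentoo\", \"Chinstrap\", \"Adelie\"]\n",
    "\n",
    "demo_encoder = LabelEncoder()\n",
    "demo_labels = demo_encoder.fit_transform(demo_species)\n",
    "\n",
    "# `classes_` is sorted, so integer label `i` stands for `classes_[i]`\n",
    "demo_mapping = {int(i): str(name) for i, name in enumerate(demo_encoder.classes_)}\n",
    "\n",
    "print(f\"encoded labels: {demo_labels.tolist()}\")\n",
    "print(f\"label -> name: {demo_mapping}\")"
   ]
  },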
  {
   "cell_type": "markdown",
   "id": "82a8e959-ed52-4b33-9996-708b8eeb0876",
   "metadata": {},
   "source": [
    "Finally, we'll split the preprocessed data into a train and test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e6ab60bf-29db-4cb3-89ec-fba1f6f7c23a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((240, 7), (240,), (104, 7), (104,))"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "train_penguins_df, test_penguins_df = train_test_split(penguins_df, test_size=.30)\n",
    "\n",
    "target_name = \"species\"\n",
    "feature_names = [c for c in train_penguins_df.columns if c != target_name]\n",
    "\n",
    "X_train, y_train = train_penguins_df[feature_names], train_penguins_df[target_name]\n",
    "X_test, y_test = test_penguins_df[feature_names], test_penguins_df[target_name]\n",
    "\n",
    "X_train.shape, y_train.shape, X_test.shape, y_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e80d22a6-a21b-4438-83a7-0746239d292d",
   "metadata": {},
   "source": [
    "Now we can create and train a simple scikit-learn pipeline to organize our model training code. We'll use a `SimpleImputer`\n",
    "to fill in missing values followed by a `KNeighborsClassifier` to classify the penguins. The pipeline's `score` method\n",
    "reports the mean accuracy on the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d7bb797b-1757-4319-876c-41cd2156237a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7307692307692307"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "imputer_strategy = \"mean\"\n",
    "classifier_n_neighbors = 5\n",
    "\n",
    "steps = [\n",
    "    (\"si\", SimpleImputer(strategy=imputer_strategy)),\n",
    "    (\"kn\", KNeighborsClassifier(n_neighbors=classifier_n_neighbors)),\n",
    "]\n",
    "\n",
    "penguin_pipeline = Pipeline(steps=steps)\n",
    "penguin_pipeline.fit(X_train, y_train)\n",
    "\n",
    "score = penguin_pipeline.score(X_test, y_test)\n",
    "score"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc131e02-8f65-4d6c-9148-27f58d6469ff",
   "metadata": {},
   "source": [
    "We've completed a training run, so let's finally log our results to ``rubicon_ml``! We'll create an entrypoint to the\n",
    "local filesystem and create a project called \"classifying penguins\" to store our results. ``rubicon_ml``'s ``log_*``\n",
    "methods can be placed throughout your model code to log any important information along the way. Entities available\n",
    "for logging via the ``log_*`` methods can be found in [our glossary](https://capitalone.github.io/rubicon-ml/glossary.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0d234a1d-d143-4ad8-80ce-512fbc8327f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from rubicon_ml import Rubicon\n",
    "\n",
    "rubicon = Rubicon(\n",
    "    persistence=\"filesystem\",\n",
    "    root_dir=\"./rubicon-root\",\n",
    "    auto_git_enabled=True,\n",
    ")\n",
    "project = rubicon.get_or_create_project(name=\"classifying penguins\")\n",
    "experiment = project.log_experiment()\n",
    "\n",
    "for feature_name in feature_names:\n",
    "    experiment.log_feature(name=feature_name)\n",
    "\n",
    "_ = experiment.log_parameter(name=\"strategy\", value=imputer_strategy)\n",
    "_ = experiment.log_parameter(name=\"n_neighbors\", value=classifier_n_neighbors)\n",
    "_ = experiment.log_metric(name=\"accuracy\", value=score)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68a5598c-93d5-4bf4-898a-033c9e97aad9",
   "metadata": {},
   "source": [
    "After logging, we can inspect the various attributes of our logged entities. All available attributes can be found in \n",
    "[our API reference](https://capitalone.github.io/rubicon-ml/api_reference.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9d9523e5-69ef-4af3-9488-bf73bfb65073",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Experiment(project_name='classifying penguins', id='c484caf8-bdc1-429f-b012-7a4e02dbc83a', name=None, description=None, model_name=None, branch_name='210-new-quick-look', commit_hash='490e8af895f2cd0636c72295c2762b21cd6c8102', training_metadata=None, tags=[], created_at=datetime.datetime(2022, 6, 30, 13, 51, 4, 958916))\n",
      "\n",
      "git info:\n",
      "\tbranch name: 210-new-quick-look\n",
      "\tcommit hash: 490e8af895f2cd0636c72295c2762b21cd6c8102\n",
      "features: ['island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex', 'year']\n",
      "parameters: [('strategy', 'mean'), ('n_neighbors', 5)]\n",
      "metrics: [('accuracy', 0.7307692307692307)]\n"
     ]
    }
   ],
   "source": [
    "print(experiment)\n",
    "print()\n",
    "print(\"git info:\")\n",
    "print(f\"\\tbranch name: {experiment.branch_name}\\n\\tcommit hash: {experiment.commit_hash}\")\n",
    "print(f\"features: {[f.name for f in experiment.features()]}\")\n",
    "print(f\"parameters: {[(p.name, p.value) for p in experiment.parameters()]}\")\n",
    "print(f\"metrics: {[(m.name, m.value) for m in experiment.metrics()]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54d8e9e4-7f6c-400a-ac9c-39424864c3c2",
   "metadata": {},
   "source": [
    "Tracking the results of a single model fit is nice, but ``rubicon_ml`` really shines when we're iterating over numerous\n",
    "model fits, like in a hyperparameter search. The code below performs a very basic grid search over the ``strategy``\n",
    "of the ``SimpleImputer`` and the ``n_neighbors`` of the ``KNeighborsClassifier``, logging the results of each model\n",
    "fit to a new ``rubicon_ml`` experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9ebe5fa9-3ff7-4ab9-8213-bfd1199e0520",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.base import clone\n",
    "\n",
    "for imputer_strategy in [\"mean\", \"median\", \"most_frequent\"]:\n",
    "    for classifier_n_neighbors in [5, 10, 15, 20]:\n",
    "        pipeline = clone(penguin_pipeline)\n",
    "        pipeline.set_params(\n",
    "            si__strategy=imputer_strategy,\n",
    "            kn__n_neighbors=classifier_n_neighbors,\n",
    "        )\n",
    "        \n",
    "        pipeline.fit(X_train, y_train)\n",
    "        score = pipeline.score(X_test, y_test)\n",
    "\n",
    "        experiment = project.log_experiment(tags=[\"parameter search\"])\n",
    "\n",
    "        for feature_name in feature_names:\n",
    "            experiment.log_feature(name=feature_name)\n",
    "        experiment.log_parameter(name=\"strategy\", value=imputer_strategy)\n",
    "        experiment.log_parameter(name=\"n_neighbors\", value=classifier_n_neighbors)\n",
    "        experiment.log_metric(name=\"accuracy\", value=score)"
   ]
  },
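  {
   "cell_type": "markdown",
   "id": "8e4b2a90-6c1d-4d73-a5f8-3b9e0c7d2146",
   "metadata": {},
   "source": [
    "The ``si__strategy`` and ``kn__n_neighbors`` keywords used in the search follow scikit-learn's convention for nested\n",
    "parameters: ``<step name>__<parameter name>``. The sketch below illustrates this on a separate, unfitted pipeline\n",
    "(``demo_pipeline`` is our own name, chosen so the fitted ``pipeline`` from the search is left untouched)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f07a3c5d-2e86-4b19-9d4a-c6e1b8a05273",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "# separate demo pipeline with the same step names as the one in the search\n",
    "demo_pipeline = Pipeline(\n",
    "    steps=[\n",
    "        (\"si\", SimpleImputer(strategy=\"mean\")),\n",
    "        (\"kn\", KNeighborsClassifier(n_neighbors=5)),\n",
    "    ]\n",
    ")\n",
    "\n",
    "# nested parameters are addressed as `<step name>__<parameter name>`\n",
    "demo_pipeline.set_params(si__strategy=\"median\", kn__n_neighbors=10)\n",
    "\n",
    "demo_params = demo_pipeline.get_params()\n",
    "print(demo_params[\"si__strategy\"], demo_params[\"kn__n_neighbors\"])"
   ]
  },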
  {
   "cell_type": "markdown",
   "id": "312705c8-e7c3-48b9-98b1-34b6f5c5b71c",
   "metadata": {},
   "source": [
    "Now we can take a look at the experiments and compare our results. Notice that we're still pulling experiments from the same\n",
    "project that we logged the first one to. Each experiment in the hyperparameter search above was tagged with\n",
    "\"parameter search\" when it was logged, so filtering on that tag retrieves only the experiments from the search and\n",
    "leaves out the first, untagged one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "79e60347-e4f0-4e17-bf77-846277aa7f50",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "experiments:\n",
      "\tid: a75b1258-2276-4eb1-beb5-caf83e9aacf3, parameters: [('strategy', 'mean'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7307692307692307)]\n",
      "\tid: 02a89318-b8d9-49a5-9337-7e4368cc54da, parameters: [('strategy', 'mean'), ('n_neighbors', 10)], metrics: [('accuracy', 0.75)]\n",
      "\tid: ce24eeef-4686-4fc7-8c0a-e73d6c9cdb71, parameters: [('strategy', 'mean'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]\n",
      "\tid: 093a9d02-89f7-4e48-82b1-f9ade435ef03, parameters: [('strategy', 'mean'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]\n",
      "\tid: bc4d0503-32d1-4a11-8222-4151dae893cf, parameters: [('strategy', 'median'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7211538461538461)]\n",
      "\tid: c1b6cb3a-0ad1-4932-914d-ba53a054891b, parameters: [('strategy', 'median'), ('n_neighbors', 10)], metrics: [('accuracy', 0.7403846153846154)]\n",
      "\tid: 9d6ffe67-088d-483f-9d3f-8f0fb34c22e8, parameters: [('strategy', 'median'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]\n",
      "\tid: f497245a-6149-4604-9ceb-da74ae9855d4, parameters: [('strategy', 'median'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]\n",
      "\tid: b2cd8067-ad4c-4ed5-87f7-2cd4536b2c73, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 5)], metrics: [('accuracy', 0.7211538461538461)]\n",
      "\tid: c4277327-381a-4885-aba4-a07c050463a5, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 10)], metrics: [('accuracy', 0.75)]\n",
      "\tid: d4ea2fe7-061e-4f5e-8958-e6ac29025708, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 15)], metrics: [('accuracy', 0.7596153846153846)]\n",
      "\tid: d9fe2005-824c-4e23-9809-e0459e57d78a, parameters: [('strategy', 'most_frequent'), ('n_neighbors', 20)], metrics: [('accuracy', 0.7211538461538461)]\n"
     ]
    }
   ],
   "source": [
    "print(\"experiments:\")\n",
    "for experiment in project.experiments(tags=[\"parameter search\"]):\n",
    "    print(\n",
    "        f\"\\tid: {experiment.id}, \"\n",
    "        f\"parameters: {[(p.name, p.value) for p in experiment.parameters()]}, \"\n",
    "        f\"metrics: {[(m.name, m.value) for m in experiment.metrics()]}\"\n",
    "    )"
   ]
  },
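  {
   "cell_type": "markdown",
   "id": "1d6f9b34-7a52-4c08-b3e7-9f2a4d8c5e60",
   "metadata": {},
   "source": [
    "One way to compare experiments programmatically is to flatten each one into a plain dict and pick the best by metric.\n",
    "The sketch below relies only on ``id``, ``parameters()`` and ``metrics()`` as used in the cell above. Both\n",
    "``experiment_to_record`` and ``make_demo_experiment`` are hypothetical helpers, not part of ``rubicon_ml``, and the\n",
    "stand-in objects exist only so the cell runs without a project. The same helper could be applied to the result of\n",
    "``project.experiments(tags=[\"parameter search\"])``. When several experiments tie on accuracy, ``max`` returns the\n",
    "first of them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b95e0d17-4c3a-4f62-8a7b-e2c6d1f08a49",
   "metadata": {},
   "outputs": [],
   "source": [
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def experiment_to_record(experiment):\n",
    "    \"\"\"Flatten an experiment's parameters and metrics into a single dict.\"\"\"\n",
    "    record = {\"id\": experiment.id}\n",
    "    record.update({p.name: p.value for p in experiment.parameters()})\n",
    "    record.update({m.name: m.value for m in experiment.metrics()})\n",
    "    return record\n",
    "\n",
    "\n",
    "def make_demo_experiment(experiment_id, strategy, n_neighbors, accuracy):\n",
    "    \"\"\"Hypothetical stand-in exposing only the attributes used above.\"\"\"\n",
    "    parameters = [\n",
    "        SimpleNamespace(name=\"strategy\", value=strategy),\n",
    "        SimpleNamespace(name=\"n_neighbors\", value=n_neighbors),\n",
    "    ]\n",
    "    metrics = [SimpleNamespace(name=\"accuracy\", value=accuracy)]\n",
    "    return SimpleNamespace(\n",
    "        id=experiment_id, parameters=lambda: parameters, metrics=lambda: metrics\n",
    "    )\n",
    "\n",
    "\n",
    "# accuracies copied from three of the experiments printed above; the ids are made up\n",
    "demo_experiments = [\n",
    "    make_demo_experiment(\"demo-1\", \"mean\", 5, 0.7307692307692307),\n",
    "    make_demo_experiment(\"demo-2\", \"mean\", 15, 0.7596153846153846),\n",
    "    make_demo_experiment(\"demo-3\", \"mean\", 20, 0.7211538461538461),\n",
    "]\n",
    "\n",
    "demo_records = [experiment_to_record(e) for e in demo_experiments]\n",
    "best_record = max(demo_records, key=lambda record: record[\"accuracy\"])\n",
    "print(best_record)"
   ]
  },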
  {
   "cell_type": "markdown",
   "id": "2fd08a67-3bf5-48f3-a40b-18324f7da1f6",
   "metadata": {},
   "source": [
    "``rubicon_ml`` can log more complex data as well. Below we'll log the trained classifier from the last fit of the search\n",
    "as an artifact (generic binary) and a confusion matrix of that fit's test-set predictions as a dataframe\n",
    "(``rubicon_ml`` accepts both ``pandas`` and ``dask`` dataframes natively)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2240e03d-b8a6-4c40-a570-512873ff7277",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "KNeighborsClassifier(n_neighbors=20)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Adelie</th>\n",
       "      <th>Chinstrap</th>\n",
       "      <th>Gentoo</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Adelie</th>\n",
       "      <td>37</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chinstrap</th>\n",
       "      <td>19</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Gentoo</th>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Adelie  Chinstrap  Gentoo\n",
       "Adelie         37          0       3\n",
       "Chinstrap      19          0       1\n",
       "Gentoo          6          0      38"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from sklearn.metrics import confusion_matrix\n",
    "\n",
    "experiment = project.experiments(tags=[\"parameter search\"])[-1]\n",
    "\n",
    "# `pipeline` is the last pipeline fit in the search above; log its classifier step\n",
    "trained_model = pipeline.named_steps[\"kn\"]\n",
    "experiment.log_artifact(data_object=trained_model, name=\"trained model\")\n",
    "\n",
    "y_pred = pipeline.predict(X_test)\n",
    "\n",
    "# both `LabelEncoder` and `confusion_matrix` order classes by sorted label,\n",
    "# so the sorted species names line up with the matrix's rows and columns\n",
    "species_names = sorted(target_values)\n",
    "confusion_matrix_df = pd.DataFrame(\n",
    "    confusion_matrix(y_test, y_pred),\n",
    "    columns=species_names,\n",
    "    index=species_names,\n",
    ")\n",
    "experiment.log_dataframe(confusion_matrix_df, name=\"confusion matrix\")\n",
    "\n",
    "print(experiment.artifact(name=\"trained model\").get_data(unpickle=True))\n",
    "experiment.dataframe(name=\"confusion matrix\").get_data()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}