{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Logging Concurrently\n", " \n", "Let's see how a few other popular ``scikit-learn`` models perform with the\n", "wine dataset. ``rubicon_ml``'s filesystem logging is thread-safe,\n", "so we can test a lot of model configurations at once.\n", "\n", "**Note**: ``rubicon_ml``'s in-memory logging won't work here, as each process\n", "has its own memory and can't see what the others have logged." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "\n", "from rubicon_ml import Rubicon\n", "\n", "\n", "root_dir = os.environ.get(\"RUBICON_ROOT\", \"rubicon-root\")\n", "root_path = f\"{os.path.dirname(os.getcwd())}/{root_dir}\"\n", "\n", "rubicon = Rubicon(persistence=\"filesystem\", root_dir=root_path)\n", "project = rubicon.get_or_create_project(\n", "    \"Concurrent Experiments\",\n", "    description=\"training multiple models in parallel\",\n", ")\n", "\n", "project" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a recap of the contents of the wine dataset, check out ``wine.DESCR``\n", "and ``wine.data``. We'll put together a training dataset using a subset of the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_wine\n", "from sklearn.model_selection import train_test_split\n", "\n", "\n", "wine = load_wine()\n", "wine_feature_names = wine.feature_names\n", "wine_datasets = train_test_split(\n", "    wine[\"data\"],\n", "    wine[\"target\"],\n", "    test_size=0.25,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use this ``run_experiment`` function to log a new **experiment** to\n", "the provided **project**, then train, run, and log a model of type ``classifier_cls`` using\n", "the training and testing data in ``wine_datasets``."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from collections import namedtuple\n", "\n", "\n", "SklearnTrainingMetadata = namedtuple(\"SklearnTrainingMetadata\", \"module_name method\")\n", "\n", "\n", "def run_experiment(project, classifier_cls, wine_datasets, feature_names, **kwargs):\n", "    X_train, X_test, y_train, y_test = wine_datasets\n", "\n", "    # one experiment per model configuration, tagged with the model's class name\n", "    experiment = project.log_experiment(\n", "        training_metadata=[\n", "            SklearnTrainingMetadata(\"sklearn.datasets\", \"load_wine\"),\n", "        ],\n", "        model_name=classifier_cls.__name__,\n", "        tags=[classifier_cls.__name__],\n", "    )\n", "\n", "    # log the hyperparameters and input features\n", "    for key, value in kwargs.items():\n", "        experiment.log_parameter(key, value)\n", "\n", "    for name in feature_names:\n", "        experiment.log_feature(name)\n", "\n", "    classifier = classifier_cls(**kwargs)\n", "    classifier.fit(X_train, y_train)\n", "    classifier.predict(X_test)\n", "\n", "    accuracy = classifier.score(X_test, y_test)\n", "\n", "    experiment.log_metric(\"accuracy\", accuracy)\n", "\n", "    # tag the experiment by whether it cleared the 95% accuracy bar\n", "    if accuracy >= 0.95:\n", "        experiment.add_tags([\"success\"])\n", "    else:\n", "        experiment.add_tags([\"failure\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you've tried using the ``multiprocessing`` library in IPython, you'll\n", "know the two don't play well together. In order for the ``multiprocessing`` library to locate\n", "the ``run_experiment`` function, we'll need to import it from a separate file.\n", "``./logging_concurrently.py`` defines the same ``run_experiment`` function as above." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from logging_concurrently import run_experiment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time we'll take a look at three classifiers - ``RandomForestClassifier``, ``DecisionTreeClassifier``, and\n", "``KNeighborsClassifier`` - to see which performs best. 
Each classifier will be run across four sets of parameters\n", "(provided as ``kwargs`` to ``run_experiment``), for a total of 12 experiments. Here, we'll build up a list of\n", "processes that will run each experiment in parallel." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import multiprocessing\n", "\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.tree import DecisionTreeClassifier\n", "\n", "\n", "processes = []\n", "\n", "for n_estimators in [10, 20, 30, 40]:\n", "    processes.append(multiprocessing.Process(\n", "        target=run_experiment,\n", "        args=[project, RandomForestClassifier, wine_datasets, wine_feature_names],\n", "        kwargs={\"n_estimators\": n_estimators},\n", "    ))\n", "\n", "for n_neighbors in [5, 10, 15, 20]:\n", "    processes.append(multiprocessing.Process(\n", "        target=run_experiment,\n", "        args=[project, KNeighborsClassifier, wine_datasets, wine_feature_names],\n", "        kwargs={\"n_neighbors\": n_neighbors},\n", "    ))\n", "\n", "for criterion in [\"gini\", \"entropy\"]:\n", "    for splitter in [\"best\", \"random\"]:\n", "        processes.append(multiprocessing.Process(\n", "            target=run_experiment,\n", "            args=[project, DecisionTreeClassifier, wine_datasets, wine_feature_names],\n", "            kwargs={\n", "                \"criterion\": criterion,\n", "                \"splitter\": splitter,\n", "            },\n", "        ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run all our experiments in parallel!" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# start every process first so they run concurrently, then wait for all to finish\n", "for process in processes:\n", "    process.start()\n", "\n", "for process in processes:\n", "    process.join()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can validate that we successfully logged all 12 experiments to our project."
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(project.experiments())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see which experiments we tagged as successful and what type of model they used." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "experiment 3a144831 was successful using a RandomForestClassifier\n", "experiment 67961ac9 was successful using a RandomForestClassifier\n", "experiment 712564c0 was successful using a RandomForestClassifier\n", "experiment f143ea43 was successful using a RandomForestClassifier\n" ] } ], "source": [ "for e in project.experiments(tags=[\"success\"]): \n", " print(f\"experiment {e.id[:8]} was successful using a {e.model_name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also take a deeper look at any of our experiments." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "experiment 0b6b8114-fe72-49af-813b-23e1667c1486\n", "training metadata: SklearnTrainingMetadata(module_name='sklearn.datasets', method='load_wine')\n", "tags: ['KNeighborsClassifier', 'failure']\n", "parameters: [('n_neighbors', 10)]\n", "metrics: [('accuracy', 0.7111111111111111)]\n" ] } ], "source": [ "first_experiment = project.experiments()[0]\n", "\n", "training_metadata = SklearnTrainingMetadata(*first_experiment.training_metadata)\n", "tags = first_experiment.tags\n", "\n", "parameters = [(p.name, p.value) for p in first_experiment.parameters()]\n", "metrics = [(m.name, m.value) for m in first_experiment.metrics()]\n", " \n", "print(\n", " f\"experiment {first_experiment.id}\\n\"\n", " f\"training metadata: {training_metadata}\\ntags: {tags}\\n\"\n", " f\"parameters: {parameters}\\nmetrics: {metrics}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we could grab the project's data as a dataframe!" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>id</th><th>name</th><th>description</th><th>model_name</th><th>commit_hash</th><th>tags</th><th>created_at</th><th>splitter</th><th>criterion</th><th>n_estimators</th><th>n_neighbors</th><th>accuracy</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>aa4faa9b-3051-454d-915d-54b5c16f6e34</td><td>None</td><td>None</td><td>DecisionTreeClassifier</td><td>None</td><td>[DecisionTreeClassifier, failure]</td><td>2021-04-28 12:45:27.759817</td><td>best</td><td>gini</td><td>NaN</td><td>NaN</td><td>0.933333</td></tr>\n",
"    <tr><th>1</th><td>7aec1e7b-c7dd-43e7-b20c-a400b2141d23</td><td>None</td><td>None</td><td>DecisionTreeClassifier</td><td>None</td><td>[DecisionTreeClassifier, failure]</td><td>2021-04-28 12:45:27.752157</td><td>best</td><td>entropy</td><td>NaN</td><td>NaN</td><td>0.933333</td></tr>\n",
"    <tr><th>2</th><td>3a144831-b093-4dc3-85a6-71bc7a4e26a4</td><td>None</td><td>None</td><td>RandomForestClassifier</td><td>None</td><td>[RandomForestClassifier, success]</td><td>2021-04-28 12:45:27.715676</td><td>NaN</td><td>NaN</td><td>20.0</td><td>NaN</td><td>1.000000</td></tr>\n",
"    <tr><th>3</th><td>7503bcf0-2daa-4c66-9b45-a75863d446a5</td><td>None</td><td>None</td><td>DecisionTreeClassifier</td><td>None</td><td>[DecisionTreeClassifier, failure]</td><td>2021-04-28 12:45:27.714952</td><td>random</td><td>gini</td><td>NaN</td><td>NaN</td><td>0.911111</td></tr>\n",
"    <tr><th>4</th><td>f143ea43-d780-4639-a082-5c81b850f1e0</td><td>None</td><td>None</td><td>RandomForestClassifier</td><td>None</td><td>[RandomForestClassifier, success]</td><td>2021-04-28 12:45:27.636078</td><td>NaN</td><td>NaN</td><td>10.0</td><td>NaN</td><td>0.977778</td></tr>\n",
"    <tr><th>5</th><td>67961ac9-4bd5-4916-925c-fd38844d78c6</td><td>None</td><td>None</td><td>RandomForestClassifier</td><td>None</td><td>[RandomForestClassifier, success]</td><td>2021-04-28 12:45:27.635968</td><td>NaN</td><td>NaN</td><td>30.0</td><td>NaN</td><td>0.977778</td></tr>\n",
"    <tr><th>6</th><td>8b6b9b9a-1e06-4e0a-af2e-9649d79872e3</td><td>None</td><td>None</td><td>KNeighborsClassifier</td><td>None</td><td>[KNeighborsClassifier, failure]</td><td>2021-04-28 12:45:27.619764</td><td>NaN</td><td>NaN</td><td>NaN</td><td>15.0</td><td>0.733333</td></tr>\n",
"    <tr><th>7</th><td>3c679288-1402-4ffb-bbb1-23fcfd0b1415</td><td>None</td><td>None</td><td>DecisionTreeClassifier</td><td>None</td><td>[DecisionTreeClassifier, failure]</td><td>2021-04-28 12:45:27.615204</td><td>random</td><td>entropy</td><td>NaN</td><td>NaN</td><td>0.888889</td></tr>\n",
"    <tr><th>8</th><td>38aa6e7b-efcc-4ee5-9950-5449ef8681ff</td><td>None</td><td>None</td><td>KNeighborsClassifier</td><td>None</td><td>[KNeighborsClassifier, failure]</td><td>2021-04-28 12:45:27.585590</td><td>NaN</td><td>NaN</td><td>NaN</td><td>20.0</td><td>0.711111</td></tr>\n",
"    <tr><th>9</th><td>0b6b8114-fe72-49af-813b-23e1667c1486</td><td>None</td><td>None</td><td>KNeighborsClassifier</td><td>None</td><td>[KNeighborsClassifier, failure]</td><td>2021-04-28 12:45:27.583647</td><td>NaN</td><td>NaN</td><td>NaN</td><td>10.0</td><td>0.711111</td></tr>\n",
"    <tr><th>10</th><td>a0b7e295-6047-45da-81e6-08dc8288a2eb</td><td>None</td><td>None</td><td>KNeighborsClassifier</td><td>None</td><td>[KNeighborsClassifier, failure]</td><td>2021-04-28 12:45:27.564990</td><td>NaN</td><td>NaN</td><td>NaN</td><td>5.0</td><td>0.733333</td></tr>\n",
"    <tr><th>11</th><td>712564c0-e113-42ea-b59c-90a5ce0cf37b</td><td>None</td><td>None</td><td>RandomForestClassifier</td><td>None</td><td>[RandomForestClassifier, success]</td><td>2021-04-28 12:45:27.511807</td><td>NaN</td><td>NaN</td><td>40.0</td><td>NaN</td><td>0.955556</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " id name description \\\n", "0 aa4faa9b-3051-454d-915d-54b5c16f6e34 None None \n", "1 7aec1e7b-c7dd-43e7-b20c-a400b2141d23 None None \n", "2 3a144831-b093-4dc3-85a6-71bc7a4e26a4 None None \n", "3 7503bcf0-2daa-4c66-9b45-a75863d446a5 None None \n", "4 f143ea43-d780-4639-a082-5c81b850f1e0 None None \n", "5 67961ac9-4bd5-4916-925c-fd38844d78c6 None None \n", "6 8b6b9b9a-1e06-4e0a-af2e-9649d79872e3 None None \n", "7 3c679288-1402-4ffb-bbb1-23fcfd0b1415 None None \n", "8 38aa6e7b-efcc-4ee5-9950-5449ef8681ff None None \n", "9 0b6b8114-fe72-49af-813b-23e1667c1486 None None \n", "10 a0b7e295-6047-45da-81e6-08dc8288a2eb None None \n", "11 712564c0-e113-42ea-b59c-90a5ce0cf37b None None \n", "\n", " model_name commit_hash tags \\\n", "0 DecisionTreeClassifier None [DecisionTreeClassifier, failure] \n", "1 DecisionTreeClassifier None [DecisionTreeClassifier, failure] \n", "2 RandomForestClassifier None [RandomForestClassifier, success] \n", "3 DecisionTreeClassifier None [DecisionTreeClassifier, failure] \n", "4 RandomForestClassifier None [RandomForestClassifier, success] \n", "5 RandomForestClassifier None [RandomForestClassifier, success] \n", "6 KNeighborsClassifier None [KNeighborsClassifier, failure] \n", "7 DecisionTreeClassifier None [DecisionTreeClassifier, failure] \n", "8 KNeighborsClassifier None [KNeighborsClassifier, failure] \n", "9 KNeighborsClassifier None [KNeighborsClassifier, failure] \n", "10 KNeighborsClassifier None [KNeighborsClassifier, failure] \n", "11 RandomForestClassifier None [RandomForestClassifier, success] \n", "\n", " created_at splitter criterion n_estimators n_neighbors \\\n", "0 2021-04-28 12:45:27.759817 best gini NaN NaN \n", "1 2021-04-28 12:45:27.752157 best entropy NaN NaN \n", "2 2021-04-28 12:45:27.715676 NaN NaN 20.0 NaN \n", "3 2021-04-28 12:45:27.714952 random gini NaN NaN \n", "4 2021-04-28 12:45:27.636078 NaN NaN 10.0 NaN \n", "5 2021-04-28 12:45:27.635968 NaN NaN 30.0 NaN \n", "6 2021-04-28 
12:45:27.619764 NaN NaN NaN 15.0 \n", "7 2021-04-28 12:45:27.615204 random entropy NaN NaN \n", "8 2021-04-28 12:45:27.585590 NaN NaN NaN 20.0 \n", "9 2021-04-28 12:45:27.583647 NaN NaN NaN 10.0 \n", "10 2021-04-28 12:45:27.564990 NaN NaN NaN 5.0 \n", "11 2021-04-28 12:45:27.511807 NaN NaN 40.0 NaN \n", "\n", " accuracy \n", "0 0.933333 \n", "1 0.933333 \n", "2 1.000000 \n", "3 0.911111 \n", "4 0.977778 \n", "5 0.977778 \n", "6 0.733333 \n", "7 0.888889 \n", "8 0.711111 \n", "9 0.711111 \n", "10 0.733333 \n", "11 0.955556 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddf = rubicon.get_project_as_df(\"Concurrent Experiments\", df_type=\"dask\")\n", "ddf.compute()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }