Ignoring Exceptions with Failure Modes
rubicon-ml is often used for logging in scenarios that require high availability, like model inference pipelines running in production environments. If something were to go wrong with rubicon-ml during live model inference, we could end up halting predictions over nothing more than a logging issue. rubicon-ml’s configurable failure modes let users choose how rubicon-ml exceptions are handled.
First, let’s try to get a project that we haven’t yet created. This will show the default failure behavior: raising a RubiconException that halts execution of the code it originated from.
[1]:
from rubicon_ml import Rubicon
rb = Rubicon(persistence="memory")
rb.get_project(name="failure modes")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py:213, in MemoryFileSystem.cat_file(self, path, start, end, **kwargs)
212 try:
--> 213 return bytes(self.store[path].getbuffer()[start:end])
214 except KeyError:
KeyError: '/root/failure-modes/metadata.json'
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
File ~/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py:102, in BaseRepository.get_project(self, project_name)
101 try:
--> 102 project = json.loads(self.filesystem.cat(project_metadata_path))
103 except FileNotFoundError:
File ~/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/spec.py:755, in AbstractFileSystem.cat(self, path, recursive, on_error, **kwargs)
754 else:
--> 755 return self.cat_file(paths[0], **kwargs)
File ~/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py:215, in MemoryFileSystem.cat_file(self, path, start, end, **kwargs)
214 except KeyError:
--> 215 raise FileNotFoundError(path)
FileNotFoundError: /root/failure-modes/metadata.json
During handling of the above exception, another exception occurred:
RubiconException Traceback (most recent call last)
Cell In [1], line 5
1 from rubicon_ml import Rubicon
4 rb = Rubicon(persistence="memory")
----> 5 rb.get_project(name="failure modes")
File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:46, in failsafe.<locals>.wrapper(*args, **kwargs)
44 except Exception as e:
45 if FAILURE_MODE == "raise":
---> 46 raise e
47 elif FAILURE_MODE == "warn":
48 warnings.warn(traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN))
File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:43, in failsafe.<locals>.wrapper(*args, **kwargs)
41 def wrapper(*args, **kwargs):
42 try:
---> 43 return func(*args, **kwargs)
44 except Exception as e:
45 if FAILURE_MODE == "raise":
File ~/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py:123, in Rubicon.get_project(self, name, id)
120 raise ValueError("`name` OR `id` required.")
122 if name is not None:
--> 123 project = self.repository.get_project(name)
124 project = Project(project, self.config)
125 else:
File ~/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py:104, in BaseRepository.get_project(self, project_name)
102 project = json.loads(self.filesystem.cat(project_metadata_path))
103 except FileNotFoundError:
--> 104 raise RubiconException(f"No project with name '{project_name}' found.")
106 return domain.Project(**project)
RubiconException: No project with name 'failure modes' found.
But let’s say we’re far more concerned with keeping our code running than with whether or not our logs get logged.
We can call set_failure_mode with “warn” so that, whenever rubicon-ml encounters an exception, it emits a warning (via the builtin warnings.warn) and execution of the calling code continues.
[2]:
from rubicon_ml import set_failure_mode
set_failure_mode("warn")
rb.get_project(name="failure modes")
/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:48: UserWarning: Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
return func(*args, **kwargs)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
project = self.repository.get_project(name)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.
warnings.warn(traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN))
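Because the “warn” mode goes through the builtin warnings machinery, the standard warning filters apply to it. The sketch below uses only the standard library and a hypothetical stand-in function (flaky_logging_call is not part of rubicon-ml) to show one way to record such warnings instead of printing them. Note that Python's default filter shows a repeated warning from the same location only once, so a long-running pipeline may want the "always" action.

```python
import warnings


def flaky_logging_call():
    """Hypothetical stand-in for a rubicon-ml call running in "warn" mode."""
    # warnings.warn emits a UserWarning when no category is given.
    warnings.warn("No project with name 'failure modes' found.")
    return None


# Record warnings in a list instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not de-duplicate repeated warnings
    for _ in range(3):
        flaky_logging_call()

print(len(caught))
print(caught[0].category.__name__)
```

The recorded entries could then be counted or forwarded to a monitoring system; what to do with them is left open here.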
We can also set_failure_mode to “log” to log the error with the builtin logging.error.
[3]:
set_failure_mode("log")
rb.get_project(name="failure modes")
ERROR:root:Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
return func(*args, **kwargs)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
project = self.repository.get_project(name)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.
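The ERROR:root: prefix in the output above indicates that the message goes to Python's root logger, so it can be routed with the usual logging configuration. As a minimal sketch using only the standard library, the hypothetical ListHandler below (not part of rubicon-ml) collects error records in memory; the logging.error call stands in for what rubicon-ml does in “log” mode.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted ERROR records in memory."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logging.getLogger().addHandler(handler)  # attach to the root logger

# Stand-in for the call made inside the library in "log" mode.
logging.error("RubiconException: No project with name 'failure modes' found.")

print(handler.messages[-1])
```

Any other handler, such as logging.FileHandler, could be attached to the root logger in the same way.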
Call set_failure_mode with “raise”, the default, to return to raising exceptions.
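The traceback excerpts above show fragments of the wrapper that implements these modes. The following is a simplified, standalone sketch of that pattern, not rubicon-ml's actual source: set_mode and get_thing are hypothetical names, and the default values for the limit and chain settings are assumptions inferred from the outputs shown here.

```python
import functools
import logging
import traceback
import warnings

# Module-level settings, named after the globals visible in the excerpts.
FAILURE_MODE = "raise"
TRACEBACK_LIMIT = None   # assumed default: full traceback
TRACEBACK_CHAIN = False  # assumed default: final exception only


def set_mode(failure_mode, traceback_limit=None, traceback_chain=False):
    """Hypothetical stand-in for a failure mode setter."""
    global FAILURE_MODE, TRACEBACK_LIMIT, TRACEBACK_CHAIN
    if failure_mode not in ("raise", "warn", "log"):
        raise ValueError(f"Unknown failure mode: {failure_mode!r}")
    FAILURE_MODE = failure_mode
    TRACEBACK_LIMIT = traceback_limit
    TRACEBACK_CHAIN = traceback_chain


def failsafe(func):
    """Raise, warn about, or log exceptions from func based on FAILURE_MODE."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            if FAILURE_MODE == "raise":
                raise
            message = traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN)
            if FAILURE_MODE == "warn":
                warnings.warn(message)
            else:
                logging.error(message)
            return None  # the caller receives None instead of a result

    return wrapper


@failsafe
def get_thing(name):
    """Hypothetical lookup that always fails."""
    raise LookupError(f"No thing with name '{name}' found.")


set_mode("log")
print(get_thing("example"))  # the error is logged and None is returned
set_mode("raise")
```

In the non-raising branches the wrapper falls through and returns None, which is why return values need care in the “warn” and “log” modes.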
Log and warning verbosity
The “log” and “warn” failure modes use the builtin traceback.format_exc() to render an error’s traceback. set_failure_mode’s traceback_limit and traceback_chain are passed directly through to the underlying call to traceback.format_exc() as the limit and chain arguments.
limit is an integer between 0 and the depth of the stack trace that controls the verbosity of the trace. With a limit of 0, only the exception line itself is logged.
[4]:
set_failure_mode("log", traceback_limit=0)
rb.get_project(name="failure modes")
ERROR:root:rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.
chain can be set to True to see the full chain of exceptions rather than just the final exception in the chain.
[5]:
set_failure_mode("log", traceback_chain=True)
rb.get_project(name="failure modes")
ERROR:root:Traceback (most recent call last):
File "/Users/nvd215/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py", line 213, in cat_file
return bytes(self.store[path].getbuffer()[start:end])
KeyError: '/root/failure-modes/metadata.json'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 102, in get_project
project = json.loads(self.filesystem.cat(project_metadata_path))
File "/Users/nvd215/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/spec.py", line 755, in cat
return self.cat_file(paths[0], **kwargs)
File "/Users/nvd215/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py", line 215, in cat_file
raise FileNotFoundError(path)
FileNotFoundError: /root/failure-modes/metadata.json
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
return func(*args, **kwargs)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
project = self.repository.get_project(name)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.
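The effect of the limit and chain arguments can also be explored without rubicon-ml. This sketch calls traceback.format_exc directly on a small chained exception; load_metadata and render_failure are hypothetical helpers written for this illustration.

```python
import traceback


def load_metadata(store, path):
    """Hypothetical loader that re-raises a low-level error as a domain error."""
    try:
        return store[path]
    except KeyError:
        # Raising inside `except` implicitly chains the KeyError to this error.
        raise RuntimeError(f"No metadata found at {path!r}.")


def render_failure(limit=None, chain=True):
    """Return the traceback text for a failed lookup with the given settings."""
    try:
        load_metadata({}, "metadata.json")
    except RuntimeError:
        return traceback.format_exc(limit=limit, chain=chain)


final_only = render_failure(chain=False)          # only the final RuntimeError
full_chain = render_failure(chain=True)           # KeyError first, then RuntimeError
no_frames = render_failure(limit=0, chain=False)  # no stack entries at all

print(final_only)
print(full_chain)
print(no_frames)
```

Comparing the three printed strings shows the same trade-off as the cells above: fewer frames and no chain keep logs short, while the full chain helps when diagnosing the root cause.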
Caution with return values
Some workflows require the returned rubicon-ml objects to be leveraged for future logging in the same process. For example, let’s finally create the “failure modes” project and take a look at the returned rubicon-ml object.
[6]:
rb.create_project(name="failure modes")
project = rb.get_project(name="failure modes")
print(project)
Project(name='failure modes', id='8d6ac09b-45c5-4097-9728-302c481a1665', description=None, github_url=None, training_metadata=None, created_at=datetime.datetime(2022, 11, 15, 17, 5, 2, 510729))
Now we can take any standard action on the returned rubicon-ml object, like inspecting its ID.
[7]:
print(project.id)
8d6ac09b-45c5-4097-9728-302c481a1665
If we use either the “log” or “warn” failure mode, neither of which stops execution on rubicon-ml errors, we need to be cautious with returned rubicon-ml objects.
Now we’ll try to get another project that doesn’t exist. Even though execution will not stop after the failed get_project call, no rubicon-ml project is returned in this case. None is returned in its place, so any action taken on the returned value may fail if only rubicon-ml objects are expected downstream.
[8]:
set_failure_mode("log")
project = rb.get_project(name="failure modes v2")
print(project)
ERROR:root:Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
return func(*args, **kwargs)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
project = self.repository.get_project(name)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes v2' found.
None
When leveraging failure modes that do not interrupt execution, it is important to check the objects returned from rubicon-ml before using them. Simply trying to access the id attribute in this case would result in an AttributeError, as a NoneType object has no attribute id.
[9]:
if project is not None:
    print(project.id)
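One way to keep such checks in a single place is a small helper that falls back to creating the project when the lookup returns None. This is only a sketch for the non-raising modes: get_or_create_project is a hypothetical helper, and StubClient is a hypothetical stand-in that mimics a client whose get_project returns None for missing projects so the example runs without rubicon-ml.

```python
from typing import Any, Optional


def get_or_create_project(client: Any, name: str) -> Optional[Any]:
    """Return the named project, creating it if the lookup yields None.

    The result may still be None if creation also fails in a
    non-raising failure mode, so callers should keep checking.
    """
    project = client.get_project(name=name)
    if project is None:
        project = client.create_project(name=name)
    return project


class StubClient:
    """Hypothetical stand-in that mimics a non-raising failure mode."""

    def __init__(self):
        self.projects = {}

    def get_project(self, name):
        # Returns None for a missing project instead of raising.
        return self.projects.get(name)

    def create_project(self, name):
        self.projects[name] = {"name": name}
        return self.projects[name]


client = StubClient()
project = get_or_create_project(client, "failure modes v2")
if project is not None:
    print(project["name"])
```

The final None check stays in place on purpose: if the underlying storage is unavailable, both calls can fail and the helper has nothing to return.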
A more practical use case
Let’s take a look at how this may work in a more practical machine learning workflow. For this example, we’ll train a k-neighbors classifier from scikit-learn and attempt to log its input parameters and the score the trained model produces on a test dataset.
First, we’ll create a new project and experiment to log our inputs and outputs to.
[10]:
project = rb.create_project(name="failure modes v3")
experiment = project.log_experiment()
experiment
[10]:
<rubicon_ml.client.experiment.Experiment at 0x15cacce20>
Now that we’ve got an experiment, let’s replace rubicon-ml’s filesystem (the part of the library that handles actual filesystem operations) with a no-op class representing a broken filesystem. Imagine this simulating something like losing connection to S3.
We’ll also set the failure mode back to “raise” for this first execution of our workflow with the broken filesystem.
[11]:
class BrokenFilesystem:
    pass
rb.config.repository.filesystem = BrokenFilesystem()
set_failure_mode("raise")
Now, notice that when we run the cell below, we only fit the model before execution halts: experiment.log_parameter raises an exception because of the broken filesystem. We never compute a score, and it is never displayed.
[12]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
X_train, y_train, X_test, y_test = [[0, 1, 2, 3]], [0], [[0, 1, 2, 3]], [0]
knn.fit(X_train, y_train)
experiment.log_parameter(name="n_neighbors", value=1)
score = knn.score(X_test, y_test)
experiment.log_metric(name="score", value=score)
score
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [12], line 8
5 X_train, y_train, X_test, y_test = [[0, 1, 2, 3]], [0], [[0, 1, 2, 3]], [0]
7 knn.fit(X_train, y_train)
----> 8 experiment.log_parameter(name="n_neighbors", value=1)
10 score = knn.score(X_test, y_test)
11 experiment.log_metric(name="score", value=score)
File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:46, in failsafe.<locals>.wrapper(*args, **kwargs)
44 except Exception as e:
45 if FAILURE_MODE == "raise":
---> 46 raise e
47 elif FAILURE_MODE == "warn":
48 warnings.warn(traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN))
File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:43, in failsafe.<locals>.wrapper(*args, **kwargs)
41 def wrapper(*args, **kwargs):
42 try:
---> 43 return func(*args, **kwargs)
44 except Exception as e:
45 if FAILURE_MODE == "raise":
File ~/github/capitalone/rubicon-ml/rubicon_ml/client/experiment.py:237, in Experiment.log_parameter(self, name, value, description, tags)
214 """Create a parameter under the experiment.
215
216 Parameters
(...)
234 The created parameter.
235 """
236 parameter = domain.Parameter(name, value=value, description=description, tags=tags)
--> 237 self.repository.create_parameter(parameter, self.project.name, self.id)
239 return Parameter(parameter, self)
File ~/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py:852, in BaseRepository.create_parameter(self, parameter, project_name, experiment_id)
836 """Persist a parameter to the configured filesystem.
837
838 Parameters
(...)
846 The ID of the experiment this parameter belongs to.
847 """
848 parameter_metadata_path = self._get_parameter_metadata_path(
849 project_name, experiment_id, parameter.name
850 )
--> 852 if self.filesystem.exists(parameter_metadata_path):
853 raise RubiconException(f"A parameter with name '{parameter.name}' already exists.")
855 self._persist_domain(parameter, parameter_metadata_path)
AttributeError: 'BrokenFilesystem' object has no attribute 'exists'
By setting the failure mode to “log” and ensuring we are not using any unchecked objects returned by rubicon-ml, we can ensure that the entire workflow is completed regardless of whether or not the filesystem is working.
[13]:
set_failure_mode("log")
knn = KNeighborsClassifier(n_neighbors=1)
X_train, y_train, X_test, y_test = [[0, 1, 2, 3]], [0], [[0, 1, 2, 3]], [0]
knn.fit(X_train, y_train)
experiment.log_parameter(name="n_neighbors", value=1)
score = knn.score(X_test, y_test)
experiment.log_metric(name="score", value=score)
score
ERROR:root:Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
return func(*args, **kwargs)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/experiment.py", line 237, in log_parameter
self.repository.create_parameter(parameter, self.project.name, self.id)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 852, in create_parameter
if self.filesystem.exists(parameter_metadata_path):
AttributeError: 'BrokenFilesystem' object has no attribute 'exists'
ERROR:root:Traceback (most recent call last):
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
return func(*args, **kwargs)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/experiment.py", line 76, in log_metric
self.repository.create_metric(metric, self.project.name, self.id)
File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 746, in create_metric
if self.filesystem.exists(metric_metadata_path):
AttributeError: 'BrokenFilesystem' object has no attribute 'exists'
[13]:
1.0