

Ignoring Exceptions with Failure Modes

rubicon-ml is often used for logging in scenarios that require high availability, like model inference pipelines running in production environments. If something were to go wrong with rubicon-ml during live model inference, we could end up halting predictions just for a logging issue. rubicon-ml’s configurable failure modes allow users to choose what to do with rubicon-ml exceptions!

First, let’s try to get a project that we haven’t yet created. This will show the default failure behavior - raising a RubiconException that halts execution of the code it originated from.

[1]:
from rubicon_ml import Rubicon


rb = Rubicon(persistence="memory")
rb.get_project(name="failure modes")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py:213, in MemoryFileSystem.cat_file(self, path, start, end, **kwargs)
    212 try:
--> 213     return bytes(self.store[path].getbuffer()[start:end])
    214 except KeyError:

KeyError: '/root/failure-modes/metadata.json'

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
File ~/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py:102, in BaseRepository.get_project(self, project_name)
    101 try:
--> 102     project = json.loads(self.filesystem.cat(project_metadata_path))
    103 except FileNotFoundError:

File ~/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/spec.py:755, in AbstractFileSystem.cat(self, path, recursive, on_error, **kwargs)
    754 else:
--> 755     return self.cat_file(paths[0], **kwargs)

File ~/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py:215, in MemoryFileSystem.cat_file(self, path, start, end, **kwargs)
    214 except KeyError:
--> 215     raise FileNotFoundError(path)

FileNotFoundError: /root/failure-modes/metadata.json

During handling of the above exception, another exception occurred:

RubiconException                          Traceback (most recent call last)
Cell In [1], line 5
      1 from rubicon_ml import Rubicon
      4 rb = Rubicon(persistence="memory")
----> 5 rb.get_project(name="failure modes")

File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:46, in failsafe.<locals>.wrapper(*args, **kwargs)
     44 except Exception as e:
     45     if FAILURE_MODE == "raise":
---> 46         raise e
     47     elif FAILURE_MODE == "warn":
     48         warnings.warn(traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN))

File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:43, in failsafe.<locals>.wrapper(*args, **kwargs)
     41 def wrapper(*args, **kwargs):
     42     try:
---> 43         return func(*args, **kwargs)
     44     except Exception as e:
     45         if FAILURE_MODE == "raise":

File ~/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py:123, in Rubicon.get_project(self, name, id)
    120     raise ValueError("`name` OR `id` required.")
    122 if name is not None:
--> 123     project = self.repository.get_project(name)
    124     project = Project(project, self.config)
    125 else:

File ~/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py:104, in BaseRepository.get_project(self, project_name)
    102     project = json.loads(self.filesystem.cat(project_metadata_path))
    103 except FileNotFoundError:
--> 104     raise RubiconException(f"No project with name '{project_name}' found.")
    106 return domain.Project(**project)

RubiconException: No project with name 'failure modes' found.

But, let’s say we’re far more concerned with keeping our code running than we are with whether or not our logs get logged.

We can set the failure mode to “warn” with set_failure_mode. rubicon-ml will then emit a warning (via the built-in warnings.warn) whenever it encounters an exception, and the calling code will continue to execute.

[2]:
from rubicon_ml import set_failure_mode


set_failure_mode("warn")

rb.get_project(name="failure modes")
/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:48: UserWarning: Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
    project = self.repository.get_project(name)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
    raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.

  warnings.warn(traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN))
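
Because the “warn” mode goes through warnings.warn, Python’s standard warning filters decide how these messages surface. The following is a minimal sketch using only the standard library. The function flaky_logging_call is a hypothetical stand-in for a rubicon-ml call running in “warn” mode, and the UserWarning category is taken from the output above.

```python
import warnings


def flaky_logging_call():
    """Hypothetical stand-in for a rubicon-ml call running in "warn" mode."""
    warnings.warn("No project with name 'failure modes' found.", UserWarning)
    return None


# Record warnings instead of printing them, e.g. to count failed logging calls.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = flaky_logging_call()

recorded_messages = [str(w.message) for w in caught]

# Escalate the same warnings to exceptions, e.g. inside a test suite.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", category=UserWarning)
    try:
        flaky_logging_call()
    except UserWarning:
        escalated = True
```

Both filters are scoped to their with blocks, so the rest of the process keeps its default warning behavior.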

We can also set_failure_mode to “log” to log the error with the builtin logging.error.

[3]:
set_failure_mode("log")

rb.get_project(name="failure modes")
ERROR:root:Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
    project = self.repository.get_project(name)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
    raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.
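
The ERROR:root: prefix above suggests the message is emitted on Python’s root logger, so ordinary logging handlers can route it elsewhere. Here is a minimal sketch, assuming that is the case. ListHandler and flaky_logging_call are hypothetical names used only for illustration, and the stand-in calls logging.error directly rather than going through rubicon-ml.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log records in memory (hypothetical helper)."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def flaky_logging_call():
    """Hypothetical stand-in for a rubicon-ml call running in "log" mode."""
    logging.error("RubiconException: No project with name 'failure modes' found.")
    return None


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

root_logger = logging.getLogger()
root_logger.addHandler(handler)
try:
    result = flaky_logging_call()
finally:
    # Detach the handler so later logging is unaffected.
    root_logger.removeHandler(handler)
```

In a production pipeline the in-memory handler could be swapped for a file or network handler from the logging.handlers module.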

Set the failure mode back to the default, “raise”, to return to raising exceptions.
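
The traceback in the first cell shows rubicon-ml calls passing through a wrapper that checks the current failure mode. The following is a simplified, standalone sketch of that pattern, not the library’s source. The names _settings, set_mode, and get_thing are hypothetical and exist only for this example.

```python
import functools
import logging
import traceback
import warnings

# Hypothetical settings holder for this sketch; not rubicon-ml's internals.
_settings = {"mode": "raise"}


def set_mode(mode):
    """Validate and store the failure mode (hypothetical helper)."""
    if mode not in ("raise", "warn", "log"):
        raise ValueError(f"unknown failure mode: {mode!r}")
    _settings["mode"] = mode


def failsafe(func):
    """Wrap func so its exceptions are raised, warned about, or logged."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            if _settings["mode"] == "raise":
                raise
            message = traceback.format_exc()
            if _settings["mode"] == "warn":
                warnings.warn(message)
            else:
                logging.error(message)
            return None  # the caller receives None instead of a result

    return wrapper


@failsafe
def get_thing(name):
    """Hypothetical lookup that always fails."""
    raise LookupError(f"No thing with name '{name}' found.")


set_mode("warn")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = get_thing("demo")  # warns and returns None
set_mode("raise")  # restore the default
```

In this sketch the non-raising branches fall through to return None, which is the same behavior the notebook demonstrates for failed rubicon-ml calls.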

Log and warning verbosity

The “log” and “warn” failure modes use the built-in traceback.format_exc() to format an error’s traceback. set_failure_mode’s traceback_limit and traceback_chain arguments are passed directly through to the underlying call to traceback.format_exc() as the limit and chain arguments.

limit is an integer that caps how many stack trace entries are included, so smaller values give terser output. With a limit of 0, only the exception itself is reported.

[4]:
set_failure_mode("log", traceback_limit=0)

rb.get_project(name="failure modes")
ERROR:root:rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.

chain can be set to True to see the full chain of exceptions rather than just the final exception in the chain.

[5]:
set_failure_mode("log", traceback_chain=True)

rb.get_project(name="failure modes")
ERROR:root:Traceback (most recent call last):
  File "/Users/nvd215/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py", line 213, in cat_file
    return bytes(self.store[path].getbuffer()[start:end])
KeyError: '/root/failure-modes/metadata.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 102, in get_project
    project = json.loads(self.filesystem.cat(project_metadata_path))
  File "/Users/nvd215/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/spec.py", line 755, in cat
    return self.cat_file(paths[0], **kwargs)
  File "/Users/nvd215/mambaforge/envs/rubicon-ml-dev/lib/python3.10/site-packages/fsspec/implementations/memory.py", line 215, in cat_file
    raise FileNotFoundError(path)
FileNotFoundError: /root/failure-modes/metadata.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
    project = self.repository.get_project(name)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
    raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes' found.
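
The effect of limit and chain can be reproduced with the standard library alone. This is a minimal sketch that calls traceback.format_exc directly. The load_metadata function and the path it receives are hypothetical and only serve to produce a chained exception.

```python
import traceback


def load_metadata(store, path):
    """Hypothetical loader that re-raises a lookup failure as a new error."""
    try:
        return store[path]
    except KeyError:
        # Raising inside an except block chains the new exception to the old one.
        raise RuntimeError(f"No metadata found at '{path}'.")


try:
    load_metadata({}, "/root/demo/metadata.json")
except RuntimeError:
    # Defaults of format_exc: limit=None, chain=True.
    full = traceback.format_exc()
    # Only the final exception in the chain, with its stack entries.
    final_only = traceback.format_exc(chain=False)
    # Only the final exception, with no stack entries at all.
    no_frames = traceback.format_exc(limit=0, chain=False)
```

Note that traceback.format_exc chains by default, while the cells above show unchained output unless traceback_chain=True is passed to set_failure_mode.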

Caution with return values

Some workflows require the returned rubicon-ml objects to be leveraged for future logging in the same process. For example, let’s finally create the “failure modes” project and take a look at the returned rubicon-ml object.

[6]:
rb.create_project(name="failure modes")
project = rb.get_project(name="failure modes")

print(project)
Project(name='failure modes', id='8d6ac09b-45c5-4097-9728-302c481a1665', description=None, github_url=None, training_metadata=None, created_at=datetime.datetime(2022, 11, 15, 17, 5, 2, 510729))

Now we can take any standard action on the returned rubicon-ml object, like inspecting its ID.

[7]:
print(project.id)
8d6ac09b-45c5-4097-9728-302c481a1665

If we use either the “log” or “warn” failure mode, neither of which stops execution on rubicon-ml errors, we need to be careful with returned rubicon-ml objects.

Now we’ll try to get another project that doesn’t exist. Even though this code will not stop execution after the failed get_project call, we need to be aware that a rubicon-ml project will not be returned in this case. None is returned in its place, so any action taken on it may fail if downstream code expects only rubicon-ml objects.

[8]:
set_failure_mode("log")

project = rb.get_project(name="failure modes v2")
print(project)
ERROR:root:Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/rubicon.py", line 123, in get_project
    project = self.repository.get_project(name)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 104, in get_project
    raise RubiconException(f"No project with name '{project_name}' found.")
rubicon_ml.exceptions.RubiconException: No project with name 'failure modes v2' found.

None

When using failure modes that do not interrupt execution, it is important to check the objects returned from rubicon-ml before using them. Simply trying to access the id attribute in this case would result in an AttributeError, as a NoneType object has no attribute id.

[9]:
if project is not None:
    print(project.id)
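
When many calls need the same check, the guard can be factored into a small helper. The following is a minimal sketch of that idea. Both log_if_available and FakeExperiment are hypothetical names, and FakeExperiment only stands in for an object a logging library might return.

```python
def log_if_available(target, method_name, **kwargs):
    """Call target.method_name(**kwargs) unless target is None (hypothetical helper)."""
    if target is None:
        return None
    return getattr(target, method_name)(**kwargs)


class FakeExperiment:
    """Hypothetical stand-in for an object returned by a logging library."""

    def __init__(self):
        self.metrics = {}

    def log_metric(self, name, value):
        self.metrics[name] = value
        return name


experiment = FakeExperiment()

# The call goes through when an object is available...
logged = log_if_available(experiment, "log_metric", name="score", value=1.0)

# ...and is skipped quietly when an earlier failed call returned None.
skipped = log_if_available(None, "log_metric", name="score", value=1.0)
```

The helper only guards against None. Any exception raised by the method itself is still subject to the active failure mode.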

A more practical use case

Let’s take a look at how this may work in a more practical machine learning workflow. For this example, we’ll train a k-neighbors classifier from Scikit-learn and attempt to log the input parameters and the score the trained model produces on a test dataset.

First, we’ll create a new project and an experiment to log our inputs and outputs to.

[10]:
project = rb.create_project(name="failure modes v3")
experiment = project.log_experiment()

experiment
[10]:
<rubicon_ml.client.experiment.Experiment at 0x15cacce20>

Now that we’ve got an experiment, let’s replace rubicon-ml’s filesystem (the part of the library that handles actual filesystem operations) with an empty class representing a broken filesystem. Imagine this simulates something like losing the connection to S3.

We’ll also set the failure mode back to “raise” for this first execution of our workflow with the broken filesystem.

[11]:
class BrokenFilesystem:
    pass

rb.config.repository.filesystem = BrokenFilesystem()

set_failure_mode("raise")

Now, notice that when we run the cell below, we only fit the model before execution is halted due to experiment.log_parameter raising an exception because of the broken filesystem. We never get a score, and it is never displayed.

[12]:
from sklearn.neighbors import KNeighborsClassifier


knn = KNeighborsClassifier(n_neighbors=1)
X_train, y_train, X_test, y_test = [[0, 1, 2, 3]], [0], [[0, 1, 2, 3]], [0]

knn.fit(X_train, y_train)
experiment.log_parameter(name="n_neighbors", value=1)

score = knn.score(X_test, y_test)
experiment.log_metric(name="score", value=score)

score
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [12], line 8
      5 X_train, y_train, X_test, y_test = [[0, 1, 2, 3]], [0], [[0, 1, 2, 3]], [0]
      7 knn.fit(X_train, y_train)
----> 8 experiment.log_parameter(name="n_neighbors", value=1)
     10 score = knn.score(X_test, y_test)
     11 experiment.log_metric(name="score", value=score)

File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:46, in failsafe.<locals>.wrapper(*args, **kwargs)
     44 except Exception as e:
     45     if FAILURE_MODE == "raise":
---> 46         raise e
     47     elif FAILURE_MODE == "warn":
     48         warnings.warn(traceback.format_exc(limit=TRACEBACK_LIMIT, chain=TRACEBACK_CHAIN))

File ~/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py:43, in failsafe.<locals>.wrapper(*args, **kwargs)
     41 def wrapper(*args, **kwargs):
     42     try:
---> 43         return func(*args, **kwargs)
     44     except Exception as e:
     45         if FAILURE_MODE == "raise":

File ~/github/capitalone/rubicon-ml/rubicon_ml/client/experiment.py:237, in Experiment.log_parameter(self, name, value, description, tags)
    214 """Create a parameter under the experiment.
    215
    216 Parameters
   (...)
    234     The created parameter.
    235 """
    236 parameter = domain.Parameter(name, value=value, description=description, tags=tags)
--> 237 self.repository.create_parameter(parameter, self.project.name, self.id)
    239 return Parameter(parameter, self)

File ~/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py:852, in BaseRepository.create_parameter(self, parameter, project_name, experiment_id)
    836 """Persist a parameter to the configured filesystem.
    837
    838 Parameters
   (...)
    846     The ID of the experiment this parameter belongs to.
    847 """
    848 parameter_metadata_path = self._get_parameter_metadata_path(
    849     project_name, experiment_id, parameter.name
    850 )
--> 852 if self.filesystem.exists(parameter_metadata_path):
    853     raise RubiconException(f"A parameter with name '{parameter.name}' already exists.")
    855 self._persist_domain(parameter, parameter_metadata_path)

AttributeError: 'BrokenFilesystem' object has no attribute 'exists'

By setting the failure mode to “log” and ensuring we are not using any unchecked objects returned by rubicon-ml, we can ensure that the entire workflow is completed regardless of whether or not the filesystem is working.

[13]:
set_failure_mode("log")

knn = KNeighborsClassifier(n_neighbors=1)
X_train, y_train, X_test, y_test = [[0, 1, 2, 3]], [0], [[0, 1, 2, 3]], [0]

knn.fit(X_train, y_train)
experiment.log_parameter(name="n_neighbors", value=1)

score = knn.score(X_test, y_test)
experiment.log_metric(name="score", value=score)

score
ERROR:root:Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/experiment.py", line 237, in log_parameter
    self.repository.create_parameter(parameter, self.project.name, self.id)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 852, in create_parameter
    if self.filesystem.exists(parameter_metadata_path):
AttributeError: 'BrokenFilesystem' object has no attribute 'exists'

ERROR:root:Traceback (most recent call last):
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/utils/exception_handling.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/client/experiment.py", line 76, in log_metric
    self.repository.create_metric(metric, self.project.name, self.id)
  File "/Users/nvd215/github/capitalone/rubicon-ml/rubicon_ml/repository/base.py", line 746, in create_metric
    if self.filesystem.exists(metric_metadata_path):
AttributeError: 'BrokenFilesystem' object has no attribute 'exists'

[13]:
1.0