API Reference¶
Rubicon¶
- class rubicon_ml.Rubicon(persistence: str | None = 'filesystem', root_dir: str | None = None, auto_git_enabled: bool = False, composite_config: List[Dict[str, Any]] | None = None, **storage_options)¶
The rubicon client’s entry point.
Creates a Config and injects it into the client level objects at run-time.
- Parameters:
- persistencestr, optional
The persistence type. Can be one of [“filesystem”, “memory”]. Defaults to “filesystem”.
- root_dirstr, optional
Absolute or relative filepath. Use absolute path for best performance. Defaults to the local filesystem. Prefix with s3:// to use s3 instead.
- auto_git_enabledbool, optional
True to use the git command to automatically log relevant repository information to projects and experiments logged with this client instance, False otherwise. Defaults to False.
- storage_optionsdict, optional
Additional keyword arguments specific to the protocol being chosen. They are passed directly to the underlying filesystem class.
- property config¶
Returns a single config.
Exists to promote backwards compatibility.
- Returns:
- Config
A single Config
- create_project(name: str, description: str | None = None, github_url: str | None = None, training_metadata: List[Tuple] | Tuple | None = None) Project ¶
Create a project.
- Parameters:
- namestr
The project’s name.
- descriptionstr, optional
The project’s description.
- github_urlstr, optional
The URL of the GitHub repository associated with this project. If omitted and automatic git logging is enabled, it will be retrieved via git remote.
- training_metadatatuple or list of tuples, optional
Metadata associated with the training dataset(s) used across each experiment in this project.
- Returns:
- rubicon.client.Project
The created project.
- get_or_create_project(name: str, **kwargs) Project ¶
Get or create a project.
- Parameters:
- namestr
The project’s name.
- kwargsdict
Additional keyword arguments to be passed to Rubicon.create_project.
- Returns:
- rubicon.client.Project
The corresponding project.
- get_project(name: str | None = None, id: str | None = None) Project ¶
Get a project.
- Parameters:
- namestr, optional
The name of the project to get.
- idstr, optional
The id of the project to get.
- Returns:
- rubicon.client.Project
The project with name name or id id.
- get_project_as_dask_df(name, group_by=None)¶
DEPRECATED: Available for backwards compatibility.
- get_project_as_df(name, df_type='pandas', group_by=None)¶
Get a dask or pandas dataframe representation of a project.
- Parameters:
- namestr
The name of the project to get.
- df_typestr, optional
The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.
- group_bystr or None, optional
How to group the project’s experiments in the returned DataFrame(s). Valid options include [“commit_hash”].
- Returns:
- pandas.DataFrame or list of pandas.DataFrame or dask.DataFrame or list of dask.DataFrame
If group_by is None, a dask or pandas dataframe holding the project’s data. Otherwise a list of dask or pandas dataframes holding the project’s data grouped by group_by.
- is_auto_git_enabled() bool ¶
Check if git is enabled for any of the configs.
- projects()¶
Get a list of available projects.
- Returns:
- list of rubicon.client.Project
The list of available projects.
- sync(project_name: str, s3_root_dir: str, aws_profile: str | None = None, aws_shared_credentials_file: str | None = None)¶
Sync a local project to S3.
- Parameters:
- project_namestr
The name of the project to sync.
- s3_root_dirstr
The S3 path where the project’s data will be synced.
- aws_profilestr, optional
The name of the AWS CLI profile with the credentials and options to use. Defaults to None, in which case the AWS default profile ‘default’ is used.
- aws_shared_credentials_filestr, optional
The location of the file the AWS CLI uses to store access keys. Defaults to None, in which case the AWS default path ‘~/.aws/credentials’ is used.
Notes
Use sync to backup your local project data to S3 as an alternative to direct S3 logging. Leverages the AWS CLI’s aws s3 sync. Ensure that any credentials are set and that any proxies are enabled.
Project¶
- class rubicon_ml.Project(domain: ProjectDomain, config: Config | List[Config] | None = None)¶
A client project.
A project is a collection of experiments, dataframes, and artifacts identified by a unique name.
- Parameters:
- domainrubicon.domain.Project
The project domain model.
- configrubicon.client.Config
The config, which specifies the underlying repository.
- archive(experiments: List[Experiment] | None = None, remote_rubicon: Rubicon | None = None)¶
Archive the experiments logged to this project.
- Parameters:
- experimentslist of Experiments, optional
The rubicon.client.Experiment objects to archive. If None, all logged experiments are archived.
- remote_rubiconrubicon_ml.Rubicon object, optional
The remote Rubicon object with the repository to archive to.
- Returns:
- str
The filepath of the newly created archive.
- artifact(name: str | None = None, id: str | None = None) Artifact ¶
Get an artifact logged to this project by id or name.
- Parameters:
- idstr
The id of the artifact to get.
- namestr
The name of the artifact to get.
- Returns:
- rubicon.client.Artifact
The artifact logged to this project with id id or name ‘name’.
- artifacts(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') List[Artifact] ¶
Get the artifacts logged to this client object.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Artifact
The artifacts previously logged to this client object.
- property created_at¶
Get the time the project was created.
- dataframe(name: str | None = None, id: str | None = None) Dataframe ¶
Get the dataframe logged to this client object.
- Parameters:
- idstr
The id of the dataframe to get.
- namestr
The name of the dataframe to get.
- Returns:
- rubicon.client.Dataframe
The dataframe logged to this project with id id or name ‘name’.
- dataframes(tags: List[str] | None = None, qtype: str = 'or', recursive: bool = False, name: str | None = None) List[Dataframe] ¶
Get the dataframes logged to this project.
- Parameters:
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- recursivebool, optional
If True, get the dataframes logged to this project’s experiments as well. Defaults to False.
- namestr, optional
The name value to filter results on.
- Returns:
- list of rubicon.client.Dataframe
The dataframes previously logged to this client object.
- delete_artifacts(ids: List[str])¶
Delete the artifacts logged to this client object with ids ids.
- Parameters:
- idslist of str
The ids of the artifacts to delete.
- delete_dataframes(ids: List[str])¶
Delete the dataframes with ids ids logged to this client object.
- Parameters:
- idslist of str
The ids of the dataframes to delete.
- property description¶
Get the project’s description.
- experiment(id: str | None = None, name: str | None = None) Experiment ¶
Get an experiment logged to this project by id or name.
- Parameters:
- idstr
The id of the experiment to get.
- namestr
The name of the experiment to get.
- Returns:
- rubicon.client.Experiment
The experiment logged to this project with id id or name ‘name’.
- experiments(tags: List[str] | None = None, qtype: str = 'or', name: str | None = None) List[Experiment] ¶
Get the experiments logged to this project.
- Parameters:
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- namestr, optional
The name of the experiment(s) to filter results on.
- Returns:
- list of rubicon.client.Experiment
The experiments previously logged to this project.
- experiments_from_archive(remote_rubicon, latest_only: bool | None = False)¶
Retrieve archived experiments into this project’s experiments folder.
- Parameters:
- remote_rubiconrubicon_ml.Rubicon object
The remote Rubicon object with the repository containing archived experiments to read in.
- latest_onlybool, optional
Indicates whether experiments should only be read from the latest archive. Defaults to False.
- property github_url¶
Get the project’s GitHub repository URL.
- property id¶
Get the project’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- log_artifact(data_bytes: bytes | None = None, data_directory: str | None = None, data_file: TextIO | None = None, data_object: Any | None = None, data_path: str | None = None, name: str | None = None, description: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Artifact ¶
Log an artifact to this client object.
- Parameters:
- data_bytesbytes, optional
The raw bytes to log as an artifact.
- data_directorystr, optional
The path to a directory to zip and log as an artifact.
- data_fileTextIOWrapper, optional
The open file to log as an artifact.
- data_objectpython object, optional
The python object to log as an artifact.
- data_pathstr, optional
The absolute or relative local path or S3 path to the data to log as an artifact. S3 paths must be prepended with ‘s3://’.
- namestr, optional
The name of the artifact file. Required if data_path is not provided.
- descriptionstr, optional
A description of the artifact. Use to provide additional context.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your artifacts.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your artifacts.
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Only one of data_bytes, data_file, data_object, and data_path should be provided. If more than one is given, the order of precedence is data_bytes, data_object, data_file, data_path.
Examples
>>> # Log with bytes
>>> experiment.log_artifact(
...     data_bytes=b'hello rubicon!',
...     name="bytes_artifact",
...     description="log artifact from bytes",
... )
>>> # Log zipped directory
>>> experiment.log_artifact(
...     data_directory="./path/to/directory/",
...     name="directory.zip",
...     description="log artifact from zipped directory",
... )
>>> # Log with file
>>> with open('./path/to/artifact.txt', 'rb') as file:
...     project.log_artifact(
...         data_file=file,
...         name="file_artifact",
...         description="log artifact from file",
...     )
>>> # Log with file path
>>> experiment.log_artifact(
...     data_path="./path/to/artifact.pkl",
...     description="log artifact from file path",
... )
- log_conda_environment(artifact_name: str | None = None) Artifact ¶
Log the conda environment as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported conda environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Relies on running with an active conda environment.
- log_dataframe(df: pd.DataFrame | 'dd.DataFrame' | 'pl.DataFrame', description: str | None = None, name: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Dataframe ¶
Log a dataframe to this client object.
- Parameters:
- dfpandas.DataFrame, dask.dataframe.DataFrame, or polars DataFrame
The dataframe to log.
- descriptionstr, optional
The dataframe’s description. Use to provide additional context.
- tagslist of str, optional
The values to tag the dataframe with.
- commentslist of str, optional
The values to comment the dataframe with.
- Returns:
- rubicon.client.Dataframe
The new dataframe.
- log_experiment(name: str | None = None, description: str | None = None, model_name: str | None = None, branch_name: str | None = None, commit_hash: str | None = None, training_metadata: Tuple | List[Tuple] | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Experiment ¶
Log a new experiment to this project.
- Parameters:
- namestr
The experiment’s name.
- descriptionstr, optional
The experiment’s description. Use to provide additional context.
- model_namestr, optional
The experiment’s model name. For example, this could be the name of the registered model in Model One.
- branch_namestr, optional
The name of the active branch of the git repo this experiment is logged from. If omitted and automatic git logging is enabled, it will be retrieved via git rev-parse.
- commit_hashstr, optional
The hash of the last commit to the active branch of the git repo this experiment is logged from. If omitted and automatic git logging is enabled, it will be retrieved via git rev-parse.
- training_metadatatuple or list of tuples, optional
Metadata associated with the experiment’s training dataset(s).
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your experiments. For example, tags could be used to differentiate between the type of model or classifier used during the experiment (i.e. linear regression or random forest).
- commentslist of str, optional
Values to comment the experiment with.
- Returns:
- rubicon.client.Experiment
The created experiment.
- log_h2o_model(h2o_model, artifact_name: str | None = None, export_cross_validation_predictions: bool = False, use_mojo: bool = False, **log_artifact_kwargs) Artifact ¶
Log an h2o model as an artifact using h2o.save_model.
- Parameters:
- h2o_modelh2o.model.ModelBase
The h2o model to log as an artifact.
- artifact_namestr, optional (default None)
The name of the artifact. Defaults to None, using h2o_model’s class name.
- export_cross_validation_predictions: bool, optional (default False)
Passed directly to h2o.save_model.
- use_mojo: bool, optional (default False)
Whether to log the model in MOJO format. If False, the model will be logged in binary format.
- log_artifact_kwargsdict
Additional kwargs to be passed directly to self.log_artifact.
- log_json(json_object: Dict[str, Any], name: str | None = None, description: str | None = None, tags: List[str] | None = None) Artifact ¶
Log a python dictionary to a JSON file.
- Parameters:
- json_objectDict[str, Any]
A python dictionary capable of being converted to JSON.
- nameOptional[str], optional
A name for this JSON file, by default None
- descriptionOptional[str], optional
A description for this file, by default None
- tagsOptional[List[str]], optional
Any Rubicon tags, by default None
- Returns:
- Artifact
The new artifact.
- log_pip_requirements(artifact_name: str | None = None) Artifact ¶
Log the pip requirements as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported pip environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
- log_with_schema(obj: Any, experiment: Experiment | None = None, experiment_kwargs: Dict[str, Any] | None = None) Any ¶
Log an experiment leveraging self.schema_.
- log_xgboost_model(xgboost_model: xgb.Booster, artifact_name: str | None = None, **log_artifact_kwargs: Any) Artifact ¶
Log an XGBoost model as a JSON file to this client object.
Please note that we do not currently support logging directly from the SKLearn interface.
- Parameters:
- xgboost_model: Booster
An XGBoost model object in the Booster format.
- artifact_namestr, optional
The name of the artifact (the exported XGBoost model).
- log_artifact_kwargsAny
Additional kwargs to be passed directly to self.log_artifact.
- Returns:
- rubicon.client.Artifact
The new artifact.
- property name¶
Get the project’s name.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- set_schema(schema: Dict[str, Any]) None ¶
Set the schema for this client object.
- to_dask_df(group_by: str | None = None)¶
DEPRECATED: Available for backwards compatibility.
- to_df(df_type: str = 'pandas', group_by: str | None = None) pd.DataFrame | Dict[str, pd.DataFrame] | dd.DataFrame | Dict[str, dd.DataFrame] ¶
Loads the project’s data into dask or pandas dataframe(s) sorted by created_at. This includes the experiment details along with parameters and metrics.
- Parameters:
- df_typestr, optional
The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.
- group_bystr or None, optional
How to group the project’s experiments in the returned dataframe(s). Valid options include [“commit_hash”].
- Returns:
- pandas.DataFrame or dict of pandas.DataFrame or dask.DataFrame or dict of dask.DataFrame
If group_by is None, a dask or pandas dataframe holding the project’s data. Otherwise a dict of dask or pandas dataframes holding the project’s data grouped by group_by.
- property training_metadata¶
Get the project’s training metadata.
Experiment¶
- class rubicon_ml.Experiment(domain: ExperimentDomain, parent: Project)¶
A client experiment.
An experiment represents a model run and is identified by its ‘created_at’ time. It can have metrics, parameters, features, dataframes, and artifacts logged to it.
An experiment is logged to a project.
- Parameters:
- domainrubicon.domain.Experiment
The experiment domain model.
- parentrubicon.client.Project
The project that the experiment is logged to.
- add_child_experiment(experiment: Experiment)¶
Add tags to denote an experiment as a descendant of this experiment.
- Parameters:
- experimentrubicon_ml.client.Experiment
The experiment to mark as a descendant of this experiment.
- Raises:
- RubiconException
If experiment and this experiment are not logged to the same project.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- artifact(name: str | None = None, id: str | None = None) Artifact ¶
Get an artifact logged to this project by id or name.
- Parameters:
- idstr
The id of the artifact to get.
- namestr
The name of the artifact to get.
- Returns:
- rubicon.client.Artifact
The artifact logged to this project with id id or name ‘name’.
- artifacts(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') List[Artifact] ¶
Get the artifacts logged to this client object.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Artifact
The artifacts previously logged to this client object.
- property branch_name¶
Get the experiment’s branch name.
- property comments: List[str]¶
Get this client object’s comments.
- property commit_hash¶
Get the experiment’s commit hash.
- property created_at¶
Get the time the experiment was created.
- dataframe(name: str | None = None, id: str | None = None) Dataframe ¶
Get the dataframe logged to this client object.
- Parameters:
- idstr
The id of the dataframe to get.
- namestr
The name of the dataframe to get.
- Returns:
- rubicon.client.Dataframe
The dataframe logged to this project with id id or name ‘name’.
- dataframes(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') List[Dataframe] ¶
Get the dataframes logged to this client object.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Dataframe
The dataframes previously logged to this client object.
- delete_artifacts(ids: List[str])¶
Delete the artifacts logged to this client object with ids ids.
- Parameters:
- idslist of str
The ids of the artifacts to delete.
- delete_dataframes(ids: List[str])¶
Delete the dataframes with ids ids logged to this client object.
- Parameters:
- idslist of str
The ids of the dataframes to delete.
- property description¶
Get the experiment’s description.
- feature(name=None, id=None)¶
Get a feature.
- Parameters:
- namestr, optional
The name of the feature to get.
- idstr, optional
The id of the feature to get.
- Returns:
- rubicon.client.Feature
The feature with name name or id id.
- features(name=None, tags=[], qtype='or')¶
Get the features logged to this experiment.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Feature
The features previously logged to this experiment.
- get_child_experiments() List[Experiment] ¶
Get the experiments that are tagged as children of this experiment.
- Returns:
- list of rubicon_ml.client.Experiment
The experiments that are tagged as children of this experiment.
- get_parent_experiments() List[Experiment] ¶
Get the experiments that are tagged as parents of this experiment.
- Returns:
- list of rubicon_ml.client.Experiment
The experiments that are tagged as parents of this experiment.
- property id¶
Get the experiment’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- log_artifact(data_bytes: bytes | None = None, data_directory: str | None = None, data_file: TextIO | None = None, data_object: Any | None = None, data_path: str | None = None, name: str | None = None, description: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Artifact ¶
Log an artifact to this client object.
- Parameters:
- data_bytesbytes, optional
The raw bytes to log as an artifact.
- data_directorystr, optional
The path to a directory to zip and log as an artifact.
- data_fileTextIOWrapper, optional
The open file to log as an artifact.
- data_objectpython object, optional
The python object to log as an artifact.
- data_pathstr, optional
The absolute or relative local path or S3 path to the data to log as an artifact. S3 paths must be prepended with ‘s3://’.
- namestr, optional
The name of the artifact file. Required if data_path is not provided.
- descriptionstr, optional
A description of the artifact. Use to provide additional context.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your artifacts.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your artifacts.
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Only one of data_bytes, data_file, data_object, and data_path should be provided. If more than one is given, the order of precedence is data_bytes, data_object, data_file, data_path.
Examples
>>> # Log with bytes
>>> experiment.log_artifact(
...     data_bytes=b'hello rubicon!',
...     name="bytes_artifact",
...     description="log artifact from bytes",
... )
>>> # Log zipped directory
>>> experiment.log_artifact(
...     data_directory="./path/to/directory/",
...     name="directory.zip",
...     description="log artifact from zipped directory",
... )
>>> # Log with file
>>> with open('./path/to/artifact.txt', 'rb') as file:
...     project.log_artifact(
...         data_file=file,
...         name="file_artifact",
...         description="log artifact from file",
...     )
>>> # Log with file path
>>> experiment.log_artifact(
...     data_path="./path/to/artifact.pkl",
...     description="log artifact from file path",
... )
- log_conda_environment(artifact_name: str | None = None) Artifact ¶
Log the conda environment as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported conda environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Relies on running with an active conda environment.
- log_dataframe(df: pd.DataFrame | 'dd.DataFrame' | 'pl.DataFrame', description: str | None = None, name: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Dataframe ¶
Log a dataframe to this client object.
- Parameters:
- dfpandas.DataFrame, dask.dataframe.DataFrame, or polars DataFrame
The dataframe to log.
- descriptionstr, optional
The dataframe’s description. Use to provide additional context.
- tagslist of str, optional
The values to tag the dataframe with.
- commentslist of str, optional
The values to comment the dataframe with.
- Returns:
- rubicon.client.Dataframe
The new dataframe.
- log_feature(name: str, description: str = None, importance: float = None, tags: list[str] = [], comments: list[str] = []) Feature ¶
Create a feature under the experiment.
- Parameters:
- namestr
The feature’s name.
- descriptionstr
The feature’s description. Use to provide additional context.
- importancefloat
The feature’s importance.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your features.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your features.
- Returns:
- rubicon.client.Feature
The created feature.
- log_h2o_model(h2o_model, artifact_name: str | None = None, export_cross_validation_predictions: bool = False, use_mojo: bool = False, **log_artifact_kwargs) Artifact ¶
Log an h2o model as an artifact using h2o.save_model.
- Parameters:
- h2o_modelh2o.model.ModelBase
The h2o model to log as an artifact.
- artifact_namestr, optional (default None)
The name of the artifact. Defaults to None, using h2o_model’s class name.
- export_cross_validation_predictions: bool, optional (default False)
Passed directly to h2o.save_model.
- use_mojo: bool, optional (default False)
Whether to log the model in MOJO format. If False, the model will be logged in binary format.
- log_artifact_kwargsdict
Additional kwargs to be passed directly to self.log_artifact.
- log_json(json_object: Dict[str, Any], name: str | None = None, description: str | None = None, tags: List[str] | None = None) Artifact ¶
Log a python dictionary to a JSON file.
- Parameters:
- json_objectDict[str, Any]
A python dictionary capable of being converted to JSON.
- nameOptional[str], optional
A name for this JSON file, by default None
- descriptionOptional[str], optional
A description for this file, by default None
- tagsOptional[List[str]], optional
Any Rubicon tags, by default None
- Returns:
- Artifact
The new artifact.
- log_metric(name: str, value: float, directionality: str = 'score', description: str = None, tags: list[str] = [], comments: list[str] = []) Metric ¶
Create a metric under the experiment.
- Parameters:
- namestr
The metric’s name.
- valuefloat
The metric’s value.
- directionalitystr, optional
The metric’s directionality. Must be one of [“score”, “loss”], where “score” represents a metric to maximize, while “loss” represents a metric to minimize. Defaults to “score”.
- descriptionstr, optional
The metric’s description. Use to provide additional context.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your metrics.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your metrics.
- Returns:
- rubicon.client.Metric
The created metric.
- log_parameter(name: str, value: object = None, description: str = None, tags: list[str] = [], comments: list[str] = []) Parameter ¶
Create a parameter under the experiment.
- Parameters:
- namestr
The parameter’s name.
- valueobject, optional
The parameter’s value. Can be an object of any JSON serializable (via rubicon.utils.DomainJSONEncoder) type.
- descriptionstr, optional
The parameter’s description. Use to provide additional context.
- tagslist of str, optional
Values to tag the parameter with. Use tags to organize and filter your parameters.
- commentslist of str, optional
Values to comment the parameter with. Use comments to organize and filter your parameters.
- Returns:
- rubicon.client.Parameter
The created parameter.
- log_pip_requirements(artifact_name: str | None = None) Artifact ¶
Log the pip requirements as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported pip environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
- log_xgboost_model(xgboost_model: xgb.Booster, artifact_name: str | None = None, **log_artifact_kwargs: Any) Artifact ¶
Log an XGBoost model as a JSON file to this client object.
Please note that we do not currently support logging directly from the SKLearn interface.
- Parameters:
- xgboost_model: Booster
An XGBoost model object in the Booster format.
- artifact_namestr, optional
The name of the artifact (the exported XGBoost model).
- log_artifact_kwargsAny
Additional kwargs to be passed directly to self.log_artifact.
- Returns:
- rubicon.client.Artifact
The new artifact.
- metric(name=None, id=None)¶
Get a metric.
- Parameters:
- namestr, optional
The name of the metric to get.
- idstr, optional
The id of the metric to get.
- Returns:
- rubicon.client.Metric
The metric with name name or id id.
- metrics(name=None, tags=[], qtype='or')¶
Get the metrics logged to this experiment.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Metric
The metrics previously logged to this experiment.
- property model_name¶
Get the experiment’s model name.
- property name¶
Get the experiment’s name.
- parameter(name=None, id=None)¶
Get a parameter.
- Parameters:
- namestr, optional
The name of the parameter to get.
- idstr, optional
The id of the parameter to get.
- Returns:
- rubicon.client.Parameter
The parameter with name name or id id.
- parameters(name=None, tags=[], qtype='or')¶
Get the parameters logged to this experiment.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Parameter
The parameters previously logged to this experiment.
- property project¶
Get the project client object that this experiment belongs to.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- property training_metadata¶
Get the experiment’s training metadata.
Parameter¶
- class rubicon_ml.Parameter(domain: ParameterDomain, parent: Experiment)¶
A client parameter.
A parameter is an input to an experiment (model run) that depends on the type of model being used. It affects the model’s predictions.
For example, if you were using a random forest classifier, ‘n_estimators’ (the number of trees in the forest) could be a parameter.
A parameter is logged to an experiment.
- Parameters:
- domainrubicon.domain.Parameter
The parameter domain model.
- parentrubicon.client.Experiment
The experiment that the parameter is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at: datetime¶
Get the time the parameter was created.
- property description: str | None¶
Get the parameter’s description.
- property id: str¶
Get the parameter’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str | None¶
Get the parameter’s name.
- property parent: Experiment¶
Get the parameter’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- property value: object | float | None¶
Get the parameter’s value.
Feature¶
- class rubicon_ml.Feature(domain: FeatureDomain, parent: Experiment)¶
A client feature.
A feature is an input to an experiment (model run) that’s an independent, measurable property of a phenomenon being observed. It affects the model’s predictions.
For example, consider a model that predicts how likely a customer is to pay back a loan. Possible features could be ‘year’, ‘credit score’, etc.
A feature is logged to an experiment.
- Parameters:
- domainrubicon.domain.Feature
The feature domain model.
- parentrubicon.client.Experiment
The experiment that the feature is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at: datetime¶
Get the time the feature was created.
- property description: str | None¶
Get the feature’s description.
- property id: str¶
Get the feature’s id.
- property importance¶
Get the feature’s importance.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str | None¶
Get the feature’s name.
- property parent: Experiment¶
Get the feature’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
Metric¶
- class rubicon_ml.Metric(domain: MetricDomain, parent: Experiment)¶
A client metric.
A metric is a single-value output of an experiment that helps evaluate the quality of the model’s predictions.
It can be either a ‘score’ (value to maximize) or a ‘loss’ (value to minimize).
A metric is logged to an experiment.
- Parameters:
- domainrubicon.domain.Metric
The metric domain model.
- parentrubicon.client.Experiment
The experiment that the metric is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at: datetime¶
Get the time the metric was created.
- property description: str | None¶
Get the metric’s description.
- property directionality: str¶
Get the metric’s directionality.
- property id: str¶
Get the metric’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str | None¶
Get the metric’s name.
- property parent: Experiment¶
Get the metric’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- property value¶
Get the metric’s value.
Dataframe¶
- class rubicon_ml.Dataframe(domain: DataframeDomain, parent: Experiment | Project)¶
A client dataframe.
A dataframe is a two-dimensional, tabular dataset with labeled axes (rows and columns) that provides value to the model developer and/or reviewer when visualized.
For example, confusion matrices, feature importance tables and marginal residuals can all be logged as a dataframe.
A dataframe is logged to a project or an experiment.
- Parameters:
- domainrubicon.domain.Dataframe
The dataframe domain model.
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment that the dataframe is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at¶
Get the time this dataframe was created.
- property description¶
Get the dataframe’s description.
- get_data(df_type: Literal['pandas', 'dask'] = 'pandas')¶
Loads the data associated with this Dataframe into a pandas or dask dataframe.
- Parameters:
- df_typestr, optional
The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.
- property id¶
Get the dataframe’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name¶
Get the dataframe’s name.
- property parent¶
Get the dataframe’s parent client object.
- plot(df_type: Literal['pandas', 'dask'] = 'pandas', plotting_func: Callable | None = None, **kwargs)¶
Render the dataframe using plotly.express.
- Parameters:
- df_typestr, optional
The type of dataframe. Can be either pandas or dask. Defaults to ‘pandas’.
- plotting_funcfunction, optional
The plotly.express plotting function used to visualize the dataframes. Available options can be found at https://plotly.com/python-api-reference/plotly.express.html. Defaults to plotly.express.line.
- kwargsdict, optional
Keyword arguments to be passed to plotting_func. Available options can be found in the documentation of the individual functions at the URL above.
Examples
>>> # Log a line plot
>>> dataframe.plot(x='Year', y='Number of Subscriptions')

>>> # Log a bar plot
>>> import plotly.express as px
>>> dataframe.plot(plotting_func=px.bar, x='Year', y='Number of Subscriptions')
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
Artifact¶
- class rubicon_ml.Artifact(domain: ArtifactDomain, parent: Project)¶
A client artifact.
An artifact is a catch-all for any other type of data that can be logged to a file.
For example, a snapshot of a trained model (.pkl) can be logged to the experiment created during its run. Or, a base model for the model in development can be logged to a project when leveraging transfer learning.
An artifact is logged to a project or an experiment.
- Parameters:
- domainrubicon.domain.Artifact
The artifact domain model.
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment that the artifact is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at¶
Get the time this artifact was created.
- property data¶
Get the artifact’s raw data.
- property description: str¶
Get the artifact’s description.
- download(location: str | None = None, name: str | None = None, unzip: bool = False)¶
Download this artifact’s data.
- Parameters:
- locationstr, optional
The absolute or relative local directory or S3 bucket to download the artifact to. S3 buckets must be prepended with ‘s3://’. Defaults to the current local working directory.
- namestr, optional
The name to give the downloaded artifact file. Defaults to the artifact’s given name when logged.
- unzipbool, optional
True to unzip the artifact data. False otherwise. Defaults to False.
- get_data(deserialize: Literal['h2o', 'h2o_binary', 'h2o_mojo', 'pickle', 'xgboost'] | None = None, unpickle: bool = False)¶
Loads the data associated with this artifact and unpickles if needed.
- Parameters:
- deserializestr, optional
Method used to deserialize this artifact’s data.
- None to disable deserialization and return the raw data.
- “h2o” or “h2o_binary” to use h2o.load_model to load the data.
- “h2o_mojo” to use h2o.import_mojo to load the data.
- “pickle” to use pickle to load the data.
- “xgboost” to use xgboost’s JSON loader to load the data as a fitted model.
Defaults to None.
- unpicklebool, optional
Flag indicating whether or not to unpickle artifact data. deserialize takes precedence. Defaults to False. Deprecated: Please use deserialize=”pickle” in the future.
- property id: str¶
Get the artifact’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str¶
Get the artifact’s name.
- property parent¶
Get the artifact’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- temporary_download(unzip: bool = False)¶
Temporarily download this artifact’s data within a context manager.
- Parameters:
- unzipbool, optional
True to unzip the artifact data. False otherwise. Defaults to False.
- Yields:
- file
An open file pointer into the directory the artifact data was temporarily downloaded into. If the artifact is a single file, its name is stored in the artifact.name attribute.
exception_handling¶
- rubicon_ml.set_failure_mode(failure_mode: str, traceback_chain: bool = False, traceback_limit: int | None = None) None ¶
Set the failure mode.
- Parameters:
- failure_modestr
The name of the failure mode to set. “raise” to raise all exceptions, “log” to catch all exceptions and log them via logging.error, “warn” to catch all exceptions and re-raise them as warnings via warnings.warn. Defaults to “raise”.
- traceback_chainbool, optional
True to display each error in the traceback chain when logging or warning, False to display only the first. Defaults to False.
- traceback_limitint, optional
The depth of the traceback displayed when logging or warning. 0 to display only the error’s text, each increment shows another line of the traceback.
publish¶
rubicon_ml leverages intake to easily share sets of experiments.
- rubicon_ml.publish(experiments, visualization_object: ExperimentsTable | MetricCorrelationPlot | DataframePlot | MetricListsComparison | None = None, output_filepath=None, base_catalog_filepath=None)¶
Publish experiments to an intake catalog that can be read by the intake-rubicon driver.
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment
The experiments to publish.
- output_filepathstr, optional
The absolute or relative local filepath or S3 bucket and key to log the generated YAML file to. S3 buckets must be prepended with ‘s3://’. Defaults to None, which disables writing the generated YAML.
- base_catalog_filepathstr, optional
Similar to output_filepath, except this argument is used as a base file to update an existing intake catalog. Defaults to None, creating a new intake catalog.
- Returns:
- str
The YAML string representation of the intake catalog containing the given experiments.
RubiconJSON¶
- class rubicon_ml.RubiconJSON(rubicon_objects: List[Rubicon] | None = None, projects: List[Project] | None = None, experiments: List[Experiment] | None = None)¶
RubiconJSON converts top-level rubicon_ml objects, projects, and experiments into a JSON structured dictionary for JSONPath-like querying with jsonpath-ng.
- Parameters:
- rubicon_objectsrubicon.client.Rubicon or list of type rubicon.client.Rubicon
Top-level rubicon-ml objects to convert to JSON for querying.
- projectsrubicon.client.Project or list of type rubicon.client.Project
rubicon-ml projects to convert to JSON for querying.
- experimentsrubicon.client.Experiment or list of type rubicon.client.Experiment
rubicon-ml experiments to convert to JSON for querying.
- search(query: str)¶
Query the JSON generated at RubiconJSON instantiation in a JSONPath-like manner. Results can be returned as rubicon_ml.client objects by specifying the return_type parameter; by default they are returned as a JSON structured dict.
- Parameters:
- querystr
The JSONPath-like query to execute.
schema¶
Methods and a mixin to enable schema logging.
The functions available in the schema submodule are applied to rubicon_ml.Project via the SchemaMixin class. They can be called directly as methods of an existing project.
- class rubicon_ml.schema.logger.SchemaMixin¶
Adds schema logging support to a client object.
- log_with_schema(obj: Any, experiment: Experiment | None = None, experiment_kwargs: Dict[str, Any] | None = None) Any ¶
Log an experiment leveraging self.schema_.
- set_schema(schema: Dict[str, Any]) None ¶
Set the schema for this client object.
Methods for interacting with the existing rubicon-ml schema.
- rubicon_ml.schema.registry.available_schema() List[str] ¶
Get the names of all available schema.
- rubicon_ml.schema.registry.get_schema(name: str) Any ¶
Get the schema with the given name.
- rubicon_ml.schema.registry.get_schema_name(obj: Any) str ¶
Get the name of the schema that represents the given object.
- rubicon_ml.schema.registry.register_schema(name: str, schema: dict)¶
Add a schema to the schema registry.
sklearn¶
rubicon_ml offers direct integration with Scikit-learn via our own pipeline object.
- class rubicon_ml.sklearn.RubiconPipeline(project, steps, user_defined_loggers={}, experiment_kwargs={'name': 'RubiconPipeline experiment'}, memory=None, verbose=False, ignore_warnings=False)¶
An extension of sklearn.pipeline.Pipeline that automatically creates a Rubicon experiment under the provided project and logs the pipeline’s parameters and metrics to it.
A single pipeline run will result in a single experiment logged with its corresponding parameters and metrics pulled from the pipeline’s estimators.
- Parameters:
- projectrubicon_ml.client.Project
The rubicon project to log to.
- stepslist
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
- user_defined_loggersdict, optional
A dict mapping the estimator name to a corresponding user defined logger. See the example below for more details.
- experiment_kwargsdict, optional
Additional keyword arguments to be passed to project.log_experiment().
- memorystr or object with the joblib.Memory interface, default=None
Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. (docstring source: Scikit-Learn)
- verbosebool, default=False
If True, the time elapsed while fitting each step will be printed as it is completed. (docstring source: Scikit-Learn)
- ignore_warningsbool, default=False
If True, ignores warnings thrown by pipeline.
Examples
>>> pipeline = RubiconPipeline(
...     project,
...     [
...         ("vect", CountVectorizer()),
...         ("tfidf", TfidfTransformer()),
...         ("clf", SGDClassifier()),
...     ],
...     user_defined_loggers={
...         "vect": FilterEstimatorLogger(
...             select=["input", "decode_error", "max_df"],
...         ),
...         "tfidf": FilterEstimatorLogger(ignore_all=True),
...         "clf": FilterEstimatorLogger(
...             ignore=["alpha", "penalty"],
...         ),
...     },
... )
- fit(X, y=None, tags=None, log_fit_params=True, experiment=None, **fit_params)¶
Fit the model and automatically log the fit_params to rubicon-ml. Optionally, pass tags to update the experiment’s tags.
- Parameters:
- Xiterable
Training data. Must fulfill input requirements of first step of the pipeline.
- yiterable, optional
Training targets. Must fulfill label requirements for all steps of the pipeline.
- tagslist, optional
Additional tags to add to the experiment during the fit.
- log_fit_paramsbool, optional
True to log the values passed as fit_params to this pipeline’s experiment. Defaults to True.
- fit_paramsdict, optional
Additional keyword arguments to be passed to sklearn.pipeline.Pipeline.fit().
- experiment: rubicon_ml.experiment.client.Experiment, optional
The experiment to log to. If no experiment is provided, the parameters and metrics are logged to a new experiment created with self.experiment_kwargs.
- Returns:
- rubicon_ml.sklearn.Pipeline
This RubiconPipeline.
- get_estimator_logger(step_name=None, estimator=None)¶
Get a logger for the estimator. By default, the logger will have the current experiment set.
- score(X, y=None, sample_weight=None, experiment=None)¶
Score with the final estimator and automatically log the results to rubicon-ml.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- yiterable, optional
Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.
- sample_weightlist, optional
If not None, this argument is passed as sample_weight keyword argument to the score method of the final estimator.
- experiment: rubicon_ml.experiment.client.Experiment, optional
The experiment to log the score to. If no experiment is provided the score is logged to a new experiment with self.experiment_kwargs.
- Returns:
- float
Result of calling score on the final estimator.
- score_samples(X, experiment=None)¶
Score samples with the final estimator and automatically log the results to rubicon-ml.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- experiment: rubicon_ml.experiment.client.Experiment, optional
The experiment to log the score to. If no experiment is provided the score is logged to a new experiment with self.experiment_kwargs.
- Returns:
- ndarray of shape (n_samples,)
Result of calling score_samples on the final estimator.
- set_fit_request(*, experiment: bool | None | str = '$UNCHANGED$', log_fit_params: bool | None | str = '$UNCHANGED$', tags: bool | None | str = '$UNCHANGED$') RubiconPipeline ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- experimentstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the experiment parameter in fit.
- log_fit_paramsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the log_fit_params parameter in fit.
- tagsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the tags parameter in fit.
- Returns:
- selfobject
The updated object.
- set_score_request(*, experiment: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') RubiconPipeline ¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- experimentstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the experiment parameter in score.
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- selfobject
The updated object.
- class rubicon_ml.sklearn.FilterEstimatorLogger(estimator=None, experiment=None, step_name=None, select=[], ignore=[], ignore_all=False)¶
The filter logger for sklearn estimators. Use this logger to either select or ignore specific parameters for logging.
- Parameters:
- estimatora sklearn estimator, optional
The estimator
- experimentrubicon.client.Experiment, optional
The experiment to log the parameters and metrics to.
- step_namestr, optional
The name of the pipeline step.
- selectlist, optional
The list of parameters on this estimator that you’d like to log. All other parameters will be ignored.
- ignorelist, optional
The list of parameters on this estimator that you’d like to ignore by not logging. The other parameters will be logged.
- ignore_allbool, optional
Ignore all parameters if true.
- rubicon_ml.sklearn.pipeline.make_pipeline(project, *steps, experiment_kwargs={'name': 'RubiconPipeline experiment'}, memory=None, verbose=False)¶
Wrapper around RubiconPipeline(). Does not require naming for estimators; their names are set to the lowercase strings of their types.
- Parameters:
- projectrubicon_ml.client.Project
The rubicon project to log to.
- stepslist
List of estimator objects or (estimator, logger) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator. (docstring source: Scikit-Learn)
- experiment_kwargsdict, optional
Additional keyword arguments to be passed to project.log_experiment().
- memorystr or object with the joblib.Memory interface, default=None
Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. (docstring source: Scikit-Learn)
- verbosebool, default=False
If True, the time elapsed while fitting each step will be printed as it is completed. (docstring source: Scikit-Learn)
- Returns:
- rubicon_ml.sklearn.Pipeline
A RubiconPipeline with project project and steps steps.
viz¶
rubicon_ml offers visualization leveraging Dash and Plotly. Each of the following classes is a standalone widget.
- class rubicon_ml.viz.DataframePlot(dataframe_name, experiments=None, plotting_func=<function line>, plotting_func_kwargs={}, x=None, y=None)¶
Plot the dataframes with name dataframe_name logged to the given experiments on a shared axis.
- Parameters:
- dataframe_namestr
The name of the dataframe to plot. A dataframe with name dataframe_name must be logged to each experiment in experiments.
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- plotting_funcfunction, optional
The plotly.express plotting function used to visualize the dataframes. Available options can be found at https://plotly.com/python-api-reference/plotly.express.html. Defaults to plotly.express.line.
- plotting_func_kwargsdict, optional
Keyword arguments to be passed to plotting_func. Available options can be found in the documentation of the individual functions at the URL above.
- xstr, optional
The name of the column in the dataframes with name dataframe_name to plot across the x-axis.
- ystr, optional
The name of the column in the dataframes with name dataframe_name to plot across the y-axis.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from Jupyterlab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargs: dict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- class rubicon_ml.viz.ExperimentsTable(experiments=None, is_selectable=True, metric_names=None, metric_query_tags=None, metric_query_type=None, parameter_names=None, parameter_query_tags=None, parameter_query_type=None)¶
Visualize the experiments experiments and their metadata, metrics, and parameters in a tabular format.
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- is_selectablebool, optional
True to enable selection of the rows in the table, False otherwise. Defaults to True.
- metric_nameslist of str
If provided, only show the metrics with names in the given list. If metric_query_tags are also provided, this will only select metrics from the tag-filtered results.
- metric_query_tagslist of str, optional
If provided, only show the metrics with the given tags in the table.
- metric_query_type‘and’ or ‘or’, optional
When metric_query_tags are given, ‘and’ shows the metrics with all of the given tags and ‘or’ shows the metrics with any of the given tags.
- parameter_nameslist of str
If provided, only show the parameters with names in the given list. If parameter_query_tags are also provided, this will only select parameters from the tag-filtered results.
- parameter_query_tagslist of str, optional
If provided, only show the parameters with the given tags in the table.
- parameter_query_type‘and’ or ‘or’, optional
When parameter_query_tags are given, ‘and’ shows the parameters with all of the given tags and ‘or’ shows the parameters with any of the given tags.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from Jupyterlab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- class rubicon_ml.viz.MetricCorrelationPlot(experiments=None, metric_names=None, parameter_names=None, selected_metric=None)¶
Visualize the correlation between the parameters and metrics logged to the given experiments using a parallel coordinates plot.
More info on parallel coordinates plots can be found here: https://plotly.com/python/parallel-coordinates-plot/
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- metric_nameslist of str, optional
The names of the metrics to load. Defaults to None, which loads all metrics logged to the given experiments.
- parameter_nameslist of str, optional
The names of the parameters to load. Defaults to None, which loads all parameters logged to the given experiments.
- selected_metricstr, optional
The name of the metric to display at launch. Defaults to None, which selects the metric loaded first.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from JupyterLab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current JupyterLab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- class rubicon_ml.viz.MetricListsComparison(column_names=None, experiments=None, selected_metric=None)¶
Visualize lists of metrics logged to the given experiments as an annotated heatmap.
More info on annotated heatmaps can be found here: https://plotly.com/python/annotated-heatmap/
- Parameters:
- column_nameslist of str, optional
Titles to use for each column in the heatmap. Defaults to None.
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- selected_metricstr, optional
The name of the metric to display at launch. Defaults to None, which selects the metric loaded first.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from JupyterLab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current JupyterLab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
Widgets can be combined into an interactive dashboard.
- class rubicon_ml.viz.Dashboard(experiments, widgets=None, link_experiment_table=True)¶
Compose visualizations into a dashboard to view multiple widgets at once.
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment
The experiments to visualize.
- widgetslist of lists of superclasses of rubicon_ml.viz.base.VizBase, optional
The widgets to compose in this dashboard. The widgets should be instantiated without experiments prior to passing as an argument to Dashboard. Defaults to a stacked layout of an ExperimentsTable and a MetricCorrelationPlot.
- link_experiment_tablebool, optional
True to enable the callbacks that allow instances of ExperimentsTable to update the experiment inputs of the other widgets in this dashboard. False otherwise. Defaults to True.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from JupyterLab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current JupyterLab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
workflow.prefect¶
rubicon_ml contains wrappers for the workflow management engine Prefect. These tasks represent a Prefect-ified rubicon_ml client.
- rubicon_ml.workflow.prefect.create_experiment_task(project, **kwargs)¶
Create an experiment within a prefect flow.
This prefect task can be used within a flow to create a new experiment under an existing project.
- Parameters:
- projectrubicon.client.Project
The project under which the experiment will be created.
- kwargsdict
Keyword arguments to be passed to Project.log_experiment.
- Returns:
- rubicon.client.Experiment
The created experiment.
- rubicon_ml.workflow.prefect.get_or_create_project_task(persistence, root_dir, project_name, auto_git_enabled=False, storage_options={}, **kwargs)¶
Get or create a project within a prefect flow.
This prefect task can be used within a flow to create a new project or get an existing one. It should be the entry point to any prefect flow that logs data to Rubicon.
- Parameters:
- persistencestr
The persistence type to be passed to the Rubicon constructor.
- root_dirstr
The root directory to be passed to the Rubicon constructor.
- project_namestr
The name of the project to get or create.
- auto_git_enabledbool, optional
True to use the git command to automatically log relevant repository information to projects and experiments logged with the client instance created in this task, False otherwise. Defaults to False.
- storage_optionsdict, optional
Additional keyword arguments specific to the protocol being chosen. They are passed directly to the underlying filesystem class.
- kwargsdict
Additional keyword arguments to be passed to Rubicon.create_project.
- Returns:
- rubicon.client.Project
The project with name project_name.
- rubicon_ml.workflow.prefect.log_artifact_task(parent, **kwargs)¶
Log an artifact within a prefect flow.
This prefect task can be used within a flow to log an artifact to an existing project or experiment.
- Parameters:
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment to log the artifact to.
- kwargsdict
Keyword arguments to be passed to Project.log_artifact or Experiment.log_artifact.
- Returns:
- rubicon.client.Artifact
The logged artifact.
- rubicon_ml.workflow.prefect.log_dataframe_task(parent, df, **kwargs)¶
Log a dataframe within a prefect flow.
This prefect task can be used within a flow to log a dataframe to an existing project or experiment.
- Parameters:
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment to log the dataframe to.
- dfpandas.DataFrame or dask.dataframe.DataFrame
The pandas or dask dataframe to log.
- kwargsdict
Additional keyword arguments to be passed to Project.log_dataframe or Experiment.log_dataframe.
- Returns:
- rubicon.client.Dataframe
The logged dataframe.
- rubicon_ml.workflow.prefect.log_feature_task(experiment, feature_name, **kwargs)¶
Log a feature within a prefect flow.
This prefect task can be used within a flow to log a feature to an existing experiment.
- Parameters:
- experimentrubicon.client.Experiment
The experiment to log a new feature to.
- feature_namestr
The name of the feature to log. Passed to Experiment.log_feature as name.
- kwargsdict
Additional keyword arguments to be passed to Experiment.log_feature.
- Returns:
- rubicon.client.Feature
The logged feature.
- rubicon_ml.workflow.prefect.log_metric_task(experiment, metric_name, metric_value, **kwargs)¶
Log a metric within a prefect flow.
This prefect task can be used within a flow to log a metric to an existing experiment.
- Parameters:
- experimentrubicon.client.Experiment
The experiment to log a new metric to.
- metric_namestr
The name of the metric to log. Passed to Experiment.log_metric as name.
- metric_valuestr
The value of the metric to log. Passed to Experiment.log_metric as value.
- kwargsdict
Additional keyword arguments to be passed to Experiment.log_metric.
- Returns:
- rubicon.client.Metric
The logged metric.
- rubicon_ml.workflow.prefect.log_parameter_task(experiment, parameter_name, parameter_value, **kwargs)¶
Log a parameter within a prefect flow.
This prefect task can be used within a flow to log a parameter to an existing experiment.
- Parameters:
- experimentrubicon.client.Experiment
The experiment to log a new parameter to.
- parameter_namestr
The name of the parameter to log. Passed to Experiment.log_parameter as name.
- parameter_valuestr
The value of the parameter to log. Passed to Experiment.log_parameter as value.
- kwargsdict
Additional keyword arguments to be passed to Experiment.log_parameter.
- Returns:
- rubicon.client.Parameter
The logged parameter.