API Reference

Rubicon

class rubicon_ml.Rubicon(persistence: str | None = 'filesystem', root_dir: str | None = None, auto_git_enabled: bool = False, composite_config: List[Dict[str, Any]] | None = None, **storage_options)

The rubicon client’s entry point.

Creates a Config and injects it into the client-level objects at run-time.

Parameters:
persistence : str, optional

The persistence type. Can be one of [“filesystem”, “memory”]. Defaults to “filesystem”.

root_dir : str, optional

Absolute or relative filepath. Use an absolute path for best performance. Defaults to the local filesystem. Prefix with s3:// to use S3 instead.

auto_git_enabled : bool, optional

True to use the git command to automatically log relevant repository information to projects and experiments logged with this client instance, False otherwise. Defaults to False.

storage_options : dict, optional

Additional keyword arguments specific to the chosen protocol. They are passed directly to the underlying filesystem class.

property config

Returns a single config.

Exists to promote backwards compatibility.

Returns:
Config

A single Config.

create_project(name: str, description: str | None = None, github_url: str | None = None, training_metadata: List[Tuple] | Tuple | None = None) → Project

Create a project.

Parameters:
name : str

The project’s name.

description : str, optional

The project’s description.

github_url : str, optional

The URL of the GitHub repository associated with this project. If omitted and automatic git logging is enabled, it will be retrieved via git remote.

training_metadata : tuple or list of tuples, optional

Metadata associated with the training dataset(s) used across each experiment in this project.

Returns:
rubicon.client.Project

The created project.

get_or_create_project(name: str, **kwargs) → Project

Get or create a project.

Parameters:
name : str

The project’s name.

kwargs : dict

Additional keyword arguments to be passed to Rubicon.create_project.

Returns:
rubicon.client.Project

The corresponding project.

get_project(name: str | None = None, id: str | None = None) → Project

Get a project.

Parameters:
name : str, optional

The name of the project to get.

id : str, optional

The id of the project to get.

Returns:
rubicon.client.Project

The project with name name or id id.

get_project_as_dask_df(name, group_by=None)

DEPRECATED: Available for backwards compatibility.

get_project_as_df(name, df_type='pandas', group_by=None)

Get a dask or pandas dataframe representation of a project.

Parameters:
name : str

The name of the project to get.

df_type : str, optional

The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.

group_by : str or None, optional

How to group the project’s experiments in the returned DataFrame(s). Valid options include [“commit_hash”].

Returns:
pandas.DataFrame or list of pandas.DataFrame or dask.DataFrame or list of dask.DataFrame

If group_by is None, a dask or pandas dataframe holding the project’s data. Otherwise a list of dask or pandas dataframes holding the project’s data grouped by group_by.

is_auto_git_enabled() → bool

Check if git is enabled for any of the configs.

projects()

Get a list of available projects.

Returns:
list of rubicon.client.Project

The list of available projects.

sync(project_name: str, s3_root_dir: str, aws_profile: str | None = None, aws_shared_credentials_file: str | None = None)

Sync a local project to S3.

Parameters:
project_name : str

The name of the project to sync.

s3_root_dir : str

The S3 path where the project’s data will be synced.

aws_profile : str, optional

The name of the AWS CLI profile with the credentials and options to use. Defaults to None, which uses the AWS default profile name ‘default’.

aws_shared_credentials_file : str, optional

The location of the file the AWS CLI uses to store access keys. Defaults to None, which uses the AWS default path ‘~/.aws/credentials’.

Notes

Use sync to back up your local project data to S3 as an alternative to direct S3 logging. It leverages the AWS CLI’s aws s3 sync command. Ensure that any credentials are set and any proxies are enabled before use.

Project

class rubicon_ml.Project(domain: ProjectDomain, config: Config | List[Config] | None = None)

A client project.

A project is a collection of experiments, dataframes, and artifacts identified by a unique name.

Parameters:
domain : rubicon.domain.Project

The project domain model.

config : rubicon.client.Config

The config, which specifies the underlying repository.

archive(experiments: List[Experiment] | None = None, remote_rubicon: Rubicon | None = None)

Archive the experiments logged to this project.

Parameters:
experiments : list of Experiments, optional

The rubicon.client.Experiment objects to archive. If None, all logged experiments are archived.

remote_rubicon : rubicon_ml.Rubicon object, optional

The remote Rubicon object with the repository to archive to.

Returns:
The filepath of the newly created archive.

artifact(name: str | None = None, id: str | None = None) → Artifact

Get an artifact logged to this project by id or name.

Parameters:
id : str

The id of the artifact to get.

name : str

The name of the artifact to get.

Returns:
rubicon.client.Artifact

The artifact logged to this project with id id or name ‘name’.

artifacts(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') → List[Artifact]

Get the artifacts logged to this client object.

Parameters:
name : str, optional

The name value to filter results on.

tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

Returns:
list of rubicon.client.Artifact

The artifacts previously logged to this client object.

property created_at

Get the time the project was created.

dataframe(name: str | None = None, id: str | None = None) → Dataframe

Get the dataframe logged to this client object.

Parameters:
id : str

The id of the dataframe to get.

name : str

The name of the dataframe to get.

Returns:
rubicon.client.Dataframe

The dataframe logged to this project with id id or name ‘name’.

dataframes(tags: List[str] | None = None, qtype: str = 'or', recursive: bool = False, name: str | None = None) → List[Dataframe]

Get the dataframes logged to this project.

Parameters:
tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

recursive : bool, optional

If True, also get the dataframes logged to this project’s experiments. Defaults to False.

name : str, optional

The name value to filter results on.

Returns:
list of rubicon.client.Dataframe

The dataframes previously logged to this client object.

delete_artifacts(ids: List[str])

Delete the artifacts logged to this client object with ids ids.

Parameters:
ids : list of str

The ids of the artifacts to delete.

delete_dataframes(ids: List[str])

Delete the dataframes with ids ids logged to this client object.

Parameters:
ids : list of str

The ids of the dataframes to delete.

property description

Get the project’s description.

experiment(id: str | None = None, name: str | None = None) → Experiment

Get an experiment logged to this project by id or name.

Parameters:
id : str

The id of the experiment to get.

name : str

The name of the experiment to get.

Returns:
rubicon.client.Experiment

The experiment logged to this project with id id or name ‘name’.

experiments(tags: List[str] | None = None, qtype: str = 'or', name: str | None = None) → List[Experiment]

Get the experiments logged to this project.

Parameters:
tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

name : str, optional

The name of the experiment(s) to filter results on.

Returns:
list of rubicon.client.Experiment

The experiments previously logged to this project.

experiments_from_archive(remote_rubicon, latest_only: bool | None = False)

Retrieve archived experiments into this project’s experiments folder.

Parameters:
remote_rubicon : rubicon_ml.Rubicon object

The remote Rubicon object with the repository containing archived experiments to read in.

latest_only : bool, optional

Whether or not experiments should only be read from the latest archive.

property github_url

Get the project’s GitHub repository URL.

property id

Get the project’s id.

is_auto_git_enabled() → bool

Check if git is enabled for any of the configs.

log_artifact(data_bytes: bytes | None = None, data_directory: str | None = None, data_file: TextIO | None = None, data_object: Any | None = None, data_path: str | None = None, name: str | None = None, description: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) → Artifact

Log an artifact to this client object.

Parameters:
data_bytes : bytes, optional

The raw bytes to log as an artifact.

data_directory : str, optional

The path to a directory to zip and log as an artifact.

data_file : TextIOWrapper, optional

The open file to log as an artifact.

data_object : python object, optional

The python object to log as an artifact.

data_path : str, optional

The absolute or relative local path or S3 path to the data to log as an artifact. S3 paths must be prepended with ‘s3://’.

name : str, optional

The name of the artifact file. Required if data_path is not provided.

description : str, optional

A description of the artifact. Use to provide additional context.

tags : list of str, optional

Values to tag the artifact with. Use tags to organize and filter your artifacts.

comments : list of str, optional

Values to comment the artifact with. Use comments to organize and filter your artifacts.

Returns:
rubicon.client.Artifact

The new artifact.

Notes

Only one of data_bytes, data_file, data_object, and data_path should be provided. If more than one is given, the order of precedence is data_bytes, data_object, data_file, data_path.

Examples

>>> # Log with bytes
>>> experiment.log_artifact(
...     data_bytes=b'hello rubicon!',
...     name="bytes_artifact",
...     description="log artifact from bytes",
... )
>>> # Log zipped directory
>>> experiment.log_artifact(
...     data_directory="./path/to/directory/",
...     name="directory.zip",
...     description="log artifact from zipped directory",
... )
>>> # Log with file
>>> with open('./path/to/artifact.txt', 'rb') as file:
...     project.log_artifact(
...         data_file=file,
...         name="file_artifact",
...         description="log artifact from file",
...     )
>>> # Log with file path
>>> experiment.log_artifact(
...     data_path="./path/to/artifact.pkl",
...     description="log artifact from file path",
... )

log_conda_environment(artifact_name: str | None = None) → Artifact

Log the conda environment as an artifact to this client object. Useful for recreating your exact environment at a later date.

Parameters:
artifact_name : str, optional

The name of the artifact (the exported conda environment).

Returns:
rubicon.client.Artifact

The new artifact.

Notes

Relies on running with an active conda environment.

log_dataframe(df: pd.DataFrame | 'dd.DataFrame' | 'pl.DataFrame', description: str | None = None, name: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) → Dataframe

Log a dataframe to this client object.

Parameters:
df : pandas.DataFrame, dask.dataframe.DataFrame, or polars.DataFrame

The dataframe to log.

description : str, optional

The dataframe’s description. Use to provide additional context.

name : str, optional

The dataframe’s name.

tags : list of str, optional

The values to tag the dataframe with.

comments : list of str, optional

The values to comment the dataframe with.

Returns:
rubicon.client.Dataframe

The new dataframe.

log_experiment(name: str | None = None, description: str | None = None, model_name: str | None = None, branch_name: str | None = None, commit_hash: str | None = None, training_metadata: Tuple | List[Tuple] | None = None, tags: List[str] | None = None, comments: List[str] | None = None) → Experiment

Log a new experiment to this project.

Parameters:
name : str, optional

The experiment’s name.

description : str, optional

The experiment’s description. Use to provide additional context.

model_name : str, optional

The experiment’s model name. For example, this could be the name of the registered model in Model One.

branch_name : str, optional

The name of the active branch of the git repo this experiment is logged from. If omitted and automatic git logging is enabled, it will be retrieved via git rev-parse.

commit_hash : str, optional

The hash of the last commit to the active branch of the git repo this experiment is logged from. If omitted and automatic git logging is enabled, it will be retrieved via git rev-parse.

training_metadata : tuple or list of tuples, optional

Metadata associated with the experiment’s training dataset(s).

tags : list of str, optional

Values to tag the experiment with. Use tags to organize and filter your experiments. For example, tags could be used to differentiate between the types of models or classifiers used during the experiment (e.g., linear regression or random forest).

comments : list of str, optional

Values to comment the experiment with.

Returns:
rubicon.client.Experiment

The created experiment.

log_h2o_model(h2o_model, artifact_name: str | None = None, export_cross_validation_predictions: bool = False, use_mojo: bool = False, **log_artifact_kwargs) → Artifact

Log an h2o model as an artifact using h2o.save_model.

Parameters:
h2o_model : h2o.model.ModelBase

The h2o model to log as an artifact.

artifact_name : str, optional (default None)

The name of the artifact. Defaults to None, using h2o_model’s class name.

export_cross_validation_predictions : bool, optional (default False)

Passed directly to h2o.save_model.

use_mojo : bool, optional (default False)

Whether to log the model in MOJO format. If False, the model will be logged in binary format.

log_artifact_kwargs : dict

Additional kwargs to be passed directly to self.log_artifact.

log_json(json_object: Dict[str, Any], name: str | None = None, description: str | None = None, tags: List[str] | None = None) → Artifact

Log a python dictionary to a JSON file.

Parameters:
json_object : Dict[str, Any]

A python dictionary capable of being converted to JSON.

name : str, optional

A name for this JSON file. Defaults to None.

description : str, optional

A description for this file. Defaults to None.

tags : list of str, optional

Any Rubicon tags. Defaults to None.

Returns:
Artifact

The new artifact.

log_pip_requirements(artifact_name: str | None = None) → Artifact

Log the pip requirements as an artifact to this client object. Useful for recreating your exact environment at a later date.

Parameters:
artifact_name : str, optional

The name of the artifact (the exported pip environment).

Returns:
rubicon.client.Artifact

The new artifact.

log_with_schema(obj: Any, experiment: Experiment | None = None, experiment_kwargs: Dict[str, Any] | None = None) → Any

Log an experiment leveraging self.schema_.

log_xgboost_model(xgboost_model: xgb.Booster, artifact_name: str | None = None, **log_artifact_kwargs: Any) → Artifact

Log an XGBoost model as a JSON file to this client object.

Please note that we do not currently support logging directly from the scikit-learn interface.

Parameters:
xgboost_model : xgb.Booster

An xgboost model object in the Booster format.

artifact_name : str, optional

The name of the artifact (the exported XGBoost model).

log_artifact_kwargs : Any

Additional kwargs to be passed directly to self.log_artifact.

Returns:
rubicon.client.Artifact

The new artifact.

property name

Get the project’s name.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

set_schema(schema: Dict[str, Any]) → None

Set the schema for this client object.

to_dask_df(group_by: str | None = None)

DEPRECATED: Available for backwards compatibility.

to_df(df_type: str = 'pandas', group_by: str | None = None) → pd.DataFrame | Dict[str, pd.DataFrame] | dd.DataFrame | Dict[str, dd.DataFrame]

Loads the project’s data into dask or pandas dataframe(s) sorted by created_at. This includes the experiment details along with parameters and metrics.

Parameters:
df_type : str, optional

The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.

group_by : str or None, optional

How to group the project’s experiments in the returned dataframe(s). Valid options include [“commit_hash”].

Returns:
pandas.DataFrame or dict of pandas.DataFrame or dask.DataFrame or dict of dask.DataFrame

If group_by is None, a dask or pandas dataframe holding the project’s data. Otherwise a dict of dask or pandas dataframes holding the project’s data grouped by group_by.

property training_metadata

Get the project’s training metadata.

Experiment

class rubicon_ml.Experiment(domain: ExperimentDomain, parent: Project)

A client experiment.

An experiment represents a model run and is identified by its ‘created_at’ time. It can have metrics, parameters, features, dataframes, and artifacts logged to it.

An experiment is logged to a project.

Parameters:
domain : rubicon.domain.Experiment

The experiment domain model.

parent : rubicon.client.Project

The project that the experiment is logged to.

add_child_experiment(experiment: Experiment)

Add tags to denote experiment as a descendant of this experiment.

Parameters:
experiment : rubicon_ml.client.Experiment

The experiment to mark as a descendant of this experiment.

Raises:
RubiconException

If experiment and this experiment are not logged to the same project.

add_comments(comments: List[str])

Add comments to this client object.

Parameters:
comments : list of str

The comment values to add.

add_tags(tags: List[str])

Add tags to this client object.

Parameters:
tags : list of str

The tag values to add.

artifact(name: str | None = None, id: str | None = None) → Artifact

Get an artifact logged to this project by id or name.

Parameters:
id : str

The id of the artifact to get.

name : str

The name of the artifact to get.

Returns:
rubicon.client.Artifact

The artifact logged to this project with id id or name ‘name’.

artifacts(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') → List[Artifact]

Get the artifacts logged to this client object.

Parameters:
name : str, optional

The name value to filter results on.

tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

Returns:
list of rubicon.client.Artifact

The artifacts previously logged to this client object.

property branch_name

Get the experiment’s branch name.

property comments: List[str]

Get this client object’s comments.

property commit_hash

Get the experiment’s commit hash.

property created_at

Get the time the experiment was created.

dataframe(name: str | None = None, id: str | None = None) → Dataframe

Get the dataframe logged to this client object.

Parameters:
id : str

The id of the dataframe to get.

name : str

The name of the dataframe to get.

Returns:
rubicon.client.Dataframe

The dataframe logged to this project with id id or name ‘name’.

dataframes(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') → List[Dataframe]

Get the dataframes logged to this client object.

Parameters:
name : str, optional

The name value to filter results on.

tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

Returns:
list of rubicon.client.Dataframe

The dataframes previously logged to this client object.

delete_artifacts(ids: List[str])

Delete the artifacts logged to this client object with ids ids.

Parameters:
ids : list of str

The ids of the artifacts to delete.

delete_dataframes(ids: List[str])

Delete the dataframes with ids ids logged to this client object.

Parameters:
ids : list of str

The ids of the dataframes to delete.

property description

Get the experiment’s description.

feature(name=None, id=None)

Get a feature.

Parameters:
name : str, optional

The name of the feature to get.

id : str, optional

The id of the feature to get.

Returns:
rubicon.client.Feature

The feature with name name or id id.

features(name=None, tags=[], qtype='or')

Get the features logged to this experiment.

Parameters:
name : str, optional

The name value to filter results on.

tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

Returns:
list of rubicon.client.Feature

The features previously logged to this experiment.

get_child_experiments() → List[Experiment]

Get the experiments that are tagged as children of this experiment.

Returns:
list of rubicon_ml.client.Experiment

The experiments that are tagged as children of this experiment.

get_parent_experiments() → List[Experiment]

Get the experiments that are tagged as parents of this experiment.

Returns:
list of rubicon_ml.client.Experiment

The experiments that are tagged as parents of this experiment.

property id

Get the experiment’s id.

is_auto_git_enabled() → bool

Check if git is enabled for any of the configs.

log_artifact(data_bytes: bytes | None = None, data_directory: str | None = None, data_file: TextIO | None = None, data_object: Any | None = None, data_path: str | None = None, name: str | None = None, description: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) → Artifact

Log an artifact to this client object.

Parameters:
data_bytes : bytes, optional

The raw bytes to log as an artifact.

data_directory : str, optional

The path to a directory to zip and log as an artifact.

data_file : TextIOWrapper, optional

The open file to log as an artifact.

data_object : python object, optional

The python object to log as an artifact.

data_path : str, optional

The absolute or relative local path or S3 path to the data to log as an artifact. S3 paths must be prepended with ‘s3://’.

name : str, optional

The name of the artifact file. Required if data_path is not provided.

description : str, optional

A description of the artifact. Use to provide additional context.

tags : list of str, optional

Values to tag the artifact with. Use tags to organize and filter your artifacts.

comments : list of str, optional

Values to comment the artifact with. Use comments to organize and filter your artifacts.

Returns:
rubicon.client.Artifact

The new artifact.

Notes

Only one of data_bytes, data_file, data_object, and data_path should be provided. If more than one is given, the order of precedence is data_bytes, data_object, data_file, data_path.

Examples

>>> # Log with bytes
>>> experiment.log_artifact(
...     data_bytes=b'hello rubicon!',
...     name="bytes_artifact",
...     description="log artifact from bytes",
... )
>>> # Log zipped directory
>>> experiment.log_artifact(
...     data_directory="./path/to/directory/",
...     name="directory.zip",
...     description="log artifact from zipped directory",
... )
>>> # Log with file
>>> with open('./path/to/artifact.txt', 'rb') as file:
...     project.log_artifact(
...         data_file=file,
...         name="file_artifact",
...         description="log artifact from file",
...     )
>>> # Log with file path
>>> experiment.log_artifact(
...     data_path="./path/to/artifact.pkl",
...     description="log artifact from file path",
... )

log_conda_environment(artifact_name: str | None = None) → Artifact

Log the conda environment as an artifact to this client object. Useful for recreating your exact environment at a later date.

Parameters:
artifact_name : str, optional

The name of the artifact (the exported conda environment).

Returns:
rubicon.client.Artifact

The new artifact.

Notes

Relies on running with an active conda environment.

log_dataframe(df: pd.DataFrame | 'dd.DataFrame' | 'pl.DataFrame', description: str | None = None, name: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) → Dataframe

Log a dataframe to this client object.

Parameters:
df : pandas.DataFrame, dask.dataframe.DataFrame, or polars.DataFrame

The dataframe to log.

description : str, optional

The dataframe’s description. Use to provide additional context.

name : str, optional

The dataframe’s name.

tags : list of str, optional

The values to tag the dataframe with.

comments : list of str, optional

The values to comment the dataframe with.

Returns:
rubicon.client.Dataframe

The new dataframe.

log_feature(name: str, description: str = None, importance: float = None, tags: list[str] = [], comments: list[str] = []) → Feature

Create a feature under the experiment.

Parameters:
name : str

The feature’s name.

description : str, optional

The feature’s description. Use to provide additional context.

importance : float, optional

The feature’s importance.

tags : list of str, optional

Values to tag the feature with. Use tags to organize and filter your features.

comments : list of str, optional

Values to comment the feature with. Use comments to organize and filter your features.

Returns:
rubicon.client.Feature

The created feature.

log_h2o_model(h2o_model, artifact_name: str | None = None, export_cross_validation_predictions: bool = False, use_mojo: bool = False, **log_artifact_kwargs) → Artifact

Log an h2o model as an artifact using h2o.save_model.

Parameters:
h2o_model : h2o.model.ModelBase

The h2o model to log as an artifact.

artifact_name : str, optional (default None)

The name of the artifact. Defaults to None, using h2o_model’s class name.

export_cross_validation_predictions : bool, optional (default False)

Passed directly to h2o.save_model.

use_mojo : bool, optional (default False)

Whether to log the model in MOJO format. If False, the model will be logged in binary format.

log_artifact_kwargs : dict

Additional kwargs to be passed directly to self.log_artifact.

log_json(json_object: Dict[str, Any], name: str | None = None, description: str | None = None, tags: List[str] | None = None) → Artifact

Log a python dictionary to a JSON file.

Parameters:
json_object : Dict[str, Any]

A python dictionary capable of being converted to JSON.

name : str, optional

A name for this JSON file. Defaults to None.

description : str, optional

A description for this file. Defaults to None.

tags : list of str, optional

Any Rubicon tags. Defaults to None.

Returns:
Artifact

The new artifact.

log_metric(name: str, value: float, directionality: str = 'score', description: str = None, tags: list[str] = [], comments: list[str] = []) → Metric

Create a metric under the experiment.

Parameters:
name : str

The metric’s name.

value : float

The metric’s value.

directionality : str, optional

The metric’s directionality. Must be one of [“score”, “loss”], where “score” represents a metric to maximize and “loss” represents a metric to minimize. Defaults to “score”.

description : str, optional

The metric’s description. Use to provide additional context.

tags : list of str, optional

Values to tag the metric with. Use tags to organize and filter your metrics.

comments : list of str, optional

Values to comment the metric with. Use comments to organize and filter your metrics.

Returns:
rubicon.client.Metric

The created metric.

log_parameter(name: str, value: object = None, description: str = None, tags: list[str] = [], comments: list[str] = []) → Parameter

Create a parameter under the experiment.

Parameters:
name : str

The parameter’s name.

value : object, optional

The parameter’s value. Can be an object of any JSON serializable (via rubicon.utils.DomainJSONEncoder) type.

description : str, optional

The parameter’s description. Use to provide additional context.

tags : list of str, optional

Values to tag the parameter with. Use tags to organize and filter your parameters.

comments : list of str, optional

Values to comment the parameter with. Use comments to organize and filter your parameters.

Returns:
rubicon.client.Parameter

The created parameter.

log_pip_requirements(artifact_name: str | None = None) → Artifact

Log the pip requirements as an artifact to this client object. Useful for recreating your exact environment at a later date.

Parameters:
artifact_name : str, optional

The name of the artifact (the exported pip environment).

Returns:
rubicon.client.Artifact

The new artifact.

log_xgboost_model(xgboost_model: xgb.Booster, artifact_name: str | None = None, **log_artifact_kwargs: Any) → Artifact

Log an XGBoost model as a JSON file to this client object.

Please note that we do not currently support logging directly from the scikit-learn interface.

Parameters:
xgboost_model : xgb.Booster

An xgboost model object in the Booster format.

artifact_name : str, optional

The name of the artifact (the exported XGBoost model).

log_artifact_kwargs : Any

Additional kwargs to be passed directly to self.log_artifact.

Returns:
rubicon.client.Artifact

The new artifact.

metric(name=None, id=None)

Get a metric.

Parameters:
name : str, optional

The name of the metric to get.

id : str, optional

The id of the metric to get.

Returns:
rubicon.client.Metric

The metric with name name or id id.

metrics(name=None, tags=[], qtype='or')

Get the metrics logged to this experiment.

Parameters:
name : str, optional

The name value to filter results on.

tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

Returns:
list of rubicon.client.Metric

The metrics previously logged to this experiment.

property model_name

Get the experiment’s model name.

property name

Get the experiment’s name.

parameter(name=None, id=None)

Get a parameter.

Parameters:
name : str, optional

The name of the parameter to get.

id : str, optional

The id of the parameter to get.

Returns:
rubicon.client.Parameter

The parameter with name name or id id.

parameters(name=None, tags=[], qtype='or')

Get the parameters logged to this experiment.

Parameters:
name : str, optional

The name value to filter results on.

tags : list of str, optional

The tag values to filter results on.

qtype : str, optional

The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.

Returns:
list of rubicon.client.Parameter

The parameters previously logged to this experiment.

property project

Get the project client object that this experiment belongs to.

remove_comments(comments: List[str])

Remove comments from this client object.

Parameters:
comments : list of str

The comment values to remove.

remove_tags(tags: List[str])

Remove tags from this client object.

Parameters:
tags : list of str

The tag values to remove.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

property tags: TagContainer

Get this client object’s tags.

property training_metadata

Get the experiment’s training metadata.

Parameter

class rubicon_ml.Parameter(domain: ParameterDomain, parent: Experiment)

A client parameter.

A parameter is an input to an experiment (model run) that depends on the type of model being used. It affects the model’s predictions.

For example, if you were using a random forest classifier, ‘n_estimators’ (the number of trees in the forest) could be a parameter.

A parameter is logged to an experiment.

Parameters:
domain : rubicon.domain.Parameter

The parameter domain model.

parent : rubicon.client.Experiment

The experiment that the parameter is logged to.

add_comments(comments: List[str])

Add comments to this client object.

Parameters:
comments : list of str

The comment values to add.

add_tags(tags: List[str])

Add tags to this client object.

Parameters:
tags : list of str

The tag values to add.

property comments: List[str]

Get this client object’s comments.

property created_at: datetime

Get the time the parameter was created.

property description: str | None

Get the parameter’s description.

property id: str

Get the parameter’s id.

is_auto_git_enabled() bool

Is git enabled for any of the configs.

property name: str | None

Get the parameter’s name.

property parent: Experiment

Get the parameter’s parent client object.

remove_comments(comments: List[str])

Remove comments from this client object.

Parameters:
commentslist of str

The comment values to remove.

remove_tags(tags: List[str])

Remove tags from this client object.

Parameters:
tagslist of str

The tag values to remove.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

property tags: TagContainer

Get this client object’s tags.

property value: object | float | None

Get the parameter’s value.

Feature

class rubicon_ml.Feature(domain: FeatureDomain, parent: Experiment)

A client feature.

A feature is an input to an experiment (model run) that’s an independent, measurable property of a phenomenon being observed. It affects the model’s predictions.

For example, consider a model that predicts how likely a customer is to pay back a loan. Possible features could be ‘year’, ‘credit score’, etc.

A feature is logged to an experiment.

Parameters:
domainrubicon.domain.Feature

The feature domain model.

parentrubicon.client.Experiment

The experiment that the feature is logged to.

add_comments(comments: List[str])

Add comments to this client object.

Parameters:
commentslist of str

The comment values to add.

add_tags(tags: List[str])

Add tags to this client object.

Parameters:
tagslist of str

The tag values to add.

property comments: List[str]

Get this client object’s comments.

property created_at: datetime

Get the feature’s created_at.

property description: str | None

Get the feature’s description.

property id: str

Get the feature’s id.

property importance

Get the feature’s importance.

is_auto_git_enabled() bool

Is git enabled for any of the configs.

property name: str | None

Get the feature’s name.

property parent: Experiment

Get the feature’s parent client object.

remove_comments(comments: List[str])

Remove comments from this client object.

Parameters:
commentslist of str

The comment values to remove.

remove_tags(tags: List[str])

Remove tags from this client object.

Parameters:
tagslist of str

The tag values to remove.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

property tags: TagContainer

Get this client object’s tags.

Metric

class rubicon_ml.Metric(domain: MetricDomain, parent: Experiment)

A client metric.

A metric is a single-value output of an experiment that helps evaluate the quality of the model’s predictions.

It can be either a ‘score’ (value to maximize) or a ‘loss’ (value to minimize).

A metric is logged to an experiment.

Parameters:
domainrubicon.domain.Metric

The metric domain model.

parentrubicon.client.Experiment

The experiment that the metric is logged to.

add_comments(comments: List[str])

Add comments to this client object.

Parameters:
commentslist of str

The comment values to add.

add_tags(tags: List[str])

Add tags to this client object.

Parameters:
tagslist of str

The tag values to add.

property comments: List[str]

Get this client object’s comments.

property created_at: datetime

Get the metric’s created_at.

property description: str | None

Get the metric’s description.

property directionality: str

Get the metric’s directionality.

property id: str

Get the metric’s id.

is_auto_git_enabled() bool

Is git enabled for any of the configs.

property name: str | None

Get the metric’s name.

property parent: Experiment

Get the metric’s parent client object.

remove_comments(comments: List[str])

Remove comments from this client object.

Parameters:
commentslist of str

The comment values to remove.

remove_tags(tags: List[str])

Remove tags from this client object.

Parameters:
tagslist of str

The tag values to remove.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

property tags: TagContainer

Get this client object’s tags.

property value

Get the metric’s value.

Dataframe

class rubicon_ml.Dataframe(domain: DataframeDomain, parent: Experiment | Project)

A client dataframe.

A dataframe is a two-dimensional, tabular dataset with labeled axes (rows and columns) that provides value to the model developer and/or reviewer when visualized.

For example, confusion matrices, feature importance tables and marginal residuals can all be logged as a dataframe.

A dataframe is logged to a project or an experiment.

Parameters:
domainrubicon.domain.Dataframe

The dataframe domain model.

parentrubicon.client.Project or rubicon.client.Experiment

The project or experiment that the dataframe is logged to.

add_comments(comments: List[str])

Add comments to this client object.

Parameters:
commentslist of str

The comment values to add.

add_tags(tags: List[str])

Add tags to this client object.

Parameters:
tagslist of str

The tag values to add.

property comments: List[str]

Get this client object’s comments.

property created_at

Get the time this dataframe was created.

property description

Get the dataframe’s description.

get_data(df_type: Literal['pandas', 'dask'] = 'pandas')

Loads the data associated with this Dataframe into a pandas or dask dataframe.

Parameters:
df_typestr, optional

The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.

property id

Get the dataframe’s id.

is_auto_git_enabled() bool

Is git enabled for any of the configs.

property name

Get the dataframe’s name.

property parent

Get the dataframe’s parent client object.

plot(df_type: Literal['pandas', 'dask'] = 'pandas', plotting_func: Callable | None = None, **kwargs)

Render the dataframe using plotly.express.

Parameters:
df_typestr, optional

The type of dataframe. Can be either pandas or dask. Defaults to ‘pandas’.

plotting_funcfunction, optional

The plotly.express plotting function used to visualize the dataframes. Available options can be found at https://plotly.com/python-api-reference/plotly.express.html. Defaults to plotly.express.line.

kwargsdict, optional

Keyword arguments to be passed to plotting_func. Available options can be found in the documentation of the individual functions at the URL above.

Examples

>>> # Log a line plot
>>> dataframe.plot(x='Year', y='Number of Subscriptions')
>>> # Log a bar plot
>>> import plotly.express as px
>>> dataframe.plot(plotting_func=px.bar, x='Year', y='Number of Subscriptions')
remove_comments(comments: List[str])

Remove comments from this client object.

Parameters:
commentslist of str

The comment values to remove.

remove_tags(tags: List[str])

Remove tags from this client object.

Parameters:
tagslist of str

The tag values to remove.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

property tags: TagContainer

Get this client object’s tags.

Artifact

class rubicon_ml.Artifact(domain: ArtifactDomain, parent: Project)

A client artifact.

An artifact is a catch-all for any other type of data that can be logged to a file.

For example, a snapshot of a trained model (.pkl) can be logged to the experiment created during its run. Or, a base model for the model in development can be logged to a project when leveraging transfer learning.

An artifact is logged to a project or an experiment.

Parameters:
domainrubicon.domain.Artifact

The artifact domain model.

parentrubicon.client.Project or rubicon.client.Experiment

The project or experiment that the artifact is logged to.

add_comments(comments: List[str])

Add comments to this client object.

Parameters:
commentslist of str

The comment values to add.

add_tags(tags: List[str])

Add tags to this client object.

Parameters:
tagslist of str

The tag values to add.

property comments: List[str]

Get this client object’s comments.

property created_at

Get the time this artifact was created.

property data

Get the artifact’s raw data.

property description: str

Get the artifact’s description.

download(location: str | None = None, name: str | None = None, unzip: bool = False)

Download this artifact’s data.

Parameters:
locationstr, optional

The absolute or relative local directory or S3 bucket to download the artifact to. S3 buckets must be prepended with ‘s3://’. Defaults to the current local working directory.

namestr, optional

The name to give the downloaded artifact file. Defaults to the artifact’s given name when logged.

unzipbool, optional

True to unzip the artifact data. False otherwise. Defaults to False.

get_data(deserialize: Literal['h2o', 'h2o_binary', 'h2o_mojo', 'pickle', 'xgboost'] | None = None, unpickle: bool = False)

Loads the data associated with this artifact and unpickles if needed.

Parameters:
deserializestr, optional

The method used to deserialize this artifact’s data. Defaults to None.

  • None to disable deserialization and return the raw data.

  • “h2o” or “h2o_binary” to use h2o.load_model to load the data.

  • “h2o_mojo” to use h2o.import_mojo to load the data.

  • “pickle” to use pickle to load the data.

  • “xgboost” to use xgboost’s JSON loader to load the data as a fitted model.

unpicklebool, optional

Flag indicating whether or not to unpickle artifact data. deserialize takes precedence. Defaults to False. Deprecated: Please use deserialize=”pickle” in the future.

property id: str

Get the artifact’s id.

is_auto_git_enabled() bool

Is git enabled for any of the configs.

property name: str

Get the artifact’s name.

property parent

Get the artifact’s parent client object.

remove_comments(comments: List[str])

Remove comments from this client object.

Parameters:
commentslist of str

The comment values to remove.

remove_tags(tags: List[str])

Remove tags from this client object.

Parameters:
tagslist of str

The tag values to remove.

property repositories: List[BaseRepository] | None

Get all repositories.

property repository: BaseRepository | None

Get the repository.

property tags: TagContainer

Get this client object’s tags.

temporary_download(unzip: bool = False)

Temporarily download this artifact’s data within a context manager.

Parameters:
unzipbool, optional

True to unzip the artifact data. False otherwise. Defaults to False.

Yields:
file

An open file pointer into the directory the artifact data was temporarily downloaded into. If the artifact is a single file, its name is stored in the artifact.name attribute.

exception_handling

rubicon_ml.set_failure_mode(failure_mode: str, traceback_chain: bool = False, traceback_limit: int | None = None) None

Set the failure mode.

Parameters:
failure_modestr

The name of the failure mode to set. “raise” to raise all exceptions, “log” to catch all exceptions and log them via logging.error, “warn” to catch all exceptions and re-raise them as warnings via warnings.warn. Defaults to “raise”.

traceback_chainbool, optional

True to display each error in the traceback chain when logging or warning, False to display only the first. Defaults to False.

traceback_limitint, optional

The depth of the traceback displayed when logging or warning. 0 displays only the error’s text; each increment shows one additional line of the traceback.

publish

rubicon_ml leverages intake to easily share sets of experiments.

rubicon_ml.publish(experiments, visualization_object: ExperimentsTable | MetricCorrelationPlot | DataframePlot | MetricListsComparison | None = None, output_filepath=None, base_catalog_filepath=None)

Publish experiments to an intake catalog that can be read by the intake-rubicon driver.

Parameters:
experimentslist of rubicon_ml.client.experiment.Experiment

The experiments to publish.

output_filepathstr, optional

The absolute or relative local filepath or S3 bucket and key to log the generated YAML file to. S3 buckets must be prepended with ‘s3://’. Defaults to None, which disables writing the generated YAML.

base_catalog_filepathstr, optional

Similar to output_filepath, except this argument is used as a base file to update an existing intake catalog. Defaults to None, which creates a new intake catalog.

Returns:
str

The YAML string representation of the intake catalog containing the given experiments.

RubiconJSON

class rubicon_ml.RubiconJSON(rubicon_objects: List[Rubicon] | None = None, projects: List[Project] | None = None, experiments: List[Experiment] | None = None)

RubiconJSON converts top-level rubicon_ml objects, projects, and experiments into a JSON structured dictionary for JSONPath-like querying with jsonpath-ng.

Parameters:
rubicon_objectsrubicon.client.Rubicon or list of type rubicon.client.Rubicon

Top-level rubicon-ml objects to convert to JSON for querying.

projectsrubicon.client.Project or list of type rubicon.client.Project

rubicon-ml projects to convert to JSON for querying.

experimentsrubicon.client.Experiment or list of type rubicon.client.Experiment

rubicon-ml experiments to convert to JSON for querying.

search(query: str)

Query the JSON generated from the RubiconJSON instantiation in a JSONPath-like manner. Results are returned as a JSON structured dict by default, or as rubicon_ml.client objects when the return_type parameter is specified.

Parameters:
querystr

The JSONPath-like query to run against the generated JSON.

schema

Methods and a mixin to enable schema logging.

The functions available in the schema submodule are applied to rubicon_ml.Project via the SchemaMixin class. They can be called directly as a method of an existing project.

class rubicon_ml.schema.logger.SchemaMixin

Adds schema logging support to a client object.

log_with_schema(obj: Any, experiment: Experiment | None = None, experiment_kwargs: Dict[str, Any] | None = None) Any

Log an experiment leveraging self.schema_.

set_schema(schema: Dict[str, Any]) None

Set the schema for this client object.

Methods for interacting with the existing rubicon-ml schema.

rubicon_ml.schema.registry.available_schema() List[str]

Get the names of all available schema.

rubicon_ml.schema.registry.get_schema(name: str) Any

Get the schema with name name.

rubicon_ml.schema.registry.get_schema_name(obj: Any) str

Get the name of the schema that represents object obj.

rubicon_ml.schema.registry.register_schema(name: str, schema: dict)

Add a schema to the schema registry.

sklearn

rubicon_ml offers direct integration with Scikit-learn via our own pipeline object.

class rubicon_ml.sklearn.RubiconPipeline(project, steps, user_defined_loggers={}, experiment_kwargs={'name': 'RubiconPipeline experiment'}, memory=None, verbose=False, ignore_warnings=False)

An extension of sklearn.pipeline.Pipeline that automatically creates a Rubicon experiment under the provided project and logs the pipeline’s parameters and metrics to it.

A single pipeline run will result in a single experiment logged with its corresponding parameters and metrics pulled from the pipeline’s estimators.

Parameters:
projectrubicon_ml.client.Project

The rubicon project to log to.

stepslist

List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.

user_defined_loggersdict, optional

A dict mapping the estimator name to a corresponding user defined logger. See the example below for more details.

experiment_kwargsdict, optional

Additional keyword arguments to be passed to project.log_experiment().

memorystr or object with the joblib.Memory interface, default=None

Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. (docstring source: Scikit-Learn)

verbosebool, default=False

If True, the time elapsed while fitting each step will be printed as it is completed. (docstring source: Scikit-Learn)

ignore_warningsbool, default=False

If True, ignores warnings thrown by pipeline.

Examples

>>> pipeline = RubiconPipeline(
...     project,
...     [
...         ("vect", CountVectorizer()),
...         ("tfidf", TfidfTransformer()),
...         ("clf", SGDClassifier()),
...     ],
...     user_defined_loggers = {
...         "vect": FilterEstimatorLogger(
...             select=["input", "decode_error", "max_df"],
...         ),
...         "tfidf": FilterEstimatorLogger(ignore_all=True),
...         "clf": FilterEstimatorLogger(
...             ignore=["alpha", "penalty"],
...         ),
...     }
... )
fit(X, y=None, tags=None, log_fit_params=True, experiment=None, **fit_params)

Fit the model and automatically log the fit_params to rubicon-ml. Optionally, pass tags to update the experiment’s tags.

Parameters:
Xiterable

Training data. Must fulfill input requirements of first step of the pipeline.

yiterable, optional

Training targets. Must fulfill label requirements for all steps of the pipeline.

tagslist, optional

Additional tags to add to the experiment during the fit.

log_fit_paramsbool, optional

True to log the values passed as fit_params to this pipeline’s experiment. Defaults to True.

fit_paramsdict, optional

Additional keyword arguments to be passed to sklearn.pipeline.Pipeline.fit().

experiment: rubicon_ml.experiment.client.Experiment, optional

The experiment to log to. If no experiment is provided, the parameters and metrics are logged to a new experiment created with self.experiment_kwargs.

Returns:
rubicon_ml.sklearn.Pipeline

This RubiconPipeline.

get_estimator_logger(step_name=None, estimator=None)

Get a logger for the estimator. By default, the logger will have the current experiment set.

score(X, y=None, sample_weight=None, experiment=None)

Score with the final estimator and automatically log the results to rubicon-ml.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

yiterable, optional

Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.

sample_weightlist, optional

If not None, this argument is passed as sample_weight keyword argument to the score method of the final estimator.

experiment: rubicon_ml.experiment.client.Experiment, optional

The experiment to log the score to. If no experiment is provided the score is logged to a new experiment with self.experiment_kwargs.

Returns:
float

Result of calling score on the final estimator.

score_samples(X, experiment=None)

Score samples with the final estimator and automatically log the results to rubicon-ml.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

experiment: rubicon_ml.experiment.client.Experiment, optional

The experiment to log the score to. If no experiment is provided the score is logged to a new experiment with self.experiment_kwargs.

Returns:
ndarray of shape (n_samples,)

Result of calling score_samples on the final estimator.

set_fit_request(*, experiment: bool | None | str = '$UNCHANGED$', log_fit_params: bool | None | str = '$UNCHANGED$', tags: bool | None | str = '$UNCHANGED$') RubiconPipeline

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
experimentstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for experiment parameter in fit.

log_fit_paramsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for log_fit_params parameter in fit.

tagsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for tags parameter in fit.

Returns:
selfobject

The updated object.

set_score_request(*, experiment: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') RubiconPipeline

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
experimentstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for experiment parameter in score.

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.

class rubicon_ml.sklearn.FilterEstimatorLogger(estimator=None, experiment=None, step_name=None, select=[], ignore=[], ignore_all=False)

The filter logger for sklearn estimators. Use this logger to either select or ignore specific parameters for logging.

Parameters:
estimatora sklearn estimator, optional

The estimator

experimentrubicon.client.Experiment, optional

The experiment to log the parameters and metrics to.

step_namestr, optional

The name of the pipeline step.

selectlist, optional

The list of parameters on this estimator that you’d like to log. All other parameters will be ignored.

ignorelist, optional

The list of parameters on this estimator that you’d like to ignore by not logging. The other parameters will be logged.

ignore_allbool, optional

True to ignore all of the estimator’s parameters, logging none. Defaults to False.

rubicon_ml.sklearn.pipeline.make_pipeline(project, *steps, experiment_kwargs={'name': 'RubiconPipeline experiment'}, memory=None, verbose=False)

Wrapper around RubiconPipeline() that does not require naming the estimators. Their names are automatically set to the lowercase strings of their types.

Parameters:
projectrubicon_ml.client.Project

The rubicon project to log to.

stepslist

List of estimator objects or (estimator, logger) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator. (docstring source: Scikit-Learn)

experiment_kwargsdict, optional

Additional keyword arguments to be passed to project.log_experiment().

memorystr or object with the joblib.Memory interface, default=None

Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. (docstring source: Scikit-Learn)

verbosebool, default=False

If True, the time elapsed while fitting each step will be printed as it is completed. (docstring source: Scikit-Learn)

Returns:
rubicon_ml.sklearn.Pipeline

A RubiconPipeline with project project and steps steps.

viz

rubicon_ml offers visualization leveraging Dash and Plotly. Each of the following classes are standalone widgets.

class rubicon_ml.viz.DataframePlot(dataframe_name, experiments=None, plotting_func=<function line>, plotting_func_kwargs={}, x=None, y=None)

Plot the dataframes with name dataframe_name logged to the experiments experiments on a shared axis.

Parameters:
dataframe_namestr

The name of the dataframe to plot. A dataframe with name dataframe_name must be logged to each experiment in experiments.

experimentslist of rubicon_ml.client.experiment.Experiment, optional

The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.

plotting_funcfunction, optional

The plotly.express plotting function used to visualize the dataframes. Available options can be found at https://plotly.com/python-api-reference/plotly.express.html. Defaults to plotly.express.line.

plotting_func_kwargsdict, optional

Keyword arguments to be passed to plotting_func. Available options can be found in the documentation of the individual functions at the URL above.

xstr, optional

The name of the column in the dataframes with name dataframe_name to plot across the x-axis.

ystr, optional

The name of the column in the dataframes with name dataframe_name to plot across the y-axis.

serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})

Serve the Dash app on the next available port to render the visualization.

Parameters:
in_backgroundbool, optional

DEPRECATED. Background processing is now handled by jupyter_mode.

jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional

How to render the dashboard when running from Jupyterlab.

  • “external” to serve the dashboard at an external link.

  • “inline” to render the dashboard in the current notebook’s output cell.

  • “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.

  • “tab” to serve the dashboard at an external link and open a new browser tab to said link.

Defaults to “external”.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)

Serve the Dash app on the next available port to render the visualization.

Additionally, renders the visualization inline in the current Jupyter notebook.

Parameters:
i_frame_kwargs: dict, optional

DEPRECATED. Use height and width instead.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

heightint, str or None, optional

The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

widthint, str or None, optional

The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

class rubicon_ml.viz.ExperimentsTable(experiments=None, is_selectable=True, metric_names=None, metric_query_tags=None, metric_query_type=None, parameter_names=None, parameter_query_tags=None, parameter_query_type=None)

Visualize the experiments experiments and their metadata, metrics, and parameters in a tabular format.

Parameters:
experimentslist of rubicon_ml.client.experiment.Experiment, optional

The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.

is_selectablebool, optional

True to enable selection of the rows in the table, False otherwise. Defaults to True.

metric_nameslist of str

If provided, only show the metrics with names in the given list. If metric_query_tags are also provided, this will only select metrics from the tag-filtered results.

metric_query_tagslist of str, optional

If provided, only show the metrics with the given tags in the table.

metric_query_type‘and’ or ‘or’, optional

When metric_query_tags are given, ‘and’ shows the metrics with all of the given tags and ‘or’ shows the metrics with any of the given tags.

parameter_nameslist of str

If provided, only show the parameters with names in the given list. If parameter_query_tags are also provided, this will only select parameters from the tag-filtered results.

parameter_query_tagslist of str, optional

If provided, only show the parameters with the given tags in the table.

parameter_query_type‘and’ or ‘or’, optional

When parameter_query_tags are given, ‘and’ shows the parameters with all of the given tags and ‘or’ shows the parameters with any of the given tags.

serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})

Serve the Dash app on the next available port to render the visualization.

Parameters:
in_backgroundbool, optional

DEPRECATED. Background processing is now handled by jupyter_mode.

jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional

How to render the dashboard when running from Jupyterlab.

  • “external” to serve the dashboard at an external link.

  • “inline” to render the dashboard in the current notebook’s output cell.

  • “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.

  • “tab” to serve the dashboard at an external link and open a new browser tab to said link.

Defaults to “external”.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)

Serve the Dash app on the next available port to render the visualization.

Additionally, renders the visualization inline in the current Jupyter notebook.

Parameters:
i_frame_kwargs: dict, optional

DEPRECATED. Use height and width instead.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

heightint, str or None, optional

The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

widthint, str or None, optional

The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

class rubicon_ml.viz.MetricCorrelationPlot(experiments=None, metric_names=None, parameter_names=None, selected_metric=None)

Visualize the correlation between the parameters and metrics logged to the given experiments using a parallel coordinates plot.

More info on parallel coordinates plots can be found here: https://plotly.com/python/parallel-coordinates-plot/

Parameters:
experimentslist of rubicon_ml.client.experiment.Experiment, optional

The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.

metric_nameslist of str, optional

The names of the metrics to load. Defaults to None, which loads all metrics logged to the given experiments.

parameter_nameslist of str, optional

The names of the parameters to load. Defaults to None, which loads all parameters logged to the given experiments.

selected_metricstr, optional

The name of the metric to display at launch. Defaults to None, which selects the first metric loaded.

serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})

Serve the Dash app on the next available port to render the visualization.

Parameters:
in_backgroundbool, optional

DEPRECATED. Background processing is now handled by jupyter_mode.

jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional

How to render the dashboard when running from Jupyterlab.

  • “external” to serve the dashboard at an external link.

  • “inline” to render the dashboard in the current notebook’s output cell.

  • “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.

  • “tab” to serve the dashboard at an external link and open a new browser tab to said link.

Defaults to “external”.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)

Serve the Dash app on the next available port to render the visualization.

Additionally, renders the visualization inline in the current Jupyter notebook.

Parameters:
i_frame_kwargs: dict, optional

DEPRECATED. Use height and width instead.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

heightint, str or None, optional

The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

widthint, str or None, optional

The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

class rubicon_ml.viz.MetricListsComparison(column_names=None, experiments=None, selected_metric=None)

Visualize lists of metrics logged to the given experiments as an annotated heatmap.

More info on annotated heatmaps can be found here: https://plotly.com/python/annotated-heatmap/

Parameters:
column_nameslist of str, optional

Titles to use for each column in the heatmap. Defaults to None.

experimentslist of rubicon_ml.client.experiment.Experiment, optional

The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.

selected_metricstr, optional

The name of the metric to display at launch. Defaults to None, which selects the first metric loaded.

serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})

Serve the Dash app on the next available port to render the visualization.

Parameters:
in_backgroundbool, optional

DEPRECATED. Background processing is now handled by jupyter_mode.

jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional

How to render the dashboard when running from Jupyterlab.

  • “external” to serve the dashboard at an external link.

  • “inline” to render the dashboard in the current notebook’s output cell.

  • “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.

  • “tab” to serve the dashboard at an external link and open a new browser tab to said link.

Defaults to “external”.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)

Serve the Dash app on the next available port to render the visualization.

Additionally, renders the visualization inline in the current Jupyter notebook.

Parameters:
i_frame_kwargs: dict, optional

DEPRECATED. Use height and width instead.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

heightint, str or None, optional

The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

widthint, str or None, optional

The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

Widgets can be combined into an interactive dashboard.

class rubicon_ml.viz.Dashboard(experiments, widgets=None, link_experiment_table=True)

Compose visualizations into a dashboard to view multiple widgets at once.

Parameters:
experimentslist of rubicon_ml.client.experiment.Experiment

The experiments to visualize.

widgetslist of lists of subclasses of rubicon_ml.viz.base.VizBase, optional

The widgets to compose in this dashboard. The widgets should be instantiated without experiments prior to passing as an argument to Dashboard. Defaults to a stacked layout of an ExperimentsTable and a MetricCorrelationPlot.

link_experiment_tablebool, optional

True to enable the callbacks that allow instances of ExperimentsTable to update the experiment inputs of the other widgets in this dashboard, False otherwise. Defaults to True.

serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})

Serve the Dash app on the next available port to render the visualization.

Parameters:
in_backgroundbool, optional

DEPRECATED. Background processing is now handled by jupyter_mode.

jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional

How to render the dashboard when running from Jupyterlab.

  • “external” to serve the dashboard at an external link.

  • “inline” to render the dashboard in the current notebook’s output cell.

  • “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.

  • “tab” to serve the dashboard at an external link and open a new browser tab to said link.

Defaults to “external”.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)

Serve the Dash app on the next available port to render the visualization.

Additionally, renders the visualization inline in the current Jupyter notebook.

Parameters:
i_frame_kwargs: dict, optional

DEPRECATED. Use height and width instead.

dash_kwargsdict, optional

Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.

run_server_kwargsdict, optional

Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.

heightint, str or None, optional

The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

widthint, str or None, optional

The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.

workflow.prefect

rubicon_ml contains wrappers for the workflow management engine Prefect. These tasks represent a Prefect-ified rubicon_ml client.

rubicon_ml.workflow.prefect.create_experiment_task(project, **kwargs)

Create an experiment within a prefect flow.

This prefect task can be used within a flow to create a new experiment under an existing project.

Parameters:
projectrubicon.client.Project

The project under which the experiment will be created.

kwargsdict

Keyword arguments to be passed to Project.log_experiment.

Returns:
rubicon.client.Experiment

The created experiment.

rubicon_ml.workflow.prefect.get_or_create_project_task(persistence, root_dir, project_name, auto_git_enabled=False, storage_options={}, **kwargs)

Get or create a project within a prefect flow.

This prefect task can be used within a flow to create a new project or get an existing one. It should be the entry point to any prefect flow that logs data to Rubicon.

Parameters:
persistencestr

The persistence type to be passed to the Rubicon constructor.

root_dirstr

The root directory to be passed to the Rubicon constructor.

project_namestr

The name of the project to get or create.

auto_git_enabledbool, optional

True to use the git command to automatically log relevant repository information to projects and experiments logged with the client instance created in this task, False otherwise. Defaults to False.

storage_optionsdict, optional

Additional keyword arguments specific to the protocol being chosen. They are passed directly to the underlying filesystem class.

kwargsdict

Additional keyword arguments to be passed to Rubicon.create_project.

Returns:
rubicon.client.Project

The project with name project_name.

rubicon_ml.workflow.prefect.log_artifact_task(parent, **kwargs)

Log an artifact within a prefect flow.

This prefect task can be used within a flow to log an artifact to an existing project or experiment.

Parameters:
parentrubicon.client.Project or rubicon.client.Experiment

The project or experiment to log the artifact to.

kwargsdict

Keyword arguments to be passed to Project.log_artifact or Experiment.log_artifact.

Returns:
rubicon.client.Artifact

The logged artifact.

rubicon_ml.workflow.prefect.log_dataframe_task(parent, df, **kwargs)

Log a dataframe within a prefect flow.

This prefect task can be used within a flow to log a dataframe to an existing project or experiment.

Parameters:
parentrubicon.client.Project or rubicon.client.Experiment

The project or experiment to log the dataframe to.

dfpandas.DataFrame or dask.dataframe.DataFrame

The pandas or dask dataframe to log.

kwargsdict

Additional keyword arguments to be passed to Project.log_dataframe or Experiment.log_dataframe.

Returns:
rubicon.client.Dataframe

The logged dataframe.

rubicon_ml.workflow.prefect.log_feature_task(experiment, feature_name, **kwargs)

Log a feature within a prefect flow.

This prefect task can be used within a flow to log a feature to an existing experiment.

Parameters:
experimentrubicon.client.Experiment

The experiment to log a new feature to.

feature_namestr

The name of the feature to log. Passed to Experiment.log_feature as name.

kwargsdict

Additional keyword arguments to be passed to Experiment.log_feature.

Returns:
rubicon.client.Feature

The logged feature.

rubicon_ml.workflow.prefect.log_metric_task(experiment, metric_name, metric_value, **kwargs)

Log a metric within a prefect flow.

This prefect task can be used within a flow to log a metric to an existing experiment.

Parameters:
experimentrubicon.client.Experiment

The experiment to log a new metric to.

metric_namestr

The name of the metric to log. Passed to Experiment.log_metric as name.

metric_valuefloat

The value of the metric to log. Passed to Experiment.log_metric as value.

kwargsdict

Additional keyword arguments to be passed to Experiment.log_metric.

Returns:
rubicon.client.Metric

The logged metric.

rubicon_ml.workflow.prefect.log_parameter_task(experiment, parameter_name, parameter_value, **kwargs)

Log a parameter within a prefect flow.

This prefect task can be used within a flow to log a parameter to an existing experiment.

Parameters:
experimentrubicon.client.Experiment

The experiment to log a new parameter to.

parameter_namestr

The name of the parameter to log. Passed to Experiment.log_parameter as name.

parameter_valuestr

The value of the parameter to log. Passed to Experiment.log_parameter as value.

kwargsdict

Additional keyword arguments to be passed to Experiment.log_parameter.

Returns:
rubicon.client.Parameter

The logged parameter.
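The tasks above compose into a flow. A hedged sketch, assuming Prefect 1.x (the API these wrappers target) is installed; the flow, project, and metric names are illustrative, and the flow falls back gracefully when Prefect is unavailable:

```python
try:
    from prefect import Flow  # these wrappers target the Prefect 1.x API

    from rubicon_ml.workflow.prefect import (
        create_experiment_task,
        get_or_create_project_task,
        log_metric_task,
        log_parameter_task,
    )

    with Flow("rubicon-logging") as flow:
        # persistence, root_dir, project_name
        project = get_or_create_project_task("memory", None, "Prefect Demo")
        experiment = create_experiment_task(project, name="rf-run")
        log_parameter_task(experiment, "n_estimators", 100)
        log_metric_task(experiment, "accuracy", 0.92)

    state = flow.run()
    flow_succeeded = state.is_successful()
except Exception:  # Prefect 1.x not installed or incompatible; sketch only
    flow_succeeded = None
```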