API Reference¶
Rubicon¶
- class rubicon_ml.Rubicon(persistence: str | None = 'filesystem', root_dir: str | None = None, auto_git_enabled: bool = False, composite_config: List[Dict[str, Any]] | None = None, **storage_options)¶
The rubicon client’s entry point.
Creates a Config and injects it into the client level objects at run-time.
- Parameters:
- persistencestr, optional
The persistence type. Can be one of [“filesystem”, “memory”]. Defaults to “filesystem”.
- root_dirstr, optional
Absolute or relative filepath. Use absolute path for best performance. Defaults to the local filesystem. Prefix with s3:// to use s3 instead.
- auto_git_enabledbool, optional
True to use the git command to automatically log relevant repository information to projects and experiments logged with this client instance, False otherwise. Defaults to False.
- storage_optionsdict, optional
Additional keyword arguments specific to the protocol being chosen. They are passed directly to the underlying filesystem class.
- property config¶
Returns a single config.
Exists to promote backwards compatibility.
- Returns:
- Config
A single Config
- create_project(name: str, description: str | None = None, github_url: str | None = None, training_metadata: List[Tuple] | Tuple | None = None) Project ¶
Create a project.
- Parameters:
- namestr
The project’s name.
- descriptionstr, optional
The project’s description.
- github_urlstr, optional
The URL of the GitHub repository associated with this project. If omitted and automatic git logging is enabled, it will be retrieved via git remote.
- training_metadatatuple or list of tuples, optional
Metadata associated with the training dataset(s) used across each experiment in this project.
- Returns:
- rubicon.client.Project
The created project.
- get_or_create_project(name: str, **kwargs) Project ¶
Get or create a project.
- Parameters:
- namestr
The project’s name.
- kwargsdict
Additional keyword arguments to be passed to Rubicon.create_project.
- Returns:
- rubicon.client.Project
The corresponding project.
- get_project(name: str | None = None, id: str | None = None) Project ¶
Get a project.
- Parameters:
- namestr, optional
The name of the project to get.
- idstr, optional
The id of the project to get.
- Returns:
- rubicon.client.Project
The project with name name or id id.
- get_project_as_dask_df(name, group_by=None)¶
DEPRECATED: Available for backwards compatibility.
- get_project_as_df(name, df_type='pandas', group_by=None)¶
Get a dask or pandas dataframe representation of a project.
- Parameters:
- namestr
The name of the project to get.
- df_typestr, optional
The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.
- group_bystr or None, optional
How to group the project’s experiments in the returned DataFrame(s). Valid options include [“commit_hash”].
- Returns:
- pandas.DataFrame or list of pandas.DataFrame or dask.DataFrame or list of dask.DataFrame
If group_by is None, a dask or pandas dataframe holding the project’s data. Otherwise a list of dask or pandas dataframes holding the project’s data grouped by group_by.
- is_auto_git_enabled() bool ¶
Check if git is enabled for any of the configs.
- projects()¶
Get a list of available projects.
- Returns:
- list of rubicon.client.Project
The list of available projects.
- sync(project_name: str, s3_root_dir: str, aws_profile: str | None = None, aws_shared_credentials_file: str | None = None)¶
Sync a local project to S3.
- Parameters:
- project_namestr
The name of the project to sync.
- s3_root_dirstr
The S3 path where the project’s data will be synced.
- aws_profilestr, optional
The name of the AWS CLI profile with the credentials and options to use. Defaults to None, in which case the AWS default profile ‘default’ is used.
- aws_shared_credentials_filestr, optional
The location of the file the AWS CLI uses to store access keys. Defaults to None, in which case the AWS default path ‘~/.aws/credentials’ is used.
Notes
Use sync to backup your local project data to S3 as an alternative to direct S3 logging. Leverages the AWS CLI’s aws s3 sync. Ensure that any credentials are set and that any proxies are enabled.
Project¶
- class rubicon_ml.Project(domain: ProjectDomain, config: Config | List[Config] | None = None)¶
A client project.
A project is a collection of experiments, dataframes, and artifacts identified by a unique name.
- Parameters:
- domainrubicon.domain.Project
The project domain model.
- configrubicon.client.Config
The config, which specifies the underlying repository.
- archive(experiments: List[Experiment] | None = None, remote_rubicon: Rubicon | None = None)¶
Archive the experiments logged to this project.
- Parameters:
- experimentslist of Experiments, optional
The rubicon.client.Experiment objects to archive. If None, all logged experiments are archived.
- remote_rubiconrubicon_ml.Rubicon object, optional
The remote Rubicon object with the repository to archive to.
- Returns:
- str
The filepath of the newly created archive.
- artifact(name: str | None = None, id: str | None = None) Artifact ¶
Get an artifact logged to this project by id or name.
- Parameters:
- idstr
The id of the artifact to get.
- namestr
The name of the artifact to get.
- Returns:
- rubicon.client.Artifact
The artifact logged to this project with id id or name ‘name’.
- artifacts(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') List[Artifact] ¶
Get the artifacts logged to this client object.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Artifact
The artifacts previously logged to this client object.
- property created_at¶
Get the time the project was created.
- dataframe(name: str | None = None, id: str | None = None) Dataframe ¶
Get the dataframe logged to this client object.
- Parameters:
- idstr
The id of the dataframe to get.
- namestr
The name of the dataframe to get.
- Returns:
- rubicon.client.Dataframe
The dataframe logged to this project with id id or name ‘name’.
- dataframes(tags: List[str] | None = None, qtype: str = 'or', recursive: bool = False, name: str | None = None) List[Dataframe] ¶
Get the dataframes logged to this project.
- Parameters:
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- recursivebool, optional
If True, get the dataframes logged to this project’s experiments as well. Defaults to False.
- namestr, optional
The name value to filter results on.
- Returns:
- list of rubicon.client.Dataframe
The dataframes previously logged to this client object.
- delete_artifacts(ids: List[str])¶
Delete the artifacts logged to this client object with ids ids.
- Parameters:
- idslist of str
The ids of the artifacts to delete.
- delete_dataframes(ids: List[str])¶
Delete the dataframes with ids ids logged to this client object.
- Parameters:
- idslist of str
The ids of the dataframes to delete.
- property description¶
Get the project’s description.
- experiment(id: str | None = None, name: str | None = None) Experiment ¶
Get an experiment logged to this project by id or name.
- Parameters:
- idstr
The id of the experiment to get.
- namestr
The name of the experiment to get.
- Returns:
- rubicon.client.Experiment
The experiment logged to this project with id id or name ‘name’.
- experiments(tags: List[str] | None = None, qtype: str = 'or', name: str | None = None) List[Experiment] ¶
Get the experiments logged to this project.
- Parameters:
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- namestr, optional
The name of the experiment(s) to filter results on.
- Returns:
- list of rubicon.client.Experiment
The experiments previously logged to this project.
- experiments_from_archive(remote_rubicon, latest_only: bool | None = False)¶
Retrieve archived experiments into this project’s experiments folder.
- Parameters:
- remote_rubiconrubicon_ml.Rubicon object
The remote Rubicon object with the repository containing archived experiments to read in.
- latest_onlybool, optional
Indicates whether experiments should only be read from the latest archive. Defaults to False.
- property github_url¶
Get the project’s GitHub repository URL.
- property id¶
Get the project’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- log_artifact(data_bytes: bytes | None = None, data_directory: str | None = None, data_file: TextIO | None = None, data_object: Any | None = None, data_path: str | None = None, name: str | None = None, description: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Artifact ¶
Log an artifact to this client object.
- Parameters:
- data_bytesbytes, optional
The raw bytes to log as an artifact.
- data_directorystr, optional
The path to a directory to zip and log as an artifact.
- data_fileTextIOWrapper, optional
The open file to log as an artifact.
- data_objectpython object, optional
The python object to log as an artifact.
- data_pathstr, optional
The absolute or relative local path or S3 path to the data to log as an artifact. S3 paths must be prepended with ‘s3://’.
- namestr, optional
The name of the artifact file. Required if data_path is not provided.
- descriptionstr, optional
A description of the artifact. Use to provide additional context.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your artifacts.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your artifacts.
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Only one of data_bytes, data_file, data_object, and data_path should be provided. If more than one is given, the order of precedence is data_bytes, data_object, data_file, data_path.
Examples
>>> # Log with bytes
>>> experiment.log_artifact(
...     data_bytes=b'hello rubicon!',
...     name="bytes_artifact",
...     description="log artifact from bytes",
... )
>>> # Log zipped directory
>>> experiment.log_artifact(
...     data_directory="./path/to/directory/",
...     name="directory.zip",
...     description="log artifact from zipped directory",
... )
>>> # Log with file
>>> with open('./path/to/artifact.txt', 'rb') as file:
...     project.log_artifact(
...         data_file=file,
...         name="file_artifact",
...         description="log artifact from file",
...     )
>>> # Log with file path
>>> experiment.log_artifact(
...     data_path="./path/to/artifact.pkl",
...     description="log artifact from file path",
... )
- log_conda_environment(artifact_name: str | None = None) Artifact ¶
Log the conda environment as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported conda environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Relies on running with an active conda environment.
- log_dataframe(df: pd.DataFrame | 'dd.DataFrame' | 'pl.DataFrame', description: str | None = None, name: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Dataframe ¶
Log a dataframe to this client object.
- Parameters:
- dfpandas.DataFrame, dask.dataframe.DataFrame, or polars DataFrame
The dataframe to log.
- descriptionstr, optional
The dataframe’s description. Use to provide additional context.
- tagslist of str, optional
The values to tag the dataframe with.
- commentslist of str, optional
The values to comment the dataframe with.
- Returns:
- rubicon.client.Dataframe
The new dataframe.
- log_experiment(name: str | None = None, description: str | None = None, model_name: str | None = None, branch_name: str | None = None, commit_hash: str | None = None, training_metadata: Tuple | List[Tuple] | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Experiment ¶
Log a new experiment to this project.
- Parameters:
- namestr
The experiment’s name.
- descriptionstr, optional
The experiment’s description. Use to provide additional context.
- model_namestr, optional
The experiment’s model name. For example, this could be the name of the registered model in Model One.
- branch_namestr, optional
The name of the active branch of the git repo this experiment is logged from. If omitted and automatic git logging is enabled, it will be retrieved via git rev-parse.
- commit_hashstr, optional
The hash of the last commit to the active branch of the git repo this experiment is logged from. If omitted and automatic git logging is enabled, it will be retrieved via git rev-parse.
- training_metadatatuple or list of tuples, optional
Metadata associated with the experiment’s training dataset(s).
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your experiments. For example, tags could be used to differentiate between the type of model or classifier used during the experiment (i.e. linear regression or random forest).
- commentslist of str, optional
Values to comment the experiment with.
- Returns:
- rubicon.client.Experiment
The created experiment.
- log_h2o_model(h2o_model, artifact_name: str | None = None, export_cross_validation_predictions: bool = False, use_mojo: bool = False, **log_artifact_kwargs) Artifact ¶
Log an h2o model as an artifact using h2o.save_model.
- Parameters:
- h2o_modelh2o.model.ModelBase
The h2o model to log as an artifact.
- artifact_namestr, optional (default None)
The name of the artifact. Defaults to None, using h2o_model’s class name.
- export_cross_validation_predictions: bool, optional (default False)
Passed directly to h2o.save_model.
- use_mojo: bool, optional (default False)
Whether to log the model in MOJO format. If False, the model will be logged in binary format.
- log_artifact_kwargsdict
Additional kwargs to be passed directly to self.log_artifact.
- log_json(json_object: Dict[str, Any], name: str | None = None, description: str | None = None, tags: List[str] | None = None) Artifact ¶
Log a python dictionary to a JSON file.
- Parameters:
- json_objectDict[str, Any]
A python dictionary capable of being converted to JSON.
- nameOptional[str], optional
A name for this JSON file, by default None
- descriptionOptional[str], optional
A description for this file, by default None
- tagsOptional[List[str]], optional
Any Rubicon tags, by default None
- Returns:
- Artifact
The new artifact.
- log_pip_requirements(artifact_name: str | None = None) Artifact ¶
Log the pip requirements as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported pip environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
- log_with_schema(obj: Any, experiment: Experiment | None = None, experiment_kwargs: Dict[str, Any] | None = None) Any ¶
Log an experiment leveraging self.schema_.
- log_xgboost_model(xgboost_model: xgb.Booster, artifact_name: str | None = None, **log_artifact_kwargs: Any) Artifact ¶
Log an XGBoost model as a JSON file to this client object.
Please note that we do not currently support logging directly from the SKLearn interface.
- Parameters:
- xgboost_model: Booster
An XGBoost model object in the Booster format.
- artifact_namestr, optional
The name of the artifact (the exported XGBoost model).
- log_artifact_kwargsAny
Additional kwargs to be passed directly to self.log_artifact.
- Returns:
- rubicon.client.Artifact
The new artifact.
- property name¶
Get the project’s name.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- set_schema(schema: Dict[str, Any]) None ¶
Set the schema for this client object.
- to_dask_df(group_by: str | None = None)¶
DEPRECATED: Available for backwards compatibility.
- to_df(df_type: str = 'pandas', group_by: str | None = None) pd.DataFrame | Dict[str, pd.DataFrame] | dd.DataFrame | Dict[str, dd.DataFrame] ¶
Loads the project’s data into dask or pandas dataframe(s) sorted by created_at. This includes the experiment details along with parameters and metrics.
- Parameters:
- df_typestr, optional
The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.
- group_bystr or None, optional
How to group the project’s experiments in the returned dataframe(s). Valid options include [“commit_hash”].
- Returns:
- pandas.DataFrame or dict of pandas.DataFrame or dask.DataFrame or dict of dask.DataFrame
If group_by is None, a dask or pandas dataframe holding the project’s data. Otherwise a dict of dask or pandas dataframes holding the project’s data grouped by group_by.
- property training_metadata¶
Get the project’s training metadata.
Experiment¶
- class rubicon_ml.Experiment(domain: ExperimentDomain, parent: Project)¶
A client experiment.
An experiment represents a model run and is identified by its ‘created_at’ time. It can have metrics, parameters, features, dataframes, and artifacts logged to it.
An experiment is logged to a project.
- Parameters:
- domainrubicon.domain.Experiment
The experiment domain model.
- parentrubicon.client.Project
The project that the experiment is logged to.
- add_child_experiment(experiment: Experiment)¶
Add tags to denote an experiment as a descendant of this experiment.
- Parameters:
- experimentrubicon_ml.client.Experiment
The experiment to mark as a descendant of this experiment.
- Raises:
- RubiconException
If experiment and this experiment are not logged to the same project.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- artifact(name: str | None = None, id: str | None = None) Artifact ¶
Get an artifact logged to this project by id or name.
- Parameters:
- idstr
The id of the artifact to get.
- namestr
The name of the artifact to get.
- Returns:
- rubicon.client.Artifact
The artifact logged to this project with id id or name ‘name’.
- artifacts(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') List[Artifact] ¶
Get the artifacts logged to this client object.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Artifact
The artifacts previously logged to this client object.
- property branch_name¶
Get the experiment’s branch name.
- property comments: List[str]¶
Get this client object’s comments.
- property commit_hash¶
Get the experiment’s commit hash.
- property created_at¶
Get the time the experiment was created.
- dataframe(name: str | None = None, id: str | None = None) Dataframe ¶
Get the dataframe logged to this client object.
- Parameters:
- idstr
The id of the dataframe to get.
- namestr
The name of the dataframe to get.
- Returns:
- rubicon.client.Dataframe
The dataframe logged to this project with id id or name ‘name’.
- dataframes(name: str | None = None, tags: List[str] | None = None, qtype: str = 'or') List[Dataframe] ¶
Get the dataframes logged to this client object.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Dataframe
The dataframes previously logged to this client object.
- delete_artifacts(ids: List[str])¶
Delete the artifacts logged to this client object with ids ids.
- Parameters:
- idslist of str
The ids of the artifacts to delete.
- delete_dataframes(ids: List[str])¶
Delete the dataframes with ids ids logged to this client object.
- Parameters:
- idslist of str
The ids of the dataframes to delete.
- property description¶
Get the experiment’s description.
- feature(name=None, id=None)¶
Get a feature.
- Parameters:
- namestr, optional
The name of the feature to get.
- idstr, optional
The id of the feature to get.
- Returns:
- rubicon.client.Feature
The feature with name name or id id.
- features(name=None, tags=[], qtype='or')¶
Get the features logged to this experiment.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Feature
The features previously logged to this experiment.
- get_child_experiments() List[Experiment] ¶
Get the experiments that are tagged as children of this experiment.
- Returns:
- list of rubicon_ml.client.Experiment
The experiments that are tagged as children of this experiment.
- get_parent_experiments() List[Experiment] ¶
Get the experiments that are tagged as parents of this experiment.
- Returns:
- list of rubicon_ml.client.Experiment
The experiments that are tagged as parents of this experiment.
- property id¶
Get the experiment’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- log_artifact(data_bytes: bytes | None = None, data_directory: str | None = None, data_file: TextIO | None = None, data_object: Any | None = None, data_path: str | None = None, name: str | None = None, description: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Artifact ¶
Log an artifact to this client object.
- Parameters:
- data_bytesbytes, optional
The raw bytes to log as an artifact.
- data_directorystr, optional
The path to a directory to zip and log as an artifact.
- data_fileTextIOWrapper, optional
The open file to log as an artifact.
- data_objectpython object, optional
The python object to log as an artifact.
- data_pathstr, optional
The absolute or relative local path or S3 path to the data to log as an artifact. S3 paths must be prepended with ‘s3://’.
- namestr, optional
The name of the artifact file. Required if data_path is not provided.
- descriptionstr, optional
A description of the artifact. Use to provide additional context.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your artifacts.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your artifacts.
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Only one of data_bytes, data_file, data_object, and data_path should be provided. If more than one is given, the order of precedence is data_bytes, data_object, data_file, data_path.
Examples
>>> # Log with bytes
>>> experiment.log_artifact(
...     data_bytes=b'hello rubicon!',
...     name="bytes_artifact",
...     description="log artifact from bytes",
... )
>>> # Log zipped directory
>>> experiment.log_artifact(
...     data_directory="./path/to/directory/",
...     name="directory.zip",
...     description="log artifact from zipped directory",
... )
>>> # Log with file
>>> with open('./path/to/artifact.txt', 'rb') as file:
...     project.log_artifact(
...         data_file=file,
...         name="file_artifact",
...         description="log artifact from file",
...     )
>>> # Log with file path
>>> experiment.log_artifact(
...     data_path="./path/to/artifact.pkl",
...     description="log artifact from file path",
... )
- log_conda_environment(artifact_name: str | None = None) Artifact ¶
Log the conda environment as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported conda environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
Notes
Relies on running with an active conda environment.
- log_dataframe(df: pd.DataFrame | 'dd.DataFrame' | 'pl.DataFrame', description: str | None = None, name: str | None = None, tags: List[str] | None = None, comments: List[str] | None = None) Dataframe ¶
Log a dataframe to this client object.
- Parameters:
- dfpandas.DataFrame, dask.dataframe.DataFrame, or polars DataFrame
The dataframe to log.
- descriptionstr, optional
The dataframe’s description. Use to provide additional context.
- tagslist of str, optional
The values to tag the dataframe with.
- commentslist of str, optional
The values to comment the dataframe with.
- Returns:
- rubicon.client.Dataframe
The new dataframe.
- log_feature(name: str, description: str = None, importance: float = None, tags: list[str] = [], comments: list[str] = []) Feature ¶
Create a feature under the experiment.
- Parameters:
- namestr
The feature’s name.
- descriptionstr
The feature’s description. Use to provide additional context.
- importancefloat
The feature’s importance.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your features.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your features.
- Returns:
- rubicon.client.Feature
The created feature.
- log_h2o_model(h2o_model, artifact_name: str | None = None, export_cross_validation_predictions: bool = False, use_mojo: bool = False, **log_artifact_kwargs) Artifact ¶
Log an h2o model as an artifact using h2o.save_model.
- Parameters:
- h2o_modelh2o.model.ModelBase
The h2o model to log as an artifact.
- artifact_namestr, optional (default None)
The name of the artifact. Defaults to None, using h2o_model’s class name.
- export_cross_validation_predictions: bool, optional (default False)
Passed directly to h2o.save_model.
- use_mojo: bool, optional (default False)
Whether to log the model in MOJO format. If False, the model will be logged in binary format.
- log_artifact_kwargsdict
Additional kwargs to be passed directly to self.log_artifact.
- log_json(json_object: Dict[str, Any], name: str | None = None, description: str | None = None, tags: List[str] | None = None) Artifact ¶
Log a python dictionary to a JSON file.
- Parameters:
- json_objectDict[str, Any]
A python dictionary capable of being converted to JSON.
- nameOptional[str], optional
A name for this JSON file, by default None
- descriptionOptional[str], optional
A description for this file, by default None
- tagsOptional[List[str]], optional
Any Rubicon tags, by default None
- Returns:
- Artifact
The new artifact.
- log_metric(name: str, value: float, directionality: str = 'score', description: str = None, tags: list[str] = [], comments: list[str] = []) Metric ¶
Create a metric under the experiment.
- Parameters:
- namestr
The metric’s name.
- valuefloat
The metric’s value.
- directionalitystr, optional
The metric’s directionality. Must be one of [“score”, “loss”], where “score” represents a metric to maximize, while “loss” represents a metric to minimize. Defaults to “score”.
- descriptionstr, optional
The metric’s description. Use to provide additional context.
- tagslist of str, optional
Values to tag the experiment with. Use tags to organize and filter your metrics.
- commentslist of str, optional
Values to comment the experiment with. Use comments to organize and filter your metrics.
- Returns:
- rubicon.client.Metric
The created metric.
- log_parameter(name: str, value: object = None, description: str = None, tags: list[str] = [], comments: list[str] = []) Parameter ¶
Create a parameter under the experiment.
- Parameters:
- namestr
The parameter’s name.
- valueobject, optional
The parameter’s value. Can be an object of any JSON serializable (via rubicon.utils.DomainJSONEncoder) type.
- descriptionstr, optional
The parameter’s description. Use to provide additional context.
- tagslist of str, optional
Values to tag the parameter with. Use tags to organize and filter your parameters.
- commentslist of str, optional
Values to comment the parameter with. Use comments to organize and filter your parameters.
- Returns:
- rubicon.client.Parameter
The created parameter.
- log_pip_requirements(artifact_name: str | None = None) Artifact ¶
Log the pip requirements as an artifact to this client object. Useful for recreating your exact environment at a later date.
- Parameters:
- artifact_namestr, optional
The name of the artifact (the exported pip environment).
- Returns:
- rubicon.client.Artifact
The new artifact.
- log_xgboost_model(xgboost_model: xgb.Booster, artifact_name: str | None = None, **log_artifact_kwargs: Any) Artifact ¶
Log an XGBoost model as a JSON file to this client object.
Please note that we do not currently support logging directly from the SKLearn interface.
- Parameters:
- xgboost_model: Booster
An XGBoost model object in the Booster format.
- artifact_namestr, optional
The name of the artifact (the exported XGBoost model).
- log_artifact_kwargsAny
Additional kwargs to be passed directly to self.log_artifact.
- Returns:
- rubicon.client.Artifact
The new artifact.
- metric(name=None, id=None)¶
Get a metric.
- Parameters:
- namestr, optional
The name of the metric to get.
- idstr, optional
The id of the metric to get.
- Returns:
- rubicon.client.Metric
The metric with name name or id id.
- metrics(name=None, tags=[], qtype='or')¶
Get the metrics logged to this experiment.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Metric
The metrics previously logged to this experiment.
- property model_name¶
Get the experiment’s model name.
- property name¶
Get the experiment’s name.
- parameter(name=None, id=None)¶
Get a parameter.
- Parameters:
- namestr, optional
The name of the parameter to get.
- idstr, optional
The id of the parameter to get.
- Returns:
- rubicon.client.Parameter
The parameter with name name or id id.
- parameters(name=None, tags=[], qtype='or')¶
Get the parameters logged to this experiment.
- Parameters:
- namestr, optional
The name value to filter results on.
- tagslist of str, optional
The tag values to filter results on.
- qtypestr, optional
The query type to filter results on. Can be ‘or’ or ‘and’. Defaults to ‘or’.
- Returns:
- list of rubicon.client.Parameter
The parameters previously logged to this experiment.
- property project¶
Get the project client object that this experiment belongs to.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- property training_metadata¶
Get the experiment’s training metadata.
Parameter¶
- class rubicon_ml.Parameter(domain: ParameterDomain, parent: Experiment)¶
A client parameter.
A parameter is an input to an experiment (model run) that depends on the type of model being used. It affects the model’s predictions.
For example, if you were using a random forest classifier, ‘n_estimators’ (the number of trees in the forest) could be a parameter.
A parameter is logged to an experiment.
- Parameters:
- domainrubicon.domain.Parameter
The parameter domain model.
- parentrubicon.client.Experiment
The experiment that the parameter is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at: datetime¶
Get the time the parameter was created.
- property description: str | None¶
Get the parameter’s description.
- property id: str¶
Get the parameter’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str | None¶
Get the parameter’s name.
- property parent: Experiment¶
Get the parameter’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- property value: object | float | None¶
Get the parameter’s value.
Feature¶
- class rubicon_ml.Feature(domain: FeatureDomain, parent: Experiment)¶
A client feature.
A feature is an input to an experiment (model run) that’s an independent, measurable property of a phenomenon being observed. It affects the model’s predictions.
For example, consider a model that predicts how likely a customer is to pay back a loan. Possible features could be ‘year’, ‘credit score’, etc.
A feature is logged to an experiment.
- Parameters:
- domainrubicon.domain.Feature
The feature domain model.
- parentrubicon.client.Experiment
The experiment that the feature is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at: datetime¶
Get the time the feature was created.
- property description: str | None¶
Get the feature’s description.
- property id: str¶
Get the feature’s id.
- property importance¶
Get the feature’s importance.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str | None¶
Get the feature’s name.
- property parent: Experiment¶
Get the feature’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
Metric¶
- class rubicon_ml.Metric(domain: MetricDomain, parent: Experiment)¶
A client metric.
A metric is a single-value output of an experiment that helps evaluate the quality of the model’s predictions.
It can be either a ‘score’ (value to maximize) or a ‘loss’ (value to minimize).
A metric is logged to an experiment.
- Parameters:
- domainrubicon.domain.Metric
The metric domain model.
- parentrubicon.client.Experiment
The experiment that the metric is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at: datetime¶
Get the time the metric was created.
- property description: str | None¶
Get the metric’s description.
- property directionality: str¶
Get the metric’s directionality.
- property id: str¶
Get the metric’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str | None¶
Get the metric’s name.
- property parent: Experiment¶
Get the metric’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- property value¶
Get the metric’s value.
Dataframe¶
- class rubicon_ml.Dataframe(domain: DataframeDomain, parent: Experiment | Project)¶
A client dataframe.
A dataframe is a two-dimensional, tabular dataset with labeled axes (rows and columns) that provides value to the model developer and/or reviewer when visualized.
For example, confusion matrices, feature importance tables and marginal residuals can all be logged as a dataframe.
A dataframe is logged to a project or an experiment.
- Parameters:
- domainrubicon.domain.Dataframe
The dataframe domain model.
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment that the dataframe is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at¶
Get the time this dataframe was created.
- property description¶
Get the dataframe’s description.
- get_data(df_type: Literal['pandas', 'dask'] = 'pandas')¶
Loads the data associated with this Dataframe into a pandas or dask dataframe.
- Parameters:
- df_typestr, optional
The type of dataframe to return. Valid options include [“dask”, “pandas”]. Defaults to “pandas”.
- property id¶
Get the dataframe’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name¶
Get the dataframe’s name.
- property parent¶
Get the dataframe’s parent client object.
- plot(df_type: Literal['pandas', 'dask'] = 'pandas', plotting_func: Callable | None = None, **kwargs)¶
Render the dataframe using plotly.express.
- Parameters:
- df_typestr, optional
The type of dataframe. Can be either pandas or dask. Defaults to ‘pandas’.
- plotting_funcfunction, optional
The plotly.express plotting function used to visualize the dataframes. Available options can be found at https://plotly.com/python-api-reference/plotly.express.html. Defaults to plotly.express.line.
- kwargsdict, optional
Keyword arguments to be passed to plotting_func. Available options can be found in the documentation of the individual functions at the URL above.
Examples
>>> # Log a line plot
>>> dataframe.plot(x='Year', y='Number of Subscriptions')

>>> # Log a bar plot
>>> import plotly.express as px
>>> dataframe.plot(plotting_func=px.bar, x='Year', y='Number of Subscriptions')
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
Artifact¶
- class rubicon_ml.Artifact(domain: ArtifactDomain, parent: Project)¶
A client artifact.
An artifact is a catch-all for any other type of data that can be logged to a file.
For example, a snapshot of a trained model (.pkl) can be logged to the experiment created during its run. Or, a base model for the model in development can be logged to a project when leveraging transfer learning.
An artifact is logged to a project or an experiment.
- Parameters:
- domainrubicon.domain.Artifact
The artifact domain model.
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment that the artifact is logged to.
- add_comments(comments: List[str])¶
Add comments to this client object.
- Parameters:
- commentslist of str
The comment values to add.
- add_tags(tags: List[str])¶
Add tags to this client object.
- Parameters:
- tagslist of str
The tag values to add.
- property comments: List[str]¶
Get this client object’s comments.
- property created_at¶
Get the time this artifact was created.
- property data¶
Get the artifact’s raw data.
- property description: str¶
Get the artifact’s description.
- download(location: str | None = None, name: str | None = None, unzip: bool = False)¶
Download this artifact’s data.
- Parameters:
- locationstr, optional
The absolute or relative local directory or S3 bucket to download the artifact to. S3 buckets must be prepended with ‘s3://’. Defaults to the current local working directory.
- namestr, optional
The name to give the downloaded artifact file. Defaults to the artifact’s given name when logged.
- unzipbool, optional
True to unzip the artifact data. False otherwise. Defaults to False.
- get_data(deserialize: Literal['h2o', 'h2o_binary', 'h2o_mojo', 'pickle', 'xgboost'] | None = None, unpickle: bool = False)¶
Loads the data associated with this artifact and unpickles if needed.
- Parameters:
- deserializestr, optional
Method used to deserialize this artifact’s data.
- None to disable deserialization and return the raw data.
- “h2o” or “h2o_binary” to use h2o.load_model to load the data.
- “h2o_mojo” to use h2o.import_mojo to load the data.
- “pickle” to use pickle to load the data.
- “xgboost” to use xgboost’s JSON loader to load the data as a fitted model.
Defaults to None.
- unpicklebool, optional
Flag indicating whether or not to unpickle artifact data. deserialize takes precedence. Defaults to False. Deprecated: Please use deserialize=”pickle” in the future.
- property id: str¶
Get the artifact’s id.
- is_auto_git_enabled() bool ¶
Is git enabled for any of the configs.
- property name: str¶
Get the artifact’s name.
- property parent¶
Get the artifact’s parent client object.
- remove_comments(comments: List[str])¶
Remove comments from this client object.
- Parameters:
- commentslist of str
The comment values to remove.
- remove_tags(tags: List[str])¶
Remove tags from this client object.
- Parameters:
- tagslist of str
The tag values to remove.
- property repositories: List[BaseRepository] | None¶
Get all repositories.
- property repository: BaseRepository | None¶
Get the repository.
- property tags: TagContainer¶
Get this client object’s tags.
- temporary_download(unzip: bool = False)¶
Temporarily download this artifact’s data within a context manager.
- Parameters:
- unzipbool, optional
True to unzip the artifact data. False otherwise. Defaults to False.
- Yields:
- file
An open file pointer into the directory the artifact data was temporarily downloaded into. If the artifact is a single file, its name is stored in the artifact.name attribute.
exception_handling¶
- rubicon_ml.set_failure_mode(failure_mode: str, traceback_chain: bool = False, traceback_limit: int | None = None) None ¶
Set the failure mode.
- Parameters:
- failure_modestr
The name of the failure mode to set. “raise” to raise all exceptions, “log” to catch all exceptions and log them via logging.error, “warn” to catch all exceptions and re-raise them as warnings via warnings.warn. Defaults to “raise”.
- traceback_chainbool, optional
True to display each error in the traceback chain when logging or warning, False to display only the first. Defaults to False.
- traceback_limitint, optional
The depth of the traceback displayed when logging or warning. 0 to display only the error’s text, each increment shows another line of the traceback.
publish¶
rubicon_ml leverages intake to easily share sets of experiments.
- rubicon_ml.publish(experiments, visualization_object: ExperimentsTable | MetricCorrelationPlot | DataframePlot | MetricListsComparison | None = None, output_filepath=None, base_catalog_filepath=None)¶
Publish experiments to an intake catalog that can be read by the intake-rubicon driver.
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment
The experiments to publish.
- output_filepathstr, optional
The absolute or relative local filepath or S3 bucket and key to log the generated YAML file to. S3 buckets must be prepended with ‘s3://’. Defaults to None, which disables writing the generated YAML.
- base_catalog_filepathstr, optional
Similar to output_filepath, except this argument is used as a base file to update an existing intake catalog. Defaults to None, creating a new intake catalog.
- Returns:
- str
The YAML string representation of the intake catalog containing the given experiments.
RubiconJSON¶
- class rubicon_ml.RubiconJSON(rubicon_objects: List[Rubicon] | None = None, projects: List[Project] | None = None, experiments: List[Experiment] | None = None)¶
RubiconJSON converts top-level rubicon_ml objects, projects, and experiments into a JSON structured dictionary for JSONPath-like querying with jsonpath-ng.
- Parameters:
- rubicon_objectsrubicon.client.Rubicon or list of type rubicon.client.Rubicon
Top-level rubicon-ml objects to convert to JSON for querying.
- projectsrubicon.client.Project or list of type rubicon.client.Project
rubicon-ml projects to convert to JSON for querying.
- experimentsrubicon.client.Experiment or list of type rubicon.client.Experiment
rubicon-ml experiments to convert to JSON for querying.
- search(query: str)¶
Query the JSON generated at RubiconJSON instantiation in a JSONPath-like manner. Results can be returned as rubicon_ml.client objects by specifying the return_type parameter; by default they are returned as a JSON structured dict.
- Parameters:
- querystr
The JSONPath-like query to execute.
schema¶
Methods and a mixin to enable schema logging.
The functions available in the schema submodule are applied to rubicon_ml.Project via the SchemaMixin class. They can be called directly as methods of an existing project.
- class rubicon_ml.schema.logger.SchemaMixin¶
Adds schema logging support to a client object.
- log_with_schema(obj: Any, experiment: Experiment | None = None, experiment_kwargs: Dict[str, Any] | None = None) Any ¶
Log an experiment leveraging self.schema_.
- set_schema(schema: Dict[str, Any]) None ¶
Set the schema for this client object.
Methods for interacting with the existing rubicon-ml schema.
- rubicon_ml.schema.registry.available_schema() List[str] ¶
Get the names of all available schema.
- rubicon_ml.schema.registry.get_schema(name: str) Any ¶
Get the schema with the given name.
- rubicon_ml.schema.registry.get_schema_name(obj: Any) str ¶
Get the name of the schema that represents the given object.
- rubicon_ml.schema.registry.register_schema(name: str, schema: dict)¶
Add a schema to the schema registry.
sklearn¶
rubicon_ml offers direct integration with Scikit-learn via our own pipeline object.
- class rubicon_ml.sklearn.RubiconPipeline(project, steps, user_defined_loggers={}, experiment_kwargs={'name': 'RubiconPipeline experiment'}, memory=None, verbose=False, ignore_warnings=False)¶
An extension of sklearn.pipeline.Pipeline that automatically creates a Rubicon experiment under the provided project and logs the pipeline’s parameters and metrics to it.
A single pipeline run will result in a single experiment logged with its corresponding parameters and metrics pulled from the pipeline’s estimators.
- Parameters:
- projectrubicon_ml.client.Project
The rubicon project to log to.
- stepslist
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
- user_defined_loggersdict, optional
A dict mapping the estimator name to a corresponding user defined logger. See the example below for more details.
- experiment_kwargsdict, optional
Additional keyword arguments to be passed to project.log_experiment().
- memorystr or object with the joblib.Memory interface, default=None
Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. (docstring source: Scikit-Learn)
- verbosebool, default=False
If True, the time elapsed while fitting each step will be printed as it is completed. (docstring source: Scikit-Learn)
- ignore_warningsbool, default=False
If True, ignores warnings thrown by pipeline.
Examples
>>> pipeline = RubiconPipeline(
...     project,
...     [
...         ("vect", CountVectorizer()),
...         ("tfidf", TfidfTransformer()),
...         ("clf", SGDClassifier()),
...     ],
...     user_defined_loggers={
...         "vect": FilterEstimatorLogger(
...             select=["input", "decode_error", "max_df"],
...         ),
...         "tfidf": FilterEstimatorLogger(ignore_all=True),
...         "clf": FilterEstimatorLogger(
...             ignore=["alpha", "penalty"],
...         ),
...     },
... )
- fit(X, y=None, tags=None, log_fit_params=True, experiment=None, **fit_params)¶
Fit the model and automatically log the fit_params to rubicon-ml. Optionally, pass tags to update the experiment’s tags.
- Parameters:
- Xiterable
Training data. Must fulfill input requirements of first step of the pipeline.
- yiterable, optional
Training targets. Must fulfill label requirements for all steps of the pipeline.
- tagslist, optional
Additional tags to add to the experiment during the fit.
- log_fit_paramsbool, optional
True to log the values passed as fit_params to this pipeline’s experiment. Defaults to True.
- fit_paramsdict, optional
Additional keyword arguments to be passed to sklearn.pipeline.Pipeline.fit().
- experiment: rubicon_ml.experiment.client.Experiment, optional
The experiment to log to. If no experiment is provided, the parameters and metrics are logged to a new experiment created with self.experiment_kwargs.
- Returns:
- rubicon_ml.sklearn.Pipeline
This RubiconPipeline.
- get_estimator_logger(step_name=None, estimator=None)¶
Get a logger for the estimator. By default, the logger will have the current experiment set.
- score(X, y=None, sample_weight=None, experiment=None)¶
Score with the final estimator and automatically log the results to rubicon-ml.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- yiterable, optional
Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.
- sample_weightlist, optional
If not None, this argument is passed as sample_weight keyword argument to the score method of the final estimator.
- experiment: rubicon_ml.experiment.client.Experiment, optional
The experiment to log the score to. If no experiment is provided the score is logged to a new experiment with self.experiment_kwargs.
- Returns:
- float
Result of calling score on the final estimator.
- score_samples(X, experiment=None)¶
Score samples with the final estimator and automatically log the results to rubicon-ml.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- experiment: rubicon_ml.experiment.client.Experiment, optional
The experiment to log the score to. If no experiment is provided the score is logged to a new experiment with self.experiment_kwargs.
- Returns:
- ndarray of shape (n_samples,)
Result of calling score_samples on the final estimator.
- set_fit_request(*, experiment: bool | None | str = '$UNCHANGED$', log_fit_params: bool | None | str = '$UNCHANGED$', tags: bool | None | str = '$UNCHANGED$') RubiconPipeline ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- experimentstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the experiment parameter in fit.
- log_fit_paramsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the log_fit_params parameter in fit.
- tagsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the tags parameter in fit.
- Returns:
- selfobject
The updated object.
- set_score_request(*, experiment: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') RubiconPipeline ¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- experimentstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the experiment parameter in score.
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- selfobject
The updated object.
- class rubicon_ml.sklearn.FilterEstimatorLogger(estimator=None, experiment=None, step_name=None, select=[], ignore=[], ignore_all=False)¶
The filter logger for sklearn estimators. Use this logger to either select or ignore specific parameters for logging.
- Parameters:
- estimatora sklearn estimator, optional
The estimator
- experimentrubicon.client.Experiment, optional
The experiment to log the parameters and metrics to.
- step_namestr, optional
The name of the pipeline step.
- selectlist, optional
The list of parameters on this estimator that you’d like to log. All other parameters will be ignored.
- ignorelist, optional
The list of parameters on this estimator that you’d like to ignore by not logging. The other parameters will be logged.
- ignore_allbool, optional
Ignore all parameters if true.
- rubicon_ml.sklearn.pipeline.make_pipeline(project, *steps, experiment_kwargs={'name': 'RubiconPipeline experiment'}, memory=None, verbose=False)¶
Wrapper around RubiconPipeline(). Does not require naming for estimators; their names are set to the lowercase strings of their types.
- Parameters:
- projectrubicon_ml.client.Project
The rubicon project to log to.
- stepslist
List of estimator objects or (estimator, logger) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator. (docstring source: Scikit-Learn)
- experiment_kwargsdict, optional
Additional keyword arguments to be passed to project.log_experiment().
- memorystr or object with the joblib.Memory interface, default=None
Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. (docstring source: Scikit-Learn)
- verbosebool, default=False
If True, the time elapsed while fitting each step will be printed as it is completed. (docstring source: Scikit-Learn)
- Returns:
- rubicon_ml.sklearn.Pipeline
A RubiconPipeline with project project and steps steps.
viz¶
rubicon_ml offers visualization leveraging Dash and Plotly. Each of the following classes is a standalone widget.
- class rubicon_ml.viz.DataframePlot(dataframe_name, experiments=None, plotting_func=<function line>, plotting_func_kwargs={}, x=None, y=None)¶
Plot the dataframes with name dataframe_name logged to the given experiments on a shared axis.
- Parameters:
- dataframe_namestr
The name of the dataframe to plot. A dataframe with name dataframe_name must be logged to each experiment in experiments.
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- plotting_funcfunction, optional
The plotly.express plotting function used to visualize the dataframes. Available options can be found at https://plotly.com/python-api-reference/plotly.express.html. Defaults to plotly.express.line.
- plotting_func_kwargsdict, optional
Keyword arguments to be passed to plotting_func. Available options can be found in the documentation of the individual functions at the URL above.
- xstr, optional
The name of the column in the dataframes with name dataframe_name to plot across the x-axis.
- ystr, optional
The name of the column in the dataframes with name dataframe_name to plot across the y-axis.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from Jupyterlab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargs: dict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- class rubicon_ml.viz.ExperimentsTable(experiments=None, is_selectable=True, metric_names=None, metric_query_tags=None, metric_query_type=None, parameter_names=None, parameter_query_tags=None, parameter_query_type=None)¶
Visualize the experiments experiments and their metadata, metrics, and parameters in a tabular format.
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- is_selectablebool, optional
True to enable selection of the rows in the table, False otherwise. Defaults to True.
- metric_nameslist of str
If provided, only show the metrics with names in the given list. If metric_query_tags are also provided, this will only select metrics from the tag-filtered results.
- metric_query_tagslist of str, optional
If provided, only show the metrics with the given tags in the table.
- metric_query_type‘and’ or ‘or’, optional
When metric_query_tags are given, ‘and’ shows the metrics with all of the given tags and ‘or’ shows the metrics with any of the given tags.
- parameter_nameslist of str
If provided, only show the parameters with names in the given list. If parameter_query_tags are also provided, this will only select parameters from the tag-filtered results.
- parameter_query_tagslist of str, optional
If provided, only show the parameters with the given tags in the table.
- parameter_query_type‘and’ or ‘or’, optional
When parameter_query_tags are given, ‘and’ shows the parameters with all of the given tags and ‘or’ shows the parameters with any of the given tags.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from Jupyterlab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current Jupyterlab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- class rubicon_ml.viz.MetricCorrelationPlot(experiments=None, metric_names=None, parameter_names=None, selected_metric=None)¶
Visualize the correlation between the parameters and metrics logged to the given experiments using a parallel coordinates plot.
More info on parallel coordinates plots can be found here: https://plotly.com/python/parallel-coordinates-plot/
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- metric_nameslist of str, optional
The names of the metrics to load. Defaults to None, which loads all metrics logged to the given experiments.
- parameter_nameslist of str, optional
The names of the parameters to load. Defaults to None, which loads all parameters logged to the given experiments.
- selected_metricstr, optional
The name of the metric to display at launch. Defaults to None, which selects the metric loaded first.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from JupyterLab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current JupyterLab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- class rubicon_ml.viz.MetricListsComparison(column_names=None, experiments=None, selected_metric=None)¶
Visualize lists of metrics logged to the given experiments as an annotated heatmap.
More info on annotated heatmaps can be found here: https://plotly.com/python/annotated-heatmap/
- Parameters:
- column_nameslist of str, optional
Titles to use for each column in the heatmap. Defaults to None.
- experimentslist of rubicon_ml.client.experiment.Experiment, optional
The experiments to visualize. Defaults to None. Can be set as attribute after instantiation.
- selected_metricstr, optional
The name of the metric to display at launch. Defaults to None, which selects the metric loaded first.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from JupyterLab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current JupyterLab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
Widgets can be combined into an interactive dashboard.
- class rubicon_ml.viz.Dashboard(experiments, widgets=None, link_experiment_table=True)¶
Compose visualizations into a dashboard to view multiple widgets at once.
- Parameters:
- experimentslist of rubicon_ml.client.experiment.Experiment
The experiments to visualize.
- widgetslist of lists of superclasses of rubicon_ml.viz.base.VizBase, optional
The widgets to compose in this dashboard. The widgets should be instantiated without experiments prior to passing as an argument to Dashboard. Defaults to a stacked layout of an ExperimentsTable and a MetricCorrelationPlot.
- link_experiment_tablebool, optional
True to enable the callbacks that allow instances of ExperimentsTable to update the experiment inputs of the other widgets in this dashboard. False otherwise. Defaults to True.
- serve(in_background: bool = False, jupyter_mode: Literal['external', 'inline', 'jupyterlab', 'tab'] = 'external', dash_kwargs: Dict = {}, run_server_kwargs: Dict = {})¶
Serve the Dash app on the next available port to render the visualization.
- Parameters:
- in_backgroundbool, optional
DEPRECATED. Background processing is now handled by jupyter_mode.
- jupyter_mode“external”, “inline”, “jupyterlab”, or “tab”, optional
How to render the dashboard when running from JupyterLab.
- “external” to serve the dashboard at an external link.
- “inline” to render the dashboard in the current notebook’s output cell.
- “jupyterlab” to render the dashboard in a new window within the current JupyterLab session.
- “tab” to serve the dashboard at an external link and open a new browser tab to said link.
Defaults to “external”.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- show(i_frame_kwargs: Dict = {}, dash_kwargs: Dict = {}, run_server_kwargs: Dict = {}, height: int | str | None = None, width: int | str | None = None)¶
Serve the Dash app on the next available port to render the visualization.
Additionally, renders the visualization inline in the current Jupyter notebook.
- Parameters:
- i_frame_kwargsdict, optional
DEPRECATED. Use height and width instead.
- dash_kwargsdict, optional
Keyword arguments to be passed along to the newly instantiated Dash object. Available options can be found at https://dash.plotly.com/reference#dash.dash.
- run_server_kwargsdict, optional
Keyword arguments to be passed along to Dash.run_server. Available options can be found at https://dash.plotly.com/reference#app.run_server. Most commonly, the ‘port’ argument can be provided here to serve the app on a specific port.
- heightint, str or None, optional
The height of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
- widthint, str or None, optional
The width of the inline visualization. Integers represent a number of pixels; strings represent a percentage of the window and must end with ‘%’.
workflow.prefect¶
rubicon_ml contains wrappers for the workflow management engine Prefect. These tasks represent a Prefect-ified rubicon_ml client.
- rubicon_ml.workflow.prefect.create_experiment_task(project, **kwargs)¶
Create an experiment within a prefect flow.
This prefect task can be used within a flow to create a new experiment under an existing project.
- Parameters:
- projectrubicon.client.Project
The project under which the experiment will be created.
- kwargsdict
Keyword arguments to be passed to Project.log_experiment.
- Returns:
- rubicon.client.Experiment
The created experiment.
- rubicon_ml.workflow.prefect.get_or_create_project_task(persistence, root_dir, project_name, auto_git_enabled=False, storage_options={}, **kwargs)¶
Get or create a project within a prefect flow.
This prefect task can be used within a flow to create a new project or get an existing one. It should be the entry point to any prefect flow that logs data to Rubicon.
- Parameters:
- persistencestr
The persistence type to be passed to the Rubicon constructor.
- root_dirstr
The root directory to be passed to the Rubicon constructor.
- project_namestr
The name of the project to get or create.
- auto_git_enabledbool, optional
True to use the git command to automatically log relevant repository information to projects and experiments logged with the client instance created in this task, False otherwise. Defaults to False.
- storage_optionsdict, optional
Additional keyword arguments specific to the protocol being chosen. They are passed directly to the underlying filesystem class.
- kwargsdict
Additional keyword arguments to be passed to Rubicon.create_project.
- Returns:
- rubicon.client.Project
The project with name project_name.
- rubicon_ml.workflow.prefect.log_artifact_task(parent, **kwargs)¶
Log an artifact within a prefect flow.
This prefect task can be used within a flow to log an artifact to an existing project or experiment.
- Parameters:
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment to log the artifact to.
- kwargsdict
Keyword arguments to be passed to Project.log_artifact or Experiment.log_artifact.
- Returns:
- rubicon.client.Artifact
The logged artifact.
- rubicon_ml.workflow.prefect.log_dataframe_task(parent, df, **kwargs)¶
Log a dataframe within a prefect flow.
This prefect task can be used within a flow to log a dataframe to an existing project or experiment.
- Parameters:
- parentrubicon.client.Project or rubicon.client.Experiment
The project or experiment to log the dataframe to.
- dfpandas.DataFrame or dask.dataframe.DataFrame
The pandas or dask dataframe to log.
- kwargsdict
Additional keyword arguments to be passed to Project.log_dataframe or Experiment.log_dataframe.
- Returns:
- rubicon.client.Dataframe
The logged dataframe.
- rubicon_ml.workflow.prefect.log_feature_task(experiment, feature_name, **kwargs)¶
Log a feature within a prefect flow.
This prefect task can be used within a flow to log a feature to an existing experiment.
- Parameters:
- experimentrubicon.client.Experiment
The experiment to log a new feature to.
- feature_namestr
The name of the feature to log. Passed to Experiment.log_feature as name.
- kwargsdict
Additional keyword arguments to be passed to Experiment.log_feature.
- Returns:
- rubicon.client.Feature
The logged feature.
- rubicon_ml.workflow.prefect.log_metric_task(experiment, metric_name, metric_value, **kwargs)¶
Log a metric within a prefect flow.
This prefect task can be used within a flow to log a metric to an existing experiment.
- Parameters:
- experimentrubicon.client.Experiment
The experiment to log a new metric to.
- metric_namestr
The name of the metric to log. Passed to Experiment.log_metric as name.
- metric_valuestr
The value of the metric to log. Passed to Experiment.log_metric as value.
- kwargsdict
Additional keyword arguments to be passed to Experiment.log_metric.
- Returns:
- rubicon.client.Metric
The logged metric.
- rubicon_ml.workflow.prefect.log_parameter_task(experiment, parameter_name, parameter_value, **kwargs)¶
Log a parameter within a prefect flow.
This prefect task can be used within a flow to log a parameter to an existing experiment.
- Parameters:
- experimentrubicon.client.Experiment
The experiment to log a new parameter to.
- parameter_namestr
The name of the parameter to log. Passed to Experiment.log_parameter as name.
- parameter_valuestr
The value of the parameter to log. Passed to Experiment.log_parameter as value.
- kwargsdict
Additional keyword arguments to be passed to Experiment.log_parameter.
- Returns:
- rubicon.client.Parameter
The logged parameter.