Available Backend Repositories¶
rubicon-ml uses pluggable storage backends powered by fsspec. The backend is
selected via the persistence and root_dir keyword arguments when
instantiating the Rubicon client. All backends expose the
same logging API, so switching between them requires only changing how you
create the client — the rest of your code stays the same.
| Backend | persistence | root_dir | Install |
|---|---|---|---|
| Local filesystem | "filesystem" | local path | (included) |
| Amazon S3 | "filesystem" | s3:// URL | s3fs |
| In-memory | "memory" | (optional) | (included) |
| Weights & Biases | "wandb" | (unused) | wandb |
Local Filesystem¶
The local filesystem backend persists rubicon-ml data to a directory on the machine where your code is running. It is the default backend and requires no extra dependencies.
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="filesystem", root_dir="/path/to/rubicon-root")
root_dir must be an absolute or relative path to a directory. The directory
will be created if it does not exist. Data is written as JSON metadata files and
Parquet dataframes in a nested directory structure under root_dir.
Any additional keyword arguments are passed through to the underlying
fsspec.filesystem("file", ...) call via **storage_options.
Amazon S3¶
The S3 backend persists rubicon-ml data to a remote S3 bucket. It uses the same
persistence="filesystem" setting as the local backend — rubicon-ml detects
the s3:// prefix in root_dir and selects the S3 repository
automatically.
Install the s3fs package to use the S3 backend:
pip install s3fs
Then create a client pointing at your bucket:
from rubicon_ml import Rubicon
rubicon = Rubicon(
persistence="filesystem",
root_dir="s3://my-bucket/rubicon-root",
)
AWS credentials are resolved in the standard order: environment variables
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), the shared credentials
file (~/.aws/credentials), or an instance profile. You can also select a named
profile or pass credentials explicitly via storage_options:
rubicon = Rubicon(
persistence="filesystem",
root_dir="s3://my-bucket/rubicon-root",
profile="my-aws-profile",
)
All extra keyword arguments are forwarded to s3fs.S3FileSystem.
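For instance, s3fs accepts key, secret, and client_kwargs (among others), so an S3-compatible endpoint could be targeted like this (the bucket name, credentials, and endpoint URL below are all placeholders):

```python
from rubicon_ml import Rubicon

# Each of these extra keyword arguments is forwarded to s3fs.S3FileSystem.
rubicon = Rubicon(
    persistence="filesystem",
    root_dir="s3://my-bucket/rubicon-root",
    key="<access-key-id>",          # placeholder credentials
    secret="<secret-access-key>",
    client_kwargs={"endpoint_url": "https://s3.example.com"},
)
```

Prefer the standard credential resolution described above when possible; explicit keys in code are best reserved for non-AWS, S3-compatible stores.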
In-Memory¶
The in-memory backend stores data in a virtual filesystem that lives entirely in the current process’s memory. It is intended for testing and development — data will not survive between Python sessions.
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="memory")
root_dir is only required if you are interacting with a previously-created
in-memory filesystem.
Weights & Biases¶
Warning
The W&B backend is experimental and may contain breaking changes in future versions. If you encounter any bugs or missing features, please open an issue.
The W&B backend maps rubicon-ml concepts onto native Weights & Biases primitives, so your experiment data is visible in both the rubicon-ml API and the W&B web UI.
Setup¶
Install the wandb package and authenticate:
pip install wandb
Set your API key as an environment variable:
export WANDB_API_KEY="your-api-key"
Alternatively, you can run wandb login which stores the key in
~/.netrc.
Basic usage¶
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="wandb")
project = rubicon.get_or_create_project("My Project")
experiment = project.log_experiment()
experiment.log_parameter("alpha", 0.1)
experiment.log_metric("accuracy", 0.95)
The root_dir parameter is unused by the W&B backend.
Configuration¶
You can pass additional configuration when creating the client:
entity
The W&B entity (username or team name) to read from and write to. When omitted, W&B uses the default entity from your local configuration.
wandb_init_kwargs
A dictionary of additional keyword arguments forwarded to every wandb.init() call (e.g. {"mode": "offline"}).
Pass these as storage_options:
rubicon = Rubicon(
persistence="wandb",
entity="my-team",
wandb_init_kwargs={"mode": "offline"},
)
Concept mapping¶
rubicon-ml objects map to W&B primitives as follows:
| rubicon-ml | Weights & Biases |
|---|---|
| Project | W&B Project |
| Experiment | W&B Run |
| Metric (value) | W&B Metric |
| Parameter (value) | W&B Config entry |
| Feature (importance) | W&B Metric |
| Artifact | W&B Artifact |
| Dataframe | W&B Artifact |
In addition to these native representations, every entity’s complete rubicon-ml
metadata is stored in the run’s W&B Config under a private _rubicon_* key so
that it can be fully reconstructed when reading back through rubicon-ml.
Limitations¶
The W&B backend does not currently support every operation available in the filesystem backends:
- No project-level artifacts or dataframes. These must be logged to an experiment (W&B run).
- No get_projects() listing. Use get_project(name) or get_or_create_project(name) with a specific project name instead.
Transitioning from a filesystem backend¶
If you have been using a local or S3 filesystem backend and want to switch to
W&B, the change is straightforward — update the persistence argument:
# Before
rubicon = Rubicon(persistence="filesystem", root_dir="./rubicon-root")
# After
rubicon = Rubicon(persistence="wandb")
Because every backend exposes the same logging API, the rest of your code does not need to change. New experiments will be logged to W&B going forward.
Using Multiple Backends¶
rubicon-ml supports writing to multiple backends simultaneously via the
composite_config parameter. When a composite config is set, every write
operation fans out to all configured backends, while read operations return the
result from the first backend that succeeds.
from rubicon_ml import Rubicon
rubicon = Rubicon(
composite_config=[
{"persistence": "filesystem", "root_dir": "s3://my-bucket/rubicon-root"},
{"persistence": "wandb"},
],
)
project = rubicon.get_or_create_project("My Project")
experiment = project.log_experiment(name="dual-write")
experiment.log_metric("accuracy", 0.95)
# metric is now persisted to both S3 and W&B
This is useful when you want redundant storage or are migrating between backends and want a period of dual-writing.
For a complete walkthrough, see the multiple backend notebook.