Available Backend Repositories¶
rubicon-ml uses pluggable storage backends powered by fsspec. The backend is
selected via the persistence and root_dir keyword arguments when
instantiating the Rubicon client. All backends expose the
same logging API, so switching between them requires only changing how you
create the client — the rest of your code stays the same.
| Backend | persistence | root_dir | Install |
|---|---|---|---|
| Local filesystem | "filesystem" | local path | (included) |
| Amazon S3 | "filesystem" | s3:// URL | s3fs |
| In-memory | "memory" | (optional) | (included) |
| Weights & Biases | "wandb" | (unused) | wandb |
Local Filesystem¶
The local filesystem backend persists rubicon-ml data to a directory on the machine where your code is running. It is the default backend and requires no extra dependencies.
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="filesystem", root_dir="/path/to/rubicon-root")
root_dir must be an absolute or relative path to a directory. The directory
will be created if it does not exist. Data is written as JSON metadata files and
Parquet dataframes in a nested directory structure under root_dir.
Any additional keyword arguments are passed through to the underlying
fsspec.filesystem("file", ...) call via **storage_options.
Amazon S3¶
The S3 backend persists rubicon-ml data to a remote S3 bucket. It uses the same
persistence="filesystem" setting as the local backend — rubicon-ml detects
the s3:// prefix in root_dir and selects the S3 repository
automatically.
Install the s3fs package to use the S3 backend:
pip install s3fs
Then create a client pointing at your bucket:
from rubicon_ml import Rubicon
rubicon = Rubicon(
persistence="filesystem",
root_dir="s3://my-bucket/rubicon-root",
)
AWS credentials are resolved in the standard order: environment variables
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), the shared credentials
file (~/.aws/credentials), or an instance profile. You can also select a named
profile or pass credentials explicitly via storage_options:
rubicon = Rubicon(
persistence="filesystem",
root_dir="s3://my-bucket/rubicon-root",
profile="my-aws-profile",
)
All extra keyword arguments are forwarded to s3fs.S3FileSystem.
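For instance, s3fs accepts key, secret, and client_kwargs (among others), so an S3-compatible endpoint could be targeted like this (the bucket name, credentials, and endpoint URL below are all placeholders):

```python
from rubicon_ml import Rubicon

# Each of these extra keyword arguments is forwarded to s3fs.S3FileSystem.
rubicon = Rubicon(
    persistence="filesystem",
    root_dir="s3://my-bucket/rubicon-root",
    key="<access-key-id>",          # placeholder credentials
    secret="<secret-access-key>",
    client_kwargs={"endpoint_url": "https://s3.example.com"},
)
```

Prefer the standard credential resolution described above when possible; explicit keys in code are best reserved for non-AWS, S3-compatible stores.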
In-Memory¶
The in-memory backend stores data in a virtual filesystem that lives entirely in the current process’s memory. It is intended for testing and development — data will not survive between Python sessions.
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="memory")
root_dir is only required if you are interacting with a previously-created
in-memory filesystem.
Weights & Biases¶
Warning
The W&B backend is experimental and may contain breaking changes in future versions. If you encounter any bugs or missing features, please open an issue.
The W&B backend maps rubicon-ml concepts onto native Weights & Biases primitives, so your experiment data is visible in both the rubicon-ml API and the W&B web UI.
Setup¶
Install the wandb package and authenticate:
pip install wandb
Set your API key as an environment variable:
export WANDB_API_KEY="your-api-key"
Alternatively, you can run wandb login which stores the key in
~/.netrc.
Basic usage¶
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="wandb")
project = rubicon.get_or_create_project("My Project")
experiment = project.log_experiment()
experiment.log_parameter("alpha", 0.1)
experiment.log_metric("accuracy", 0.95)
The root_dir parameter is unused by the W&B backend.
Configuration¶
You can pass additional configuration when creating the client:
entity
The W&B entity (username or team name) to read from and write to. When omitted, W&B uses the default entity from your local configuration.
wandb_init_kwargs
A dictionary of additional keyword arguments forwarded to every wandb.init() call (e.g. {"mode": "offline"}).
Pass these as storage_options:
rubicon = Rubicon(
persistence="wandb",
entity="my-team",
wandb_init_kwargs={"mode": "offline"},
)
Concept mapping¶
rubicon-ml objects map to W&B primitives as follows:
| rubicon-ml | Weights & Biases |
|---|---|
| Project | W&B Project |
| Experiment | W&B Run |
| Metric (value) | W&B Metric |
| Parameter (value) | W&B Config entry |
| Feature (importance) | W&B Metric |
| Artifact | W&B Artifact |
| Dataframe | W&B Artifact |
In addition to these native representations, every entity’s complete rubicon-ml
metadata is stored in the run’s W&B Config under a private _rubicon_* key so
that it can be fully reconstructed when reading back through rubicon-ml.
Limitations¶
The W&B backend does not currently support every operation available in the filesystem backends:
- No project-level artifacts or dataframes. These must be logged to an experiment (W&B run).
- No get_projects() listing. Use get_project(name) or get_or_create_project(name) with a specific project name instead.
Transitioning from a filesystem backend¶
If you have been using a local or S3 filesystem backend and want to switch to
W&B, the change is straightforward — update the persistence argument:
# Before
rubicon = Rubicon(persistence="filesystem", root_dir="./rubicon-root")
# After
rubicon = Rubicon(persistence="wandb")
Because every backend exposes the same logging API, the rest of your code does not need to change. New experiments will be logged to W&B going forward.
Using Multiple Backends¶
rubicon-ml supports writing to multiple backends simultaneously via the
composite_config parameter. When a composite config is set, every write
operation fans out to all configured backends, while read operations return the
result from the first backend that succeeds.
from rubicon_ml import Rubicon
rubicon = Rubicon(
composite_config=[
{"persistence": "filesystem", "root_dir": "s3://my-bucket/rubicon-root"},
{"persistence": "wandb"},
],
)
project = rubicon.get_or_create_project("My Project")
experiment = project.log_experiment(name="dual-write")
experiment.log_metric("accuracy", 0.95)
# metric is now persisted to both S3 and W&B
This is useful when you want redundant storage or are migrating between backends and want a period of dual-writing.
For a complete walkthrough, see the multiple backend notebook.