.. _backend-repositories: Available Backend Repositories ****************************** rubicon-ml uses pluggable storage backends powered by fsspec_. The backend is selected via the ``persistence`` and ``root_dir`` keyword arguments when instantiating the :class:`~rubicon_ml.Rubicon` client. All backends expose the same logging API, so switching between them requires only changing how you create the client — the rest of your code stays the same. .. list-table:: Backend summary :header-rows: 1 :widths: 20 20 25 35 * - Backend - ``persistence`` - ``root_dir`` - Install * - Local filesystem - ``"filesystem"`` - ``"/path/to/dir"`` - *(included)* * - Amazon S3 - ``"filesystem"`` - ``"s3://bucket/prefix"`` - ``pip install s3fs`` * - In-memory - ``"memory"`` - *(optional)* - *(included)* * - Weights & Biases - ``"wandb"`` - *(unused)* - ``pip install wandb`` Local Filesystem ================ The local filesystem backend persists rubicon-ml data to a directory on the machine where your code is running. It is the default backend and requires no extra dependencies. .. code-block:: python from rubicon_ml import Rubicon rubicon = Rubicon(persistence="filesystem", root_dir="/path/to/rubicon-root") ``root_dir`` must be an absolute or relative path to a directory. The directory will be created if it does not exist. Data is written as JSON metadata files and Parquet dataframes in a nested directory structure under ``root_dir``. Any additional keyword arguments are passed through to the underlying ``fsspec.filesystem("file", ...)`` call via ``**storage_options``. Amazon S3 ========= The S3 backend persists rubicon-ml data to a remote S3 bucket. It uses the same ``persistence="filesystem"`` setting as the local backend — rubicon-ml detects the ``s3://`` prefix in ``root_dir`` and selects the S3 repository automatically. Install ``s3fs`` extra to use the S3 backend: .. code-block:: console pip install s3fs Then create a client pointing at your bucket: .. code-block:: python from rubicon_ml import Rubicon rubicon = Rubicon( persistence="filesystem", root_dir="s3://my-bucket/rubicon-root", ) AWS credentials are resolved in the standard order: environment variables (``AWS_ACCESS_KEY_ID`` / ``AWS_SECRET_ACCESS_KEY``), the shared credentials file (``~/.aws/credentials``), or an instance profile. You can also pass credentials explicitly via ``storage_options``: .. code-block:: python rubicon = Rubicon( persistence="filesystem", root_dir="s3://my-bucket/rubicon-root", profile="my-aws-profile", ) All extra keyword arguments are forwarded to ``s3fs.S3FileSystem``. In-Memory ========= The in-memory backend stores data in a virtual filesystem that lives entirely in the current process's memory. It is intended for **testing and development** — data will not survive between Python sessions. .. code-block:: python from rubicon_ml import Rubicon rubicon = Rubicon(persistence="memory") ``root_dir`` is only required if you are interacting with a previously-created in-memory filesystem. .. _wandb-backend: Weights & Biases ================ .. warning:: The W&B backend is **experimental** and may contain breaking changes in future versions. If you encounter any bugs or missing features, please open an issue. The W&B backend maps rubicon-ml concepts onto native Weights & Biases primitives, so your experiment data is visible in both the rubicon-ml API and the W&B web UI. Setup ----- Install the ``wandb`` package and authenticate: .. code-block:: console pip install wandb Set your API key as an environment variable: .. code-block:: console export WANDB_API_KEY="your-api-key" Alternatively, you can run ``wandb login`` which stores the key in ``~/.netrc``. Basic usage ----------- .. code-block:: python from rubicon_ml import Rubicon rubicon = Rubicon(persistence="wandb") project = rubicon.get_or_create_project("My Project") experiment = project.log_experiment() experiment.log_parameter("alpha", 0.1) experiment.log_metric("accuracy", 0.95) The ``root_dir`` parameter is unused by the W&B backend. Configuration ------------- You can pass additional configuration when creating the client: ``entity`` The W&B entity (username or team name) to read from and write to. When omitted, W&B uses the default entity from your local configuration. ``wandb_init_kwargs`` A dictionary of additional keyword arguments forwarded to every ``wandb.init()`` call (e.g. ``{"mode": "offline"}``). Pass these as ``storage_options``: .. code-block:: python rubicon = Rubicon( persistence="wandb", entity="my-team", wandb_init_kwargs={"mode": "offline"}, ) Concept mapping --------------- rubicon-ml objects map to W&B primitives as follows: .. list-table:: :header-rows: 1 :widths: 35 65 * - rubicon-ml - Weights & Biases * - Project - W&B Project * - Experiment - W&B Run * - Metric (value) - W&B Metric (``wandb.log``) * - Parameter (value) - W&B Config entry * - Feature (importance) - W&B Metric (``wandb.log``) * - Artifact - W&B Artifact (type ``"model"``) * - Dataframe - W&B Artifact (type ``"dataset"``) + W&B Table In addition to these native representations, every entity's complete rubicon-ml metadata is stored in the run's W&B Config under a private ``_rubicon_*`` key so that it can be fully reconstructed when reading back through rubicon-ml. Limitations ----------- The W&B backend does not currently support every operation available in the filesystem backends: - **No project-level artifacts or dataframes.** These must be logged to an experiment (W&B run). - **No** ``get_projects()`` **listing.** Use ``get_project(name)`` or ``get_or_create_project(name)`` with a specific project name instead. Transitioning from a filesystem backend ---------------------------------------- If you have been using a local or S3 filesystem backend and want to switch to W&B, the change is straightforward — update the ``persistence`` argument: .. code-block:: python # Before rubicon = Rubicon(persistence="filesystem", root_dir="./rubicon-root") # After rubicon = Rubicon(persistence="wandb") Because every backend exposes the same logging API, the rest of your code does not need to change. New experiments will be logged to W&B going forward. Using Multiple Backends ======================= rubicon-ml supports writing to multiple backends simultaneously via the ``composite_config`` parameter. When a composite config is set, **every write operation fans out to all configured backends**, while **read operations return from the first backend that succeeds**. .. code-block:: python from rubicon_ml import Rubicon rubicon = Rubicon( composite_config=[ {"persistence": "filesystem", "root_dir": "s3://my-bucket/rubicon-root"}, {"persistence": "wandb"}, ], ) project = rubicon.get_or_create_project("My Project") experiment = project.log_experiment(name="dual-write") experiment.log_metric("accuracy", 0.95) # metric is now persisted to both S3 and W&B This is useful when you want redundant storage or are migrating between backends and want a period of dual-writing. For a complete walkthrough, see the :doc:`multiple backend notebook `. .. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/