API Reference

Complete API documentation for the Slingshot SDK.

Client

class slingshot.client.SlingshotClient(api_key: str | None = None, api_url: str | None = None)[source]

Bases: object

SlingshotClient is a client for interacting with the Slingshot API.

Get an API key from: https://slingshot.capitalone.com/configurations/api-keys

__init__(api_key: str | None = None, api_url: str | None = None)[source]

Initialize the Slingshot client.

Parameters:
  • api_key (str) – The API key for authentication. If not provided, it will look for the environment variable SLINGSHOT_API_KEY.

  • api_url (str) – The base URL for the Slingshot API. If not provided, it will look for the environment variable SLINGSHOT_API_URL; if that is not set, it defaults to “https://slingshot.capitalone.com/prod/api/gradient”.

Raises:

ValueError – If the API key is not provided and not found in the environment.

Example

>>> from slingshot.client import SlingshotClient
>>> # Or:
>>> # from slingshot import SlingshotClient
>>> client = SlingshotClient(api_key="your_api_key")
__repr__()[source]

Return a string representation of the SlingshotClient.

property projects: ProjectAPI

Get the projects API client.

API Modules

Projects API

class slingshot.api.projects.ProjectAPI(client: SlingshotClient)[source]

Bases: object

API for managing projects in Slingshot.

__init__(client: SlingshotClient)[source]

Initialize the ProjectAPI.

create(name: str, workspace_id: str, description: str | None = Ellipsis, app_id: str | None = Ellipsis, job_id: str | None = Ellipsis, cluster_path: str | None = Ellipsis, settings: AssignSettingsSchema | None = Ellipsis) → ProjectSchema[source]

Create a new Slingshot project for optimizing a Databricks job cluster.

Parameters:
  • name (str) – The name of the Slingshot project.

  • workspace_id (str) – The Databricks workspace ID where the job runs.

  • description (Optional[str], optional) – A description for the Slingshot project. Defaults to None.

  • app_id (Optional[str], optional) – The application ID, which must be unique across all active (not deleted) projects belonging to a Slingshot subscriber. This field can be used to search for a project with the get_projects() and iterate_projects() methods. The app_id is immutable once the project is created. Defaults to None.

  • job_id (Optional[str], optional) – The Databricks job ID that will be associated with this Slingshot project. Defaults to None.

  • cluster_path (Optional[str], optional) –

    The name of the Databricks job cluster to be optimized by this Slingshot project, prefixed with “job_clusters/” for a job cluster that is available to any task in the job; or the task name prefixed with “tasks/” for a task-specific cluster not available to other tasks in the job. For example, “job_clusters/my-cluster” or “tasks/task_1”. This field is required if the job has multiple compute clusters. If the job has only one compute cluster, this field is optional. Defaults to None.

    Each Slingshot project is linked to a single compute cluster in Databricks. If the cluster_path is not provided for a job that has multiple compute clusters, the Slingshot project will not be able to retrieve information about the job runs or generate recommendations for optimizing the compute cluster.

    You can find the cluster name in the Databricks UI when viewing the configuration for a job cluster as the “Cluster name” field, or using the Databricks API, where it is called “job_cluster_key”.

    The task name is shown in the Databricks UI as the “Task name” field after selecting the task in the job configuration. In the Databricks API, it is called “task_key”.

    With the Databricks Python SDK, you can retrieve the cluster_path using the job_cluster_key or task_key from the job or task settings. For example, to get the Job object and extract the job_cluster_key or task_key, you can use the following code:

    >>> from databricks.sdk import WorkspaceClient
    >>> workspace_client = WorkspaceClient()
    >>> job = workspace_client.jobs.get(job_id=1234567890)
    

    If the job cluster is defined for the job and potentially shared across tasks in the job (which is the case for jobs created in the Databricks UI), you can retrieve the job_cluster_key like this:

    >>> cluster_name = job.settings.job_clusters[0].job_cluster_key
    >>> print(f'cluster_path="job_clusters/{cluster_name}"')
    

    Or, if the job cluster definition is tied to a specific task rather than shared across the entire job, you can first check whether the task is using a shared cluster, and if not, use the task_key as the cluster_path. When jobs are created with the Databricks API or SDK, tasks can be configured to use a new_cluster that is not shared with other tasks, in which case the job_cluster_key will not be set, and you should use the task_key instead:

    >>> if (cluster_name := job.settings.tasks[0].job_cluster_key):
    ...     print(f'cluster_path="job_clusters/{cluster_name}"')
    ... else:
    ...     task_name = job.settings.tasks[0].task_key
    ...     print(f'cluster_path="tasks/{task_name}"')
    

  • settings (AssignSettingsSchema, optional) –

    A dictionary that sets Slingshot project options. Defaults to None.

    • sla_minutes (Optional[int], optional): The acceptable time (in minutes) for the job to complete.

      The SLA (Service Level Agreement) is the maximum time the job should take to complete. Slingshot uses this value as an expected upper bound when optimizing the job for lowest cost. Defaults to None.

    • auto_apply_recs (Optional[bool], optional): Automatically apply recommendations.

      Defaults to False.

Returns:

The details of the newly created project.

Return type:

ProjectSchema
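A minimal sketch of a create() call. The client is replaced here with a unittest.mock.MagicMock stub so the example runs without credentials; the names, IDs, and returned fields are placeholders, not real values.

```python
from unittest.mock import MagicMock

# Stand-in for a configured SlingshotClient (no real API calls are made).
client = MagicMock()
client.projects.create.return_value = {"id": "proj_123", "name": "nightly-etl"}

# Create a project for one job cluster, with an SLA and auto-apply disabled.
project = client.projects.create(
    name="nightly-etl",
    workspace_id="1234567890123456",
    job_id="987654321",
    cluster_path="job_clusters/my-cluster",
    settings={"sla_minutes": 60, "auto_apply_recs": False},
)
print(project["id"])  # proj_123
```

With a real client, the same keyword arguments apply; only the stubbed return value differs.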

update(project_id: str, name: str | None = Ellipsis, workspace_id: str | None = Ellipsis, description: str | None = Ellipsis, job_id: str | None = Ellipsis, cluster_path: str | None = Ellipsis, settings: AssignSettingsSchema | None = Ellipsis) → ProjectSchema[source]

Update the attributes of an existing Slingshot project.

Only the attributes that are provided will be updated. Passing None explicitly overwrites the project attribute with None; omitted attributes are left unchanged.

Parameters:
  • project_id (str) – The ID of the Slingshot project to update.

  • name (Optional[str], optional) – The new name for the Slingshot project.

  • workspace_id (Optional[str], optional) –

    The new Databricks workspace ID where the job runs.

    Note: If you are changing the Databricks workspace associated with the Slingshot project, you probably also want to reset the project using the reset() method. This will remove all previous job run data from the project, allowing Slingshot to re-optimize the job without the influence of previous runs.

  • description (Optional[str], optional) – The new description for the Slingshot project.

  • job_id (Optional[str], optional) –

    The new Databricks job ID that will be associated with this Slingshot project.

    Note: If you are changing the Databricks job associated with the Slingshot project, you probably also want to reset the project using the reset() method. This will remove all previous job run data from the project, allowing Slingshot to re-optimize the job without the influence of previous runs.

  • cluster_path (Optional[str], optional) –

    The name of the Databricks job cluster to be optimized by this Slingshot project, prefixed with “job_clusters/” for a job cluster that is available to any task in the job; or the task name prefixed with “tasks/” for a task-specific cluster not available to other tasks in the job. For example, “job_clusters/my-cluster” or “tasks/task_1”. This field is required if the job has multiple compute clusters. If the job has only one compute cluster, this field is optional.

    Each Slingshot project is linked to a single compute cluster in Databricks. If the cluster_path is not provided for a job that has multiple compute clusters, the Slingshot project will not be able to retrieve information about the job runs or generate recommendations for optimizing the compute cluster.

    You can find the cluster name in the Databricks UI when viewing the configuration for a job cluster as the “Cluster name” field, or using the Databricks API, where it is called “job_cluster_key”.

    The task name is shown in the Databricks UI as the “Task name” field after selecting the task in the job configuration. In the Databricks API, it is called “task_key”.

    With the Databricks Python SDK, you can retrieve the cluster_path using the job_cluster_key or task_key from the job or task settings. For example, to get the Job object and extract the job_cluster_key or task_key, you can use the following code:

    >>> from databricks.sdk import WorkspaceClient
    >>> workspace_client = WorkspaceClient()
    >>> job = workspace_client.jobs.get(job_id=1234567890)
    

    If the job cluster is defined for the job and potentially shared across tasks in the job (which is the case for jobs created in the Databricks UI), you can retrieve the job_cluster_key like this:

    >>> cluster_name = job.settings.job_clusters[0].job_cluster_key
    >>> print(f'cluster_path="job_clusters/{cluster_name}"')
    

    Or, if the job cluster definition is tied to a specific task rather than shared across the entire job, you can first check whether the task is using a shared cluster, and if not, use the task_key as the cluster_path. When jobs are created with the Databricks API or SDK, tasks can be configured to use a new_cluster that is not shared with other tasks, in which case the job_cluster_key will not be set, and you should use the task_key instead:

    >>> if (cluster_name := job.settings.tasks[0].job_cluster_key):
    ...     print(f'cluster_path="job_clusters/{cluster_name}"')
    ... else:
    ...     task_name = job.settings.tasks[0].task_key
    ...     print(f'cluster_path="tasks/{task_name}"')
    

  • settings (AssignSettingsSchema, optional) –

    A dictionary with updates to the options for the Slingshot project. The options are:

    • sla_minutes (Optional[int], optional): The acceptable time (in minutes) for the job to complete.

      The SLA (Service Level Agreement) is the maximum time the job should take to complete. Slingshot uses this value as an expected upper bound when optimizing the job for lowest cost.

    • auto_apply_recs (Optional[bool], optional): Automatically apply recommendations.

Returns:

The details of the updated project.

Return type:

ProjectSchema
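The Ellipsis defaults in the signature distinguish "not provided" from an explicit None. A rough sketch of that sentinel pattern (illustrative only, not the SDK's actual implementation):

```python
# Hypothetical helper illustrating the Ellipsis-sentinel pattern used by
# update(): fields left at Ellipsis are dropped from the payload, while an
# explicit None is kept and clears the attribute on the server.
def build_patch(**fields):
    return {k: v for k, v in fields.items() if v is not Ellipsis}

patch = build_patch(name="new-name", description=Ellipsis, job_id=None)
print(patch)  # {'name': 'new-name', 'job_id': None}
```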

delete(project_id: str) → None[source]

Delete a Slingshot project by its ID.

This method removes the Slingshot project but does not affect the Databricks job that was associated with the project.

Parameters:

project_id (str) – The ID of the Slingshot project to delete.

Returns:

None

reset(project_id: str) → None[source]

Reset a Slingshot project by its ID, removing all previous job run data from the project.

Use this method to clear all previous job run data and start fresh with the same project. It is useful when a job changes significantly and you want to re-optimize it without the influence of previous runs, since Slingshot uses historical run data to optimize the job.

This does not affect the Databricks job associated with the project; run history will still be accessible from the Databricks platform.

Parameters:

project_id (str) – The ID of the Slingshot project to reset.

Returns:

None

get_projects(include: list[str] | None = None, creator_id: str | None = None, app_id: str | None = None, job_id: str | None = None, page: int = 1, size: int = 50) → Page[ProjectSchema][source]

Retrieve a paginated list of projects based on filter criteria.

Parameters:
  • include (Optional[list[str]]) – Attributes within ProjectSchema to include in the response. If not provided, all available attributes are included. Defaults to None.

  • creator_id (Optional[str], optional) – The ID of the project creator to filter projects by. Defaults to None.

  • app_id (Optional[str], optional) – The application ID to filter projects by. This is an identifier that is unique across all projects for a Slingshot subscriber and is set at the time a project is created. Defaults to None.

  • job_id (Optional[str], optional) – The Databricks job ID to filter projects by. Defaults to None.

  • page (int, optional) – The page number to retrieve. Defaults to 1.

  • size (int, optional) – The number of projects to retrieve per page. Defaults to 50.

Returns:

A list of project details for the requested page.

Return type:

Page[ProjectSchema]
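A sketch of walking all pages manually, using a stub fetcher in place of client.projects.get_projects. The page and pages counters follow the Page schema documented below; the "items" field holding the page's projects is an assumption for illustration, not a documented field name.

```python
# Stub standing in for client.projects.get_projects; "items" is a
# hypothetical field name for the page's contents.
DATA = [{"id": f"proj_{i}"} for i in range(5)]

def get_projects_stub(page=1, size=2):
    start = (page - 1) * size
    return {
        "page": page,
        "pages": -(-len(DATA) // size),  # ceiling division
        "items": DATA[start:start + size],
    }

# Walk pages until the current page number reaches the total page count.
projects = []
page = 1
while True:
    result = get_projects_stub(page=page, size=2)
    projects.extend(result["items"])
    if result["page"] >= result["pages"]:
        break
    page += 1

print(len(projects))  # 5
```

For most workloads, iterate_projects() below does this page walking for you.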

iterate_projects(include: list[str] | None = None, creator_id: str | None = None, app_id: str | None = None, job_id: str | None = None, size: int = 50, max_pages: int = 1000) → Iterator[ProjectSchema][source]

Fetch all projects page by page using a memory-efficient generator.

Parameters:
  • include (Optional[list[str]]) – Attributes within ProjectSchema to include in the response. If not provided, all available attributes are included. Defaults to None.

  • creator_id (Optional[str], optional) – The ID of the project creator to filter projects by. Defaults to None.

  • app_id (Optional[str], optional) – The application ID to filter projects by. This is an identifier that is unique across all projects for a Slingshot subscriber and is set at the time a project is created. Defaults to None.

  • job_id (Optional[str], optional) – The Databricks job ID to filter projects by. Defaults to None.

  • size (int, optional) – The number of projects to retrieve per page. Defaults to 50.

  • max_pages (int, optional) – The maximum number of pages allowed to traverse. Defaults to 1000.

Yields:

Iterator[ProjectSchema] – A project object, one at a time.
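The lazy, page-by-page behavior can be sketched with a simplified generator (a stand-in for the SDK internals; the "items" field name is an assumption):

```python
# Simplified sketch of a page-lazy generator: pages are fetched only as the
# consumer advances, so memory use stays bounded by one page.
def iterate_stub(fetch, size=50, max_pages=1000):
    for page in range(1, max_pages + 1):
        result = fetch(page=page, size=size)
        yield from result["items"]
        if result["page"] >= result["pages"]:
            break

DATA = [{"id": f"proj_{i}"} for i in range(3)]

def fetch(page=1, size=50):
    start = (page - 1) * size
    pages = max(1, -(-len(DATA) // size))  # ceiling division, at least 1
    return {"page": page, "pages": pages, "items": DATA[start:start + size]}

ids = [p["id"] for p in iterate_stub(fetch, size=2)]
print(ids)  # ['proj_0', 'proj_1', 'proj_2']
```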

get_project(project_id: str, include: list[str] | None = None) → ProjectSchema[source]

Fetch a project by its ID.

Parameters:
  • project_id (str) – The ID of the project to fetch.

  • include (Optional[list[str]]) – Attributes within ProjectSchema to include in the response. If not provided, all available attributes are included. Defaults to None.

Returns:

The project details.

Return type:

ProjectSchema

create_recommendation(project_id: str) → RecommendationDetailsSchema[source]

Create a new recommendation for a Slingshot project.

Recommendations are suggested changes to Databricks job cluster configurations meant to minimize costs while keeping job run time within required SLAs. They are generated based on the previous job runs associated with the Slingshot project.

A recommendation can be created for a project once Slingshot has received details about a successful job run associated with that project. Slingshot will begin checking for job runs after a project is linked to a Databricks job (or a cluster within that job).

The recommendation will be in a “PENDING” state immediately after creation, meaning it is still being processed. It can be applied if its state is “PENDING”, “UPLOADING”, or “SUCCESS” (but not “FAILURE”).

Note

The returned value, a dictionary with info about the recommendation, lacks the full details of the recommendation because the state is still “PENDING” immediately after the recommendation is created. Use the method get_recommendation() to retrieve the full details, like this:

>>> from slingshot import SlingshotClient
>>> client = SlingshotClient()
>>> project_id = "your_project_id"
>>> # Create a recommendation
>>> recommendation = client.projects.create_recommendation(project_id)
>>> # Get the recommendation details
>>> recommendation_details = client.projects.get_recommendation(
...     project_id=project_id, recommendation_id=recommendation["id"]
... )
Parameters:

project_id (str) – The ID of the project to create a recommendation for.

Returns:

A dictionary with details about the recommendation that was created. The recommendation will have a “PENDING” state, meaning it is still being processed. To get the full details of the recommendation, use the get_recommendation() method with the recommendation ID returned in the response.

Return type:

RecommendationDetailsSchema

get_recommendation(project_id: str, recommendation_id: str) → RecommendationDetailsSchema[source]

Fetch a specific recommendation for a Slingshot project.

Recommendations are suggested changes to Databricks job cluster configurations meant to minimize costs while keeping job run time within required SLAs. They are generated based on the previous job runs associated with the Slingshot project.

Parameters:
  • project_id (str) – The ID of the project that the recommendation belongs to.

  • recommendation_id (str) – The ID of the recommendation to fetch.

Returns:

A dictionary with details of the recommendation.

Return type:

RecommendationDetailsSchema

apply_recommendation(project_id: str, recommendation_id: str) → RecommendationDetailsSchema[source]

Apply a recommendation to the Slingshot project.

The recommendation is applied to the Databricks job cluster associated with the Slingshot project.

Recommendations are suggested changes to Databricks job cluster configurations meant to minimize costs while keeping job run time within required SLAs. They are generated based on the previous job runs linked to the Slingshot project.

A recommendation can be applied if its state is “SUCCESS”, “PENDING”, or “UPLOADING”. If the recommendation is in a “FAILURE” state, applying it will raise an error.

Parameters:
  • project_id (str) – The ID of the project that the recommendation belongs to.

  • recommendation_id (str) – The ID of the recommendation to apply.

Returns:

A dictionary with details of the recommendation that was applied.

Return type:

RecommendationDetailsSchema
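The applicability rule described for apply_recommendation() can be expressed as a small check (a sketch; the state strings come from the documentation above):

```python
# A recommendation can be applied in any state except "FAILURE".
APPLICABLE_STATES = {"SUCCESS", "PENDING", "UPLOADING"}

def can_apply(recommendation):
    return recommendation.get("state") in APPLICABLE_STATES

print(can_apply({"state": "PENDING"}))  # True
print(can_apply({"state": "FAILURE"}))  # False
```

Checking the state first lets callers skip the call (and the resulting error) for failed recommendations.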

API Schema Types

These are the data types returned by the API methods; they match the Slingshot API schema.

class slingshot.types.ProjectSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for a project in Slingshot.

created_at: str | None
updated_at: str | None
id: str | None
name: str | None
app_id: str | None
cluster_path: str | None
job_id: str | None
workspace_id: str | None
creator_id: str | None
description: str | None
settings: ProjectSettingsSchema | None
metrics: ProjectMetricsSchema | None
creator: ProjectCreatorSchema | None
phase: str | None
product_name: str | None
class slingshot.types.ProjectSettingsSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for retrieving a project's additional settings in Slingshot.

sla_minutes: int | None
auto_apply_recs: bool | None
class slingshot.types.AssignSettingsSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for assigning additional project settings in Slingshot.

sla_minutes: typing_extensions.NotRequired[int | None]
auto_apply_recs: typing_extensions.NotRequired[bool | None]
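Because both keys are NotRequired, a plain dictionary with any subset of the fields is a valid AssignSettingsSchema; the values below are placeholders:

```python
# Both keys are optional (NotRequired), so any subset, including the empty
# dict, is a valid settings value for create() or update().
settings_full = {"sla_minutes": 90, "auto_apply_recs": True}
settings_partial = {"sla_minutes": 45}  # auto_apply_recs left unset

allowed_keys = {"sla_minutes", "auto_apply_recs"}
for s in (settings_full, settings_partial, {}):
    assert set(s) <= allowed_keys
```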
class slingshot.types.ProjectMetricsSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for retrieving the project metrics in Slingshot.

job_success_rate_percent: int | None
sla_met_percent: int | None
estimated_savings: int | None
class slingshot.types.ProjectCreatorSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for retrieving the project creator in Slingshot.

userId: str | None
auth0Id: str | None
tenantId: str | None
isTenantAdmin: bool | None
firstName: str | None
lastName: str | None
email: str | None
createdAt: str | None
updatedAt: str | None
isActive: bool | None
isRegistered: bool | None
class slingshot.types.RecommendationDetailsSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for retrieving the details of a recommendation for a project in Slingshot.

created_at: str | None
updated_at: str | None
id: str | None
state: str | None
error: str | None
recommendation: RecommendationSchema | None
class slingshot.types.RecommendationSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the recommendation of a project in Slingshot.

metrics: MetricsSchema | None
configuration: ConfigurationSchema | None
settings: SettingsSchema | None
class slingshot.types.MetricsSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the recommended metrics in Slingshot.

spark_duration_minutes: int | None
spark_cost_requested_usd: int | None
class slingshot.types.ConfigurationSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the recommended configuration in Slingshot.

enable_elastic_disk: bool | None
node_type_id: str | None
num_workers: int | None
autoscale: AutoscaleSchema | None
aws_attributes: AwsAttributesSchema | None
azure_attributes: AzureAttributesSchema | None
cluster_log_conf: ClusterLogConfSchema | None
default_tags: dict[str, str] | None
driver_node_type_id: str | None
spec: SpecSchema | None
class slingshot.types.AutoscaleSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the autoscale configuration in a recommendation.

max_workers: int | None
min_workers: int | None
class slingshot.types.AwsAttributesSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the AWS attributes in a recommendation.

availability: str | None
ebs_volume_count: int | None
ebs_volume_iops: int | None
ebs_volume_size: int | None
ebs_volume_throughput: int | None
ebs_volume_type: str | None
first_on_demand: int | None
spot_bid_price_percent: int | None
class slingshot.types.AzureAttributesSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the Azure attributes in a recommendation.

availability: str | None
first_on_demand: int | None
spot_bid_max_price: int | None
class slingshot.types.ClusterLogConfSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the cluster log configuration in a recommendation.

dbfs: DbfsLogConfSchema | None
s3: S3LogConfSchema | None
volumes: VolumesLogConfSchema | None
class slingshot.types.DbfsLogConfSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for DBFS log configuration in a cluster log configuration.

destination: str | None
class slingshot.types.S3LogConfSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for S3 log configuration in a cluster log configuration.

destination: str | None
canned_acl: str | None
enable_encryption: bool | None
encryption_type: str | None
endpoint: str | None
kms_key: str | None
region: str | None
class slingshot.types.VolumesLogConfSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for volume log configuration in a cluster log configuration.

destination: str | None
class slingshot.types.SpecSchema(_typename, _fields=None, /, **kwargs)[source]

Bases: dict

Schema for the spec in a recommendation.

enable_elastic_disk: bool | None
node_type_id: str | None
num_workers: int | None
driver_node_type_id: str | None
class slingshot.types.Page(_typename, _fields=None, /, **kwargs)[source]

Bases: TypedDict, Generic[T]

A page of items from a paginated collection.

page: int
pages: int