---
layout: post
title: PYTHON SDK
permalink: /docs/python-sdk
redirect_from:
---
AIStore Python SDK is a growing set of client-side objects and methods to access and utilize AIS clusters. This document contains API documentation for the AIStore Python SDK.
For our PyTorch integration, refer to the PyTorch Docs. For more information, see the AIS Python SDK package available via the Python Package Index (PyPI), or visit https://github.com/NVIDIA/aistore/tree/main/python/aistore.
- authn.authn_client
- authn.cluster_manager
- authn.role_manager
- authn.token_manager
- authn.user_manager
- authn.access_attr
- bucket
- client
- cluster
- etl
- job
- retry_config
- multiobj.object_group
- multiobj.object_names
- multiobj.object_range
- multiobj.object_template
- obj.object
- obj.object_reader
- obj.obj_file.object_file
- obj.object_props
- obj.object_attributes
class AuthNClient()

AuthN client for managing authentication.

This client provides methods to interact with the AuthN Server. For more info on the AuthN Server, see https://github.com/NVIDIA/aistore/blob/main/docs/authn.md

Arguments:

- `endpoint` (str): AuthN service endpoint URL.
- `skip_verify` (bool, optional): If True, skip SSL certificate verification. Defaults to False.
- `ca_cert` (str, optional): Path to a CA certificate file for SSL verification.
- `timeout` (Union[float, Tuple[float, float], None], optional): Request timeout in seconds; a single float for both connect/read timeouts (e.g., 5.0), a tuple for separate connect/read timeouts (e.g., (3.0, 10.0)), or None to disable timeout.
- `retry` (urllib3.Retry, optional): Retry configuration object from the urllib3 library.
- `token` (str, optional): Authorization token.
@property
def client() -> RequestClient

Get the request client.

Returns:

- `RequestClient`: The client this AuthN client uses to make requests.
def login(username: str,
password: str,
          expires_in: Optional[Union[int, float]] = None) -> str

Logs in to the AuthN Server and returns an authorization token.

Arguments:

- `username` (str): The username to log in with.
- `password` (str): The password to log in with.
- `expires_in` (Optional[Union[int, float]]): The expiration duration of the token in seconds.

Returns:

- `str`: An authorization token to use for future requests.

Raises:

- `ValueError`: If the password is empty or consists only of spaces.
- `AISError`: If the login request fails.
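A minimal sketch of the login flow is shown below. The endpoint URLs, credentials, and the exact import path (following the `authn.authn_client` module listed above) are assumptions for illustration, not a prescribed deployment:

```python
from aistore.sdk.authn import AuthNClient
from aistore.sdk import Client

# Assumed local AuthN server endpoint and placeholder credentials
authn_client = AuthNClient("http://localhost:52001")
token = authn_client.login("admin", "admin")

# Pass the token to the AIStore client so subsequent requests are authenticated
ais_client = Client("http://localhost:8080", token=token)
```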
def logout() -> None

Logs out and revokes the current token from the AuthN Server.

Raises:

- `AISError`: If the logout request fails.
def cluster_manager() -> ClusterManager

Factory method to create a ClusterManager instance.

Returns:

- `ClusterManager`: An instance to manage cluster operations.

def role_manager() -> RoleManager

Factory method to create a RoleManager instance.

Returns:

- `RoleManager`: An instance to manage role operations.

def user_manager() -> UserManager

Factory method to create a UserManager instance.

Returns:

- `UserManager`: An instance to manage user operations.

def token_manager() -> TokenManager

Factory method to create a TokenManager instance.

Returns:

- `TokenManager`: An instance to manage token operations.
class ClusterManager()

ClusterManager class for handling operations on clusters within the context of authentication.

This class provides methods to list, get, register, update, and delete clusters on the AuthN server.

Arguments:

- `client` (RequestClient): The request client to make HTTP requests.
@property
def client() -> RequestClient

The client this cluster manager uses to make requests.
def list() -> ClusterList

Retrieve all clusters.

Returns:

- `ClusterList`: A list of all clusters.

Raises:

- `AISError`: If an error occurs while listing clusters.
def get(cluster_id: Optional[str] = None,
        cluster_alias: Optional[str] = None) -> ClusterInfo

Retrieve a specific cluster by ID or alias.

Arguments:

- `cluster_id` (Optional[str]): The ID of the cluster. Defaults to None.
- `cluster_alias` (Optional[str]): The alias of the cluster. Defaults to None.

Returns:

- `ClusterInfo`: Information about the specified cluster.

Raises:

- `ValueError`: If neither cluster_id nor cluster_alias is provided.
- `RuntimeError`: If no cluster matches the provided ID or alias.
- `AISError`: If an error occurs while getting the cluster.
def register(cluster_alias: str, urls: List[str]) -> ClusterInfo

Register a new cluster.

Arguments:

- `cluster_alias` (str): The alias for the new cluster.
- `urls` (List[str]): A list of URLs for the new cluster.

Returns:

- `ClusterInfo`: Information about the registered cluster.

Raises:

- `ValueError`: If no URLs are provided or an invalid URL is provided.
- `AISError`: If an error occurs while registering the cluster.
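As a sketch, the cluster alias and URL below are placeholders, and `authn_client` is assumed from the login example above:

```python
cluster_manager = authn_client.cluster_manager()

# Register an AIS cluster with AuthN, then fetch it back by alias
cluster_info = cluster_manager.register("my-cluster", ["http://localhost:8080"])
same_cluster = cluster_manager.get(cluster_alias="my-cluster")
```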
def update(cluster_id: str,
cluster_alias: Optional[str] = None,
           urls: Optional[List[str]] = None) -> ClusterInfo

Update an existing cluster.

Arguments:

- `cluster_id` (str): The ID of the cluster to update.
- `cluster_alias` (Optional[str]): The new alias for the cluster. Defaults to None.
- `urls` (Optional[List[str]]): The new list of URLs for the cluster. Defaults to None.

Returns:

- `ClusterInfo`: Information about the updated cluster.

Raises:

- `ValueError`: If neither cluster_alias nor urls is provided.
- `AISError`: If an error occurs while updating the cluster.
def delete(cluster_id: Optional[str] = None,
           cluster_alias: Optional[str] = None)

Delete a specific cluster by ID or alias.

Arguments:

- `cluster_id` (Optional[str]): The ID of the cluster to delete. Defaults to None.
- `cluster_alias` (Optional[str]): The alias of the cluster to delete. Defaults to None.

Raises:

- `ValueError`: If neither cluster_id nor cluster_alias is provided.
- `AISError`: If an error occurs while deleting the cluster.
class RoleManager()

Manages role-related operations.

This class provides methods to interact with roles, including retrieving, creating, updating, and deleting role information.

Arguments:

- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient

Returns the RequestClient instance used by this RoleManager.
def list() -> RolesList

Retrieves information about all roles.

Returns:

- `RolesList`: A list containing information about all roles.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def get(role_name: str) -> RoleInfo

Retrieves information about a specific role.

Arguments:

- `role_name` (str): The name of the role to retrieve.

Returns:

- `RoleInfo`: Information about the specified role.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def create(name: str,
desc: str,
cluster_alias: str,
perms: List[AccessAttr],
           bucket_name: str = None) -> RoleInfo

Creates a new role.

Arguments:

- `name` (str): The name of the role.
- `desc` (str): A description of the role.
- `cluster_alias` (str): The alias of the cluster this role will have access to.
- `perms` (List[AccessAttr]): A list of permissions to be granted for this role.
- `bucket_name` (str, optional): The name of the bucket this role will have access to.

Returns:

- `RoleInfo`: Information about the newly created role.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def update(name: str,
desc: str = None,
cluster_alias: str = None,
perms: List[AccessAttr] = None,
           bucket_name: str = None) -> RoleInfo

Updates an existing role.

Arguments:

- `name` (str): The name of the role.
- `desc` (str, optional): An updated description of the role.
- `cluster_alias` (str, optional): The alias of the cluster this role will have access to.
- `perms` (List[AccessAttr], optional): A list of updated permissions to be granted for this role.
- `bucket_name` (str, optional): The name of the bucket this role will have access to.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
- `ValueError`: If the role does not exist or if invalid parameters are provided.
def delete(name: str, missing_ok: bool = False) -> None

Deletes a role.

Arguments:

- `name` (str): The name of the role to delete.
- `missing_ok` (bool): Ignore error if role does not exist. Defaults to False.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
- `ValueError`: If the role does not exist.
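A sketch of creating a read-only role scoped to one bucket. The AccessAttr member name (`GET`) and the import path (per the `authn.access_attr` module listed above) are assumptions; consult the AccessAttr flags for the exact permission names:

```python
from aistore.sdk.authn.access_attr import AccessAttr

role_manager = authn_client.role_manager()
role = role_manager.create(
    name="bucket-reader",
    desc="Read-only access to my-bck",
    cluster_alias="my-cluster",
    perms=[AccessAttr.GET],   # assumed flag name for read access
    bucket_name="my-bck",     # optional: restrict the role to a single bucket
)
```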
class TokenManager()

Manages token-related operations.

This class provides methods to interact with tokens in the AuthN server.

Arguments:

- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient

Returns the RequestClient instance used by this TokenManager.
def revoke(token: str) -> None

Revokes the specified authentication token.

Arguments:

- `token` (str): The token to be revoked.

Raises:

- `ValueError`: If the token is not provided.
- `AISError`: If the revoke token request fails.
class UserManager()

UserManager provides methods to manage users in the AuthN service.

Arguments:

- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient

Returns the RequestClient instance used by this UserManager.
def get(username: str) -> UserInfo

Retrieve user information from the AuthN Server.

Arguments:

- `username` (str): The username to retrieve.

Returns:

- `UserInfo`: The user's information.

Raises:

- `AISError`: If the user retrieval request fails.
def delete(username: str, missing_ok: bool = False) -> None

Delete an existing user from the AuthN Server.

Arguments:

- `username` (str): The username of the user to delete.
- `missing_ok` (bool): Ignore error if user does not exist. Defaults to False.

Raises:

- `AISError`: If the user deletion request fails.
def create(username: str, roles: List[str], password: str) -> UserInfo

Create a new user in the AuthN Server.

Arguments:

- `username` (str): The name or ID of the user to create.
- `roles` (List[str]): The list of names of roles to assign to the user.
- `password` (str): The password for the user.

Returns:

- `UserInfo`: The created user's information.

Raises:

- `AISError`: If the user creation request fails.
def list()

List all users in the AuthN Server.

Returns:

- `str`: The list of users in the AuthN Server.

Raises:

- `AISError`: If the user list request fails.
def update(username: str,
password: Optional[str] = None,
           roles: Optional[List[str]] = None) -> UserInfo

Update an existing user's information in the AuthN Server.

Arguments:

- `username` (str): The ID of the user to update.
- `password` (str, optional): The new password for the user.
- `roles` (List[str], optional): The list of names of roles to assign to the user.

Returns:

- `UserInfo`: The updated user's information.

Raises:

- `AISError`: If the user update request fails.
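A short sketch tying the pieces together, reusing `authn_client` and the hypothetical "bucket-reader" role from the earlier examples; the password is a placeholder:

```python
user_manager = authn_client.user_manager()
user = user_manager.create(
    username="reader",
    roles=["bucket-reader"],
    password="changeme",
)
```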
class AccessAttr(IntFlag)

AccessAttr defines permissions as bitwise flags for access control (for more details, refer to the Go API).

@staticmethod
def describe(perms: int) -> str

Returns a comma-separated string describing the permissions based on the provided bitwise flags.
class Bucket(AISSource)

A class representing a bucket that contains user data.

Arguments:

- `client` (RequestClient): Client for interfacing with the AIS cluster.
- `name` (str): Name of the bucket.
- `provider` (str or Provider, optional): Provider of the bucket (one of "ais", "aws", "gcp", ...). Defaults to "ais".
- `namespace` (Namespace, optional): Namespace of the bucket. Defaults to None.
@property
def client() -> RequestClient

The client used by this bucket.

@client.setter
def client(client)

Update the client used by this bucket.

@property
def qparam() -> Dict

Default query parameters to use with API calls from this bucket.

@property
def provider() -> Provider

The provider for this bucket.

@property
def name() -> str

The name of this bucket.

@property
def namespace() -> Namespace

The namespace for this bucket.
def list_urls(prefix: str = "",
              etl: Optional[ETLConfig] = None) -> Iterable[str]

Generates full URLs for all objects in the bucket that match the specified prefix.

Arguments:

- `prefix` (str, optional): A string prefix to filter objects. Only objects with names starting with this prefix will be included. Defaults to an empty string (no filtering).
- `etl` (Optional[ETLConfig], optional): An optional ETL configuration. If provided, the URLs will include ETL processing parameters. Defaults to None.

Returns:

- `Iterable[str]`: An iterator yielding full URLs of all objects matching the prefix.
def list_all_objects_iter(prefix: str = "",
props: str = "name,size") -> Iterable[Object]Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Arguments:
prefixstr, optional - Limit objects selected by a given string prefixpropsstr, optional - Comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "target_url", "copies".
Returns:
Iterator of all object URLs matching the prefix
def create(exist_ok=False)

Creates a bucket in the AIStore cluster. Can only create a bucket for the AIS provider on localized cluster. Remote cloud buckets do not support creation.

Arguments:

- `exist_ok` (bool, optional): Ignore error if the cluster already contains this bucket.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def delete(missing_ok=False)

Destroys the bucket in the AIStore cluster. In all cases, removes both the bucket's content and the bucket's metadata from the cluster. Note: AIS will not call the remote backend provider to delete the corresponding Cloud bucket (iff the bucket in question is, in fact, a Cloud bucket).

Arguments:

- `missing_ok` (bool, optional): Ignore error if bucket does not exist.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def rename(to_bck_name: str) -> str

Renames the bucket in the AIStore cluster. Only works on AIS buckets. Returns a job ID that can be used later to check the status of the asynchronous operation.

Arguments:

- `to_bck_name` (str): New bucket name for the bucket to be renamed as.

Returns:

Job ID (as str) that can be used to check the status of the operation.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def evict(keep_md: bool = False)

Evicts the bucket in the AIStore cluster. NOTE: only Cloud buckets can be evicted.

Arguments:

- `keep_md` (bool, optional): If True, evicts objects but keeps the bucket's metadata (i.e., the bucket's name and its properties).

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def head() -> Header

Requests bucket properties.

Returns:

Response header with the bucket properties.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def summary(uuid: str = "",
prefix: str = "",
cached: bool = True,
            present: bool = True)

Returns the bucket summary (starts an xaction job and polls for results).

Arguments:

- `uuid` (str): Identifier for the bucket summary. Defaults to an empty string.
- `prefix` (str): Prefix for objects to be included in the bucket summary. Defaults to an empty string (all objects).
- `cached` (bool): If True, summary entails cached entities. Defaults to True.
- `present` (bool): If True, summary entails present entities. Defaults to True.

Raises:

- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
def info(flt_presence: int = FLTPresence.FLT_EXISTS,
bsumm_remote: bool = True,
prefix: str = "")Returns bucket summary and information/properties.
Arguments:
flt_presenceFLTPresence - Describes the presence of buckets and objects with respect to their existence or non-existence in the AIS cluster using the enum FLTPresence. Defaults to value FLT_EXISTS and values are: FLT_EXISTS - object or bucket exists inside and/or outside cluster FLT_EXISTS_NO_PROPS - same as FLT_EXISTS but no need to return summary FLT_PRESENT - bucket is present or object is present and properly located FLT_PRESENT_NO_PROPS - same as FLT_PRESENT but no need to return summary FLT_PRESENT_CLUSTER - objects present anywhere/how in the cluster as replica, ec-slices, misplaced FLT_EXISTS_OUTSIDE - not present; exists outside clusterbsumm_remotebool - If True, returned bucket info will include remote objects as wellprefixstr - Only include objects with the given prefix in the bucket
Raises:
requests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStoreValueError-flt_presenceis not one of the expected valuesaistore.sdk.errors.AISError- All other types of errors with AIStore
def copy(to_bck: Bucket,
prefix_filter: str = "",
prepend: str = "",
ext: Optional[Dict[str, str]] = None,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
         num_workers: Optional[int] = 0) -> str

Returns a job ID that can be used later to check the status of the asynchronous operation.

Arguments:

- `to_bck` (Bucket): Destination bucket.
- `prefix_filter` (str, optional): Only copy objects with names starting with this prefix.
- `prepend` (str, optional): Value to prepend to the name of copied objects.
- `ext` (Dict[str, str], optional): Dict mapping each extension to the extension that will replace it (e.g. {"jpg": "txt"}).
- `dry_run` (bool, optional): Determines if the copy should actually happen or not.
- `force` (bool, optional): Override existing destination bucket.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers for the copy job per target:
  - 0 (default): number of mountpaths
  - -1: single thread, serial execution

Returns:

Job ID (as str) that can be used to check the status of the operation.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
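A minimal sketch of copying a prefix of objects and waiting on the returned job, assuming an existing `Client` instance `client` and a source bucket handle `bucket` (bucket names here are placeholders):

```python
dest = client.bucket("my-bck-copy")
dest.create(exist_ok=True)

copy_job_id = bucket.copy(dest, prefix_filter="train/")
client.job(copy_job_id).wait()  # block until the asynchronous copy completes
```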
def list_objects(prefix: str = "",
props: str = "",
page_size: int = 0,
uuid: str = "",
continuation_token: str = "",
flags: List[ListObjectFlag] = None,
target: str = "") -> BucketListReturns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if available).
Arguments:
prefixstr, optional - Return only objects that start with the prefixpropsstr, optional - Comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".page_sizeint, optional - Return at most "page_size" objects. The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page.NOTE- If "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number of objects.uuidstr, optional - Job ID, required to get the next page of objectscontinuation_tokenstr, optional - Marks the object to start reading the next pageflagsList[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request target(str, optional): Only list objects on this specific target node
Returns:
BucketList- the page of objects in the bucket and the continuation token to get the next page Empty continuation token marks the final page of the object list
Raises:
aistore.sdk.errors.AISError- All other types of errors with AIStorerequests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStore
def list_objects_iter(prefix: str = "",
props: str = "",
page_size: int = 0,
flags: List[ListObjectFlag] = None,
target: str = "") -> ObjectIteratorReturns an iterator for all objects in bucket
Arguments:
prefixstr, optional - Return only objects that start with the prefixpropsstr, optional - Comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".page_sizeint, optional - return at most "page_size" objects The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page.NOTE- If "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number objectsflagsList[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request target(str, optional): Only list objects on this specific target node
Returns:
ObjectIterator- object iterator
Raises:
aistore.sdk.errors.AISError- All other types of errors with AIStorerequests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStore
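A sketch of streaming the listing page by page rather than materializing the whole bucket in memory; the prefix is a placeholder, and the entry attributes follow the properties requested via `props`:

```python
for entry in bucket.list_objects_iter(prefix="train/", props="name,size"):
    print(entry.name, entry.size)
```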
def list_all_objects(prefix: str = "",
props: str = "",
page_size: int = 0,
flags: List[ListObjectFlag] = None,
target: str = "") -> List[BucketEntry]Returns a list of all objects in bucket
Arguments:
prefixstr, optional - return only objects that start with the prefixpropsstr, optional - comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".page_sizeint, optional - return at most "page_size" objects The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page.NOTE- If "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number objectsflagsList[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request target(str, optional): Only list objects on this specific target node
Returns:
List[BucketEntry]- list of objects in bucket
Raises:
aistore.sdk.errors.AISError- All other types of errors with AIStorerequests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStore
def list_archive(archive_obj_name: str,
include_archive_obj: bool = False,
props: str = "",
                 page_size: int = 0) -> List[BucketEntry]

List files contained in an archived object (*.tar, *.zip, *.tgz, etc.).

This is a convenience wrapper around list_all_objects that automatically enables the ARCH_DIR list-flag so the cluster opens the shard and returns its directory.

Arguments:

- `archive_obj_name` (str): Object key of the shard inside this bucket (e.g. "my-archive.tar"). Can include a prefix path.
- `include_archive_obj` (bool, optional): If True, the returned list includes the parent archive object itself. When False (default), only the entries inside the shard are returned.
- `props` (str, optional): Comma-separated list of object properties to request. Defaults to "" (no properties).
- `page_size` (int, optional): Same meaning as in list_all_objects - how many names per internal page.

Returns:

- `List[BucketEntry]`: Entries representing the shard (optionally) and every file stored inside it.
def transform(etl_name: str,
to_bck: Bucket,
timeout: str = DEFAULT_ETL_TIMEOUT,
prefix_filter: str = "",
prepend: str = "",
ext: Optional[Dict[str, str]] = None,
force: bool = False,
dry_run: bool = False,
latest: bool = False,
sync: bool = False,
num_workers: Optional[int] = 0,
              cont_on_err: bool = False) -> str

Visits all selected objects in the source bucket and, for each object, puts the transformed result to the destination bucket.

Arguments:

- `etl_name` (str): Name of the ETL to be used for transformations.
- `to_bck` (Bucket): Destination bucket for transformations.
- `timeout` (str, optional): Timeout of the ETL job (e.g. 5m for 5 minutes).
- `prefix_filter` (str, optional): Only transform objects with names starting with this prefix.
- `prepend` (str, optional): Value to prepend to the name of resulting transformed objects.
- `ext` (Dict[str, str], optional): Dict mapping each extension to the extension that will replace it (e.g. {"jpg": "txt"}).
- `dry_run` (bool, optional): Determines if the transform should actually happen or not.
- `force` (bool, optional): Override existing destination bucket.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers for the transformation job per target:
  - 0 (default): number of mountpaths
  - -1: single thread, serial execution
- `cont_on_err` (bool): If True, continue processing objects even if some of them fail.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def put_files(path: str,
prefix_filter: str = "",
pattern: str = "*",
basename: bool = False,
prepend: str = None,
recursive: bool = False,
dry_run: bool = False,
              verbose: bool = True) -> List[str]

Puts files found in a given filepath as objects to a bucket in AIS storage.

Arguments:

- `path` (str): Local filepath, can be relative or absolute.
- `prefix_filter` (str, optional): Only put files with names starting with this prefix.
- `pattern` (str, optional): Shell-style wildcard pattern to filter files.
- `basename` (bool, optional): Whether to use the file names only as object names and omit the path information.
- `prepend` (str, optional): Optional string to use as a prefix in the object name for all objects uploaded. No delimiter ("/", "-", etc.) is automatically applied between the prepend value and the object name.
- `recursive` (bool, optional): Whether to recurse through the provided path directories.
- `dry_run` (bool, optional): Option to only show expected behavior without an actual put operation.
- `verbose` (bool, optional): Whether to print upload info to standard output.

Returns:

List of object names put to a bucket in AIS.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `ValueError`: The path provided is not a valid directory.
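A sketch of a bulk upload; the local directory and prefix are placeholders, and `bucket` is assumed from earlier examples:

```python
uploaded = bucket.put_files(
    "/data/images",       # hypothetical local directory
    pattern="*.jpg",
    recursive=True,
    prepend="images/",    # note: no delimiter is inserted automatically
)
print(f"uploaded {len(uploaded)} objects")
```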
def object(obj_name: str, props: ObjectProps = None) -> Object

Factory constructor for an object in this bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.

Arguments:

- `obj_name` (str): Name of the object.
- `props` (ObjectProps, optional): Properties of the object, as updated by head(), optionally pre-initialized.

Returns:

The object created.
def objects(obj_names: List = None,
obj_range: ObjectRange = None,
            obj_template: str = None) -> ObjectGroup

Factory constructor for multiple objects belonging to this bucket.

Arguments:

- `obj_names` (list): Names of objects to include in the group.
- `obj_range` (ObjectRange): Range of objects to include in the group.
- `obj_template` (str): String template defining objects to include in the group.

Returns:

The ObjectGroup created.
def make_request(method: str,
action: str,
value: Dict = None,
                 params: Dict = None) -> requests.Response

Use the bucket's client to make a request to the bucket endpoint on the AIS server.

Arguments:

- `method` (str): HTTP method to use, e.g. POST/GET/DELETE.
- `action` (str): Action string used to create an ActionMsg to pass to the server.
- `value` (dict): Additional value parameter to pass in the ActionMsg.
- `params` (dict, optional): Optional parameters to pass in the request.

Returns:

Response from the server.
def verify_cloud_bucket()

Verify the bucket provider is a cloud provider.

def get_path() -> str

Get the path representation of this bucket.

def as_model() -> BucketModel

Return a data-model of the bucket.

Returns:

BucketModel representation.
def write_dataset(config: DatasetConfig, skip_missing: bool = True, **kwargs)

Write a dataset to a bucket in AIS in webdataset format using wds.ShardWriter. Logs the missing attributes.

Arguments:

- `config` (DatasetConfig): Configuration dict specifying how to process and store each part of the dataset item.
- `skip_missing` (bool, optional): Skip samples that are missing one or more attributes. Defaults to True.
- `**kwargs` (optional): Optional keyword arguments to pass to the ShardWriter.
class Client()

AIStore client for managing buckets, objects, and ETL jobs.

Arguments:

- `endpoint` (str): AIStore endpoint.
- `skip_verify` (bool, optional): If True, skip SSL certificate verification. Defaults to False.
- `ca_cert` (str, optional): Path to a CA certificate file for SSL verification. If not provided, the 'AIS_CLIENT_CA' environment variable will be used. Defaults to None.
- `client_cert` (Union[str, Tuple[str, str], None], optional): Path to a client certificate PEM file or a tuple (cert, key) for mTLS. If not provided, the 'AIS_CRT' and 'AIS_CRT_KEY' environment variables will be used. Defaults to None.
- `timeout` (Union[float, Tuple[float, float], None], optional): Timeout for HTTP requests:
  - Single float (e.g., 5.0): Applies to both connection and read timeouts.
  - Tuple (e.g., (3.0, 20.0)): First value is the connection timeout, second is the read timeout.
  - None: Disables timeouts (not recommended).
  Defaults to (3, 20).
- `retry_config` (RetryConfig, optional): Defines retry behavior for HTTP and network failures. If not provided, the default retry configuration (RetryConfig.default()) is used.
- `retry` (urllib3.Retry, optional): [Deprecated] Retry configuration from urllib3. Use `retry_config` instead.
- `token` (str, optional): Authorization token. If not provided, the 'AIS_AUTHN_TOKEN' environment variable will be used. Defaults to None.
- `max_pool_size` (int, optional): Maximum number of connections per host in the connection pool. Defaults to 10.
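A minimal sketch of constructing a client; "http://localhost:8080" is the conventional local AIS proxy endpoint and is an assumption here:

```python
from aistore.sdk import Client

client = Client("http://localhost:8080")
bucket = client.bucket("my-bck")  # no HTTP request is made here
bucket.create(exist_ok=True)      # this call actually creates the bucket
```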
def bucket(bck_name: str,
provider: Union[Provider, str] = Provider.AIS,
           namespace: Namespace = None)

Factory constructor for a bucket object. Does not make any HTTP request, only instantiates a bucket object.

Arguments:

- `bck_name` (str): Name of the bucket.
- `provider` (str or Provider, optional): Provider of the bucket, one of "ais", "aws", "gcp", ... Defaults to "ais".
- `namespace` (Namespace, optional): Namespace of the bucket. Defaults to None.

Returns:

The bucket object created.
def cluster()

Factory constructor for a cluster object. Does not make any HTTP request, only instantiates a cluster object.

Returns:

The cluster object created.
def job(job_id: str = "", job_kind: str = "")

Factory constructor for a job object, which contains job-related functions. Does not make any HTTP request, only instantiates a job object.

Arguments:

- `job_id` (str, optional): Optional ID for interacting with a specific job.
- `job_kind` (str, optional): Optional specific type of job; empty for all kinds.

Returns:

The job object created.
def etl(etl_name: str)

Factory constructor for an ETL object. Contains APIs related to AIStore ETL operations. Does not make any HTTP request, only instantiates an ETL object.

Arguments:

- `etl_name` (str): Name of the ETL.

Returns:

The ETL object created.
def dsort(dsort_id: str = "")

Factory constructor for a dSort object. Contains APIs related to AIStore dSort operations. Does not make any HTTP request, only instantiates a dSort object.

Arguments:

- `dsort_id`: ID of the dSort job.

Returns:

The dSort object created.
def batch_loader()

Factory constructor for a BatchLoader object. Contains APIs related to AIStore GetBatch operations. Does not make any HTTP requests, only creates a BatchLoader.

Returns:

- `BatchLoader`: The BatchLoader created.
def fetch_object_by_url(url: str) -> Object

Deprecated: Use get_object_from_url instead.

Creates an Object instance from a URL.

This method does not make any HTTP requests.

Arguments:

- `url` (str): Full URL of the object (e.g., "ais://bucket1/file.txt").

Returns:

- `Object`: The object constructed from the specified URL.
def get_object_from_url(url: str) -> Object

Creates an Object instance from a URL.

This method does not make any HTTP requests.

Arguments:

- `url` (str): Full URL of the object (e.g., "ais://bucket1/file.txt").

Returns:

- `Object`: The object constructed from the specified URL.

Raises:

- `InvalidURLException`: If the URL is invalid.
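For example, a sketch of building an object handle straight from a full URL (bucket and object names are placeholders; no HTTP request is made until the object is actually read):

```python
obj = client.get_object_from_url("ais://my-bck/train/0001.jpg")
print(obj.name)
```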
class Cluster()

A class representing a cluster bound to an AIS client.

@property
def client()

Client this cluster uses to make requests.

def get_info() -> Smap

Returns the state of the AIS cluster, including detailed information about its nodes.

Returns:

- `aistore.sdk.types.Smap`: Smap containing cluster information.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
def get_primary_url() -> str

Returns: URL of the primary proxy.

def list_buckets(provider: Union[str, Provider] = Provider.AIS)

Returns a list of buckets in the AIStore cluster.

Arguments:

- `provider` (str or Provider, optional): Provider of the bucket (one of "ais", "aws", "gcp", ...). Defaults to "ais". An empty provider returns buckets of all providers.

Returns:

- `List[BucketModel]`: A list of buckets.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
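A sketch of basic cluster introspection via the factory constructor, reusing the assumed `client` from earlier examples:

```python
cluster = client.cluster()
if cluster.is_ready():
    for bck in cluster.list_buckets():
        print(bck.name)
```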
def list_jobs_status(job_kind="", target_id="") -> List[JobStatus]

List the status of jobs on the cluster.

Arguments:

- `job_kind` (str, optional): Only show jobs of a particular type.
- `target_id` (str, optional): Limit to jobs on a specific target node.

Returns:

List of JobStatus objects.

def list_running_jobs(job_kind="", target_id="") -> List[str]

List the currently running jobs on the cluster.

Arguments:

- `job_kind` (str, optional): Only show jobs of a particular type.
- `target_id` (str, optional): Limit to jobs on a specific target node.

Returns:

List of jobs in the format job_kind[job_id].
def list_etls(stages: Optional[List[str]] = None) -> List[ETLInfo]

Lists ETLs filtered by their stages.

Arguments:

- `stages` (List[str], optional): List of stages to filter ETLs by. Defaults to ["running"].

Returns:

- `List[ETLInfo]`: A list of details on ETLs matching the specified stages.
def is_ready() -> bool

Checks if the cluster is ready or still setting up.

Returns:

- `bool`: True if the cluster is ready, or False if the cluster is still setting up.

def get_performance() -> Dict

Retrieves the raw performance and status data from each target node in the AIStore cluster.

Returns:

- `Dict`: A dictionary where each key is the ID of a target node and each value is the raw AIS performance/status JSON returned by that node (for more information, see https://aistore.nvidia.com/docs/monitoring-metrics#target-metrics).

Raises:

- `requests.RequestException`: If there's an ambiguous exception while processing the request.
- `requests.ConnectionError`: If there's a connection error with the cluster.
- `requests.ConnectionTimeout`: If the connection to the cluster times out.
- `requests.ReadTimeout`: If the timeout is reached while awaiting a response from the cluster.

def get_uuid() -> str

Returns: UUID of the AIStore cluster.
class Job()

A class containing job-related functions.

Arguments:

- `client` (RequestClient): Client for interfacing with the AIS cluster.
- `job_id` (str, optional): ID of a specific job; empty for all jobs.
- `job_kind` (str, optional): Specific kind of job; empty for all kinds.
@property
def job_id()

Return the job ID.

@property
def job_kind()

Return the job kind.

def status() -> JobStatus

Return the status of a job.

Returns:

The job status, including ID, finish time, and error info.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
def wait(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT, verbose: bool = True)

Wait for a job to finish.

Arguments:

- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:

None

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
def wait_for_idle(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                  verbose: bool = True)

Wait for a job to reach an idle state.

Arguments:

- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:

None

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
- `errors.JobInfoNotFound`: Raised when information on a job's status could not be found on the AIS cluster.
def wait_single_node(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                     verbose: bool = True)

Wait for a job running on a single node.

Arguments:

- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:

None

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
- `errors.JobInfoNotFound`: Raised when information on a job's status could not be found on the AIS cluster.
def start(daemon_id: str = "",
force: bool = False,
          buckets: List[Bucket] = None) -> str

Start a job and return its ID.

Arguments:

- `daemon_id` (str, optional): For running a job that must run on a specific target node (e.g. resilvering).
- `force` (bool, optional): Override existing restrictions for a bucket (e.g., run LRU eviction even if the bucket has LRU disabled).
- `buckets` (List[Bucket], optional): List of one or more buckets; applicable only for jobs that have bucket scope (for details on job types, see the Table in xact/api.go).

Returns:

The running job ID.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
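As a sketch, starting a bucket-scoped job and waiting on it; "lru" is used here as an assumed job kind, and the bucket name is a placeholder:

```python
job = client.job(job_kind="lru")
job_id = job.start(buckets=[client.bucket("my-bck")], force=True)
client.job(job_id).wait(timeout=300)
```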
def get_within_timeframe(start_time: datetime,
                         end_time: Optional[datetime] = None) -> List[JobSnap]

Retrieves jobs that started after a specified start_time and optionally ended before a specified end_time.

Arguments:

- `start_time` (datetime): The start of the timeframe for monitoring jobs.
- `end_time` (datetime, optional): The end of the timeframe for monitoring jobs.

Returns:

- `List[JobSnap]`: A list of jobs that meet the specified timeframe criteria.

Raises:

- `JobInfoNotFound`: Raised when no relevant job info is found.
def get_details() -> AggregatedJobSnap

Retrieve detailed job snapshot information across all targets.

Returns:

- `AggregatedJobSnap`: A snapshot containing detailed metrics for the job.

def get_total_time() -> Optional[timedelta]

Calculates the total job duration as the difference between the earliest start time and the latest end time among all job snapshots. If any snapshot is missing an end_time, returns None to indicate the job is incomplete.

Returns:

- `Optional[timedelta]`: The total duration of the job, or None if incomplete.
@dataclass
class ColdGetConf()

Configuration class for retrying HEAD requests to objects that are not present in the cluster when attempting a cold GET.

Attributes:

- `est_bandwidth_bps` (int): Estimated bandwidth in bytes per second from the AIS cluster to backend buckets. Used to determine retry intervals for fetching remote objects. Raising this will decrease the initial time we expect an object fetch to take. Defaults to 1 Gbps.
- `max_cold_wait` (int): Maximum total number of seconds to wait for an object to be present before re-raising a ReadTimeoutError to be handled by the top-level RetryConfig. Defaults to 3 minutes.
@staticmethod
def default() -> "ColdGetConf"

Returns the default cold get config options.
@dataclass
class RetryConfig()

Configuration class for managing both HTTP and network retries in AIStore.

AIStore implements two types of retries to ensure reliability and fault tolerance:

- HTTP Retry (urllib3.Retry) - Handles HTTP errors based on status codes (e.g., 429, 500, 502, 503, 504).
- Network Retry (tenacity) - Recovers from connection failures, timeouts, and unreachable targets.

Why two types of retries?

- AIStore uses redirects for GET/PUT operations.
- If a target node is down, we must retry the request via the proxy instead of the same failing target. `network_retry` ensures that the request is reattempted at the proxy level, preventing unnecessary failures.

Attributes:

- `http_retry` (urllib3.Retry): Defines retry behavior for transient HTTP errors.
- `network_retry` (tenacity.Retrying): Configured tenacity.Retrying instance managing retries for network-related issues, such as connection failures, timeouts, or unreachable targets.
- `cold_get_conf` (ColdGetConf): Configuration for retrying COLD GET requests; see the ColdGetConf class.
@staticmethod
def default() -> "RetryConfig"

Returns the default retry configuration for AIStore.
class ObjectGroup(AISSource)

A class representing multiple objects within the same bucket. Only one of obj_names, obj_range, or obj_template should be provided.

Arguments:

- `bck` (Bucket): Bucket the objects belong to.
- `obj_names` (list[str], optional): List of object names to include in this collection.
- `obj_range` (ObjectRange, optional): Range defining which object names in the bucket should be included.
- `obj_template` (str, optional): String argument to pass as the template value directly to the API.
@property
def client() -> RequestClient

The client bound to the bucket used by the ObjectGroup.

@client.setter
def client(client) -> RequestClient

Update the client bound to the bucket used by the ObjectGroup.
def list_urls(prefix: str = "",
              etl: Optional[ETLConfig] = None) -> Iterable[str]

Implementation of the abstract method from AISSource that provides an iterator of full URLs to every object in this bucket matching the specified prefix.

Arguments:

- `prefix` (str, optional): Limit objects selected by a given string prefix.
- `etl` (Optional[ETLConfig], optional): An optional ETL configuration. If provided, the URLs will include ETL processing parameters. Defaults to None.

Returns:

Iterator of all object URLs in the group.
def list_all_objects_iter(prefix: str = "",
props: str = "name,size") -> Iterable[Object]Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Arguments:
prefixstr, optional - Limit objects selected by a given string prefixpropsstr, optional - By default, will include all object properties. Pass in None to skip and avoid the extra API call.
Returns:
Iterator of all the objects in the group
def delete()

Deletes a list or range of objects in a bucket.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def evict()

Evicts a list or range of objects in a bucket so that they are no longer cached in AIS. NOTE: only Cloud buckets can be evicted.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def prefetch(blob_threshold: int = None,
num_workers: int = None,
latest: bool = False,
             continue_on_error: bool = False)

Prefetches a list or range of objects in a bucket so that they are cached in AIS. NOTE: only Cloud buckets can be prefetched.

Arguments:

- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `continue_on_error` (bool, optional): Whether to continue if there is an error prefetching a single object.
- `blob_threshold` (int, optional): Utilize the built-in blob-downloader for remote objects greater than the specified (threshold) size in bytes.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def copy(to_bck: "Bucket",
prepend: str = "",
continue_on_error: bool = False,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
         num_workers: int = None) -> List[str]

Copies a list or range of objects in a bucket.

Arguments:

- `to_bck` (Bucket): Destination bucket.
- `prepend` (str, optional): Value to prepend to the name of copied objects.
- `continue_on_error` (bool, optional): Whether to continue if there is an error copying a single object.
- `dry_run` (bool, optional): Skip performing the copy and just log the intended actions.
- `force` (bool, optional): Force this job to run over others in case it conflicts (see "limited coexistence" and xact/xreg/xreg.go).
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

- `List[str]`: List of job IDs that can be used to check the status of the operation.
def transform(to_bck: "Bucket",
etl_name: str,
timeout: str = DEFAULT_ETL_TIMEOUT,
prepend: str = "",
ext: Dict[str, str] = None,
continue_on_error: bool = False,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
              num_workers: int = None)

Performs an ETL operation on a list or range of objects in a bucket, placing the results in the destination bucket.

Arguments:

- `to_bck` (Bucket): Destination bucket.
- `etl_name` (str): Name of existing ETL to apply.
- `timeout` (str): Timeout of the ETL job (e.g. 5m for 5 minutes).
- `prepend` (str, optional): Value to prepend to the name of resulting transformed objects.
- `ext` (Dict[str, str], optional): Dict mapping each extension to the extension that will replace it (i.e. {"jpg": "txt"}).
- `continue_on_error` (bool, optional): Whether to continue if there is an error transforming a single object.
- `dry_run` (bool, optional): Skip performing the transform and just log the intended actions.
- `force` (bool, optional): Force this job to run over others in case it conflicts (see "limited coexistence" and xact/xreg/xreg.go).
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def archive(archive_name: str,
mime: str = "",
to_bck: "Bucket" = None,
include_source_name: bool = False,
allow_append: bool = False,
            continue_on_err: bool = False) -> List[str]

Create or append to an archive.

Arguments:

- `archive_name` (str): Name of the archive to create or append to.
- `mime` (str, optional): MIME type of the content.
- `to_bck` (Bucket, optional): Destination bucket. Defaults to the current bucket.
- `include_source_name` (bool, optional): Include the source bucket name in the archived objects' names.
- `allow_append` (bool, optional): Allow appending to an existing archive.
- `continue_on_err` (bool, optional): Whether to continue if there is an error archiving a single object.

Returns:

- `List[str]`: List of job IDs that can be used to check the status of the operation.
def list_names() -> List[str]

List all the object names included in this group of objects.

Returns:

List of object names.
class ObjectNames(ObjectCollection)

A collection of object names, provided as a list of strings.

Arguments:

- `names` (List[str]): A list of object names.
class ObjectRange(ObjectCollection)

Class representing a range of object names.

Arguments:

- `prefix` (str): Prefix contained in all names of objects.
- `min_index` (int): Starting index in the name of objects.
- `max_index` (int): Last index in the name of all objects.
- `pad_width` (int, optional): Left-pad indices with zeros up to the width provided, e.g. pad_width = 3 will transform 1 to 001.
- `step` (int, optional): Size of iterator steps between each item.
- `suffix` (str, optional): Suffix at the end of all object names.
@classmethod
def from_string(cls, range_string: str)

Construct an ObjectRange instance from a valid range string like 'input-{00..99..1}.txt'.

Arguments:

- `range_string` (str): The range string to parse.

Returns:

- `ObjectRange`: An instance of the ObjectRange class.
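A sketch of addressing objects shard-000.tar through shard-099.tar as one group and prefetching them (Cloud buckets only). The `aistore.sdk.multiobj` import path follows the multiobj modules in the table of contents and is an assumption:

```python
from aistore.sdk.multiobj import ObjectRange

obj_range = ObjectRange(prefix="shard-", min_index=0, max_index=99,
                        pad_width=3, suffix=".tar")
group = bucket.objects(obj_range=obj_range)
prefetch_job_id = group.prefetch()  # returns a job ID for the async operation
```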
class ObjectTemplate(ObjectCollection)

A collection of object names specified by a template in the bash brace expansion format.

Arguments:

- `template` (str): A string template that defines the names of objects to include in the collection.
@dataclass
class BucketDetails()

Metadata about a bucket, used by objects within that bucket.

class Object()

Provides methods for interacting with an object in AIS.

Arguments:

- `client` (RequestClient): Client used for all HTTP requests.
- `bck_details` (BucketDetails): Metadata about the bucket to which this object belongs.
- `name` (str): Name of the object.
- `props` (ObjectProps, optional): Properties of the object, as updated by head(), optionally pre-initialized.
@property
def bucket_name() -> str

Name of the bucket where this object resides.

@property
def bucket_provider() -> Provider

Provider of the bucket where this object resides (e.g. ais, s3, gcp).

@property
def query_params() -> Dict[str, str]

Query params used as a base for constructing all requests for this object.

@property
def name() -> str

Name of this object.

@property
def uname() -> str

Unified name (uname) of this object, which combines the bucket path and object name.

Returns:

- `str`: The unified name in the format bucket_path/object_name.
@property
def props() -> ObjectProps

Get the latest properties of the object.

This will make a HEAD request to the AIStore cluster to fetch up-to-date object headers and refresh the internal _props cache. Use this when you want to ensure you're accessing the most recent metadata for the object.

Returns:

- `ObjectProps`: The latest object properties from the server.
@property
def props_cached() -> Optional[ObjectProps]

Get the cached object properties (without making a network call).

This is useful when:

- You want to avoid a network request.
- You're sure the cached `_props` was already set via a previous call to `head()` or during object construction.

Returns:

ObjectProps or None: Cached object properties, or None if not set.
def head() -> CaseInsensitiveDict

Requests object properties and returns headers. Updates props.

Returns:

Response header with the object properties.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `requests.exceptions.HTTPError(404)`: The object does not exist.
def get_reader(archive_config: Optional[ArchiveConfig] = None,
blob_download_config: Optional[BlobDownloadConfig] = None,
chunk_size: int = DEFAULT_CHUNK_SIZE,
etl: Optional[ETLConfig] = None,
writer: Optional[BufferedWriter] = None,
latest: bool = False,
byte_range: Optional[str] = None,
direct: bool = False) -> ObjectReader

Creates and returns an ObjectReader with access to object contents and optionally writes to a provided writer.

Arguments:

- `archive_config` (Optional[ArchiveConfig]) - Settings for archive extraction.
- `blob_download_config` (Optional[BlobDownloadConfig]) - Settings for using blob download.
- `chunk_size` (int, optional) - Chunk size to use while reading from stream.
- `etl` (Optional[ETLConfig]) - Settings for ETL-specific operations (name, args).
- `writer` (Optional[BufferedWriter]) - User-provided writer for writing content output. The user is responsible for closing the writer.
- `latest` (bool, optional) - GET the latest object version from the associated remote bucket.
- `byte_range` (Optional[str]) - Byte range in RFC 7233 format for single-range requests (e.g., "bytes=0-499", "bytes=500-", "bytes=-500"). See https://www.rfc-editor.org/rfc/rfc7233#section-2.1.
- `direct` (bool, optional) - If True, the object content is read directly from the target node, bypassing the proxy.

Returns:

`ObjectReader` - An iterator for streaming object content.

Raises:

- `ValueError` - If byte range is used with blob download.
- `requests.RequestException` - If an error occurs during the request.
- `requests.ConnectionError` - If there is a connection error.
- `requests.ConnectionTimeout` - If the connection times out.
- `requests.ReadTimeout` - If the read operation times out.
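Continuing the sketch above, streaming the object in chunks keeps memory bounded; the chunk size and byte range below are illustrative:

```python
reader = obj.get_reader(chunk_size=1024 * 1024)
total = 0
for chunk in reader:  # each iteration yields one chunk of bytes
    total += len(chunk)
print(f"streamed {total} bytes")

# Fetch only the first 500 bytes via an RFC 7233 single range
first_500 = obj.get_reader(byte_range="bytes=0-499").read_all()
```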
def get(archive_config: ArchiveConfig = None,
blob_download_config: BlobDownloadConfig = None,
chunk_size: int = DEFAULT_CHUNK_SIZE,
etl: ETLConfig = None,
writer: BufferedWriter = None,
latest: bool = False,
byte_range: str = None) -> ObjectReader

Deprecated: Use 'get_reader' instead.

Creates and returns an ObjectReader with access to object contents and optionally writes to a provided writer.

Arguments:

- `archive_config` (ArchiveConfig, optional) - Settings for archive extraction.
- `blob_download_config` (BlobDownloadConfig, optional) - Settings for using blob download.
- `chunk_size` (int, optional) - Chunk size to use while reading from stream.
- `etl` (ETLConfig, optional) - Settings for ETL-specific operations (name, meta).
- `writer` (BufferedWriter, optional) - User-provided writer for writing content output. The user is responsible for closing the writer.
- `latest` (bool, optional) - GET the latest object version from the associated remote bucket.
- `byte_range` (str, optional) - Byte range in RFC 7233 format for single-range requests (e.g., "bytes=0-499", "bytes=500-", "bytes=-500"). See https://www.rfc-editor.org/rfc/rfc7233#section-2.1.

Returns:

`ObjectReader` - An ObjectReader that can be iterated over to stream chunks of object content or used to read all content directly.

Raises:

- `ValueError` - If byte range is used with blob download.
- `requests.RequestException` - If an error occurs during the request.
- `requests.ConnectionError` - If there is a connection error.
- `requests.ConnectionTimeout` - If the connection times out.
- `requests.ReadTimeout` - If the read operation times out.
def get_semantic_url() -> str

Get the semantic URL to the object.

Returns:

Semantic URL to get the object.

def get_url(archpath: str = "", etl: ETLConfig = None) -> str

Get the full URL to the object, including the base URL and any query parameters.

Arguments:

- `archpath` (str, optional) - If the object is an archive, use `archpath` to extract a single file from the archive.
- `etl` (ETLConfig, optional) - Settings for ETL-specific operations (name, meta).

Returns:

Full URL to get the object.
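For illustration (exact output depends on the cluster endpoint and bucket provider):

```python
print(obj.get_semantic_url())  # e.g. "ais://my-bck/my-obj.txt"
print(obj.get_url())           # full HTTP(S) URL, including query parameters
```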
def put_content(content: bytes) -> Response

Deprecated: Use 'ObjectWriter.put_content' instead.

Puts bytes as an object to a bucket in AIS storage.

Arguments:

- `content` (bytes) - Bytes to put as an object.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.

def put_file(path: str or Path) -> Response

Deprecated: Use 'ObjectWriter.put_file' instead.

Puts a local file as an object to a bucket in AIS storage.

Arguments:

- `path` (str or Path) - Path to a local file.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `ValueError` - The path provided is not a valid file.
def get_writer() -> ObjectWriter

Create an ObjectWriter to write to object contents and attributes.
Returns:
An ObjectWriter which can be used to write to an object's contents and attributes.
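A sketch of the current (non-deprecated) write path, continuing the example above; `ObjectWriter.put_content` is the replacement named by the deprecated methods above:

```python
writer = obj.get_writer()
writer.put_content(b"hello, AIStore")  # uploads these bytes as the object's content
```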
def promote(path: str,
target_id: str = "",
recursive: bool = False,
overwrite_dest: bool = False,
delete_source: bool = False,
src_not_file_share: bool = False) -> str

Promotes a file or folder an AIS target can access to a bucket in AIS storage. These files can be either on the physical disk of an AIS target itself or on a network file system the cluster can access. See more info here: https://aiatscale.org/blog/2022/03/17/promote

Arguments:

- `path` (str) - Path to a file or folder the AIS cluster can reach.
- `target_id` (str, optional) - Promote files from a specific target node.
- `recursive` (bool, optional) - Recursively promote objects from files in directories inside the path.
- `overwrite_dest` (bool, optional) - Overwrite objects already on AIS.
- `delete_source` (bool, optional) - Delete the source files when done promoting.
- `src_not_file_share` (bool, optional) - Optimize if the source is guaranteed not to be on a file share.

Returns:

Job ID (as str) that can be used to check the status of the operation, or empty if the job completes synchronously.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `AISError` - Path does not exist on the AIS cluster storage.
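A hedged sketch: the path below is a placeholder and must be reachable from the AIS target nodes themselves, not merely from the client machine:

```python
# Promote a directory that already resides on cluster-accessible storage
job_id = obj.promote("/mnt/shared/data", recursive=True, overwrite_dest=True)
if job_id:
    print(f"promote running asynchronously as job {job_id}")
```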
def delete() -> Response

Delete an object from a bucket.

Returns:

None

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `requests.exceptions.HTTPError(404)` - The object does not exist.
def copy(to_obj: "Object", etl: Optional[ETLConfig] = None) -> Response

Copy this object to another object (which specifies the destination bucket and name), optionally with ETL transformation.

Arguments:

- `to_obj` (Object) - Destination object specifying both the target bucket and object name.
- `etl` (ETLConfig, optional) - ETL configuration for transforming the object during copy.

Returns:

`Response` - The response from the copy operation.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `requests.exceptions.HTTPError` - Service unavailable.
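For example, a copy across buckets, continuing the sketch above (the destination bucket `backup-bck` is a placeholder and assumed to exist):

```python
dst = client.bucket("backup-bck").object("my-obj.txt")
obj.copy(dst)  # the copy is performed by the cluster; content does not pass through the client
```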
def blob_download(chunk_size: int = None,
num_workers: int = None,
latest: bool = False) -> str

A special facility to download very large remote objects, a.k.a. BLOBs. Returns the job ID for the blob download operation.

Arguments:

- `chunk_size` (int) - Chunk size in bytes.
- `num_workers` (int) - Number of concurrent blob-downloading workers (readers).
- `latest` (bool) - GET the latest object version from the associated remote bucket.

Returns:

Job ID (as str) that can be used to check the status of the operation.

Raises:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore.
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.exceptions.HTTPError` - Service unavailable.
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
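A sketch that starts a blob download and waits on it via the SDK's job API (see the `job` module in the index above); the tuning values are illustrative:

```python
job_id = obj.blob_download(chunk_size=4 * 1024 * 1024, num_workers=4)
client.job(job_id=job_id).wait()  # block until the blob download job completes
```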
def append_content(content: bytes,
handle: str = "",
flush: bool = False) -> str

Deprecated: Use 'ObjectWriter.append_content' instead.

Append bytes as an object to a bucket in AIS storage.

Arguments:

- `content` (bytes) - Bytes to append to the object.
- `handle` (str) - Handle string to use for subsequent appends or flush (empty for the first append).
- `flush` (bool) - Whether to flush and finalize the append operation, making the object accessible.

Returns:

- `handle` (str) - Handle string to pass for subsequent appends or flush.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `requests.exceptions.HTTPError(404)` - The object does not exist.
def set_custom_props(custom_metadata: Dict[str, str],
replace_existing: bool = False) -> Response

Deprecated: Use 'ObjectWriter.set_custom_props' instead.

Set custom properties for the object.

Arguments:

- `custom_metadata` (Dict[str, str]) - Custom metadata key-value pairs.
- `replace_existing` (bool, optional) - Whether to replace existing metadata. Defaults to False.

class ObjectReader()

Provide a way to read an object's contents and attributes, optionally iterating over a stream of content.

Arguments:

- `object_client` (ObjectClient) - Client for making requests to a specific object in AIS.
- `chunk_size` (int, optional) - Size of each data chunk to be fetched from the stream. Defaults to DEFAULT_CHUNK_SIZE.
def head() -> ObjectAttributes

Make a head request to AIS to update and return only object attributes.

Returns:

ObjectAttributes containing metadata for this object.

@property
def attributes() -> ObjectAttributes

Object metadata attributes.

Returns:

`ObjectAttributes` - Parsed object attributes from the headers returned by AIS.

def read_all() -> bytes

Read all byte data directly from the object response without using a stream.

This requires all object content to fit in memory at once and downloads all content before returning.

Returns:

`bytes` - Object content as bytes.

def raw() -> requests.Response

Return the raw byte stream of object content.

Returns:

`requests.Response` - Raw byte stream of the object content.
def as_file(buffer_size: Optional[int] = None,
max_resume: Optional[int] = 5) -> BufferedIOBase

Create a read-only, non-seekable ObjectFileReader instance for streaming object data in chunks.

This file-like object primarily implements the read() method to retrieve data sequentially,
with automatic retry/resumption in case of unexpected stream interruptions (e.g. ChunkedEncodingError,
ConnectionError) or timeouts (e.g. ReadTimeout).

Arguments:

- `buffer_size` (int, optional) - Currently unused; retained for backward compatibility and future enhancements.
- `max_resume` (int, optional) - Total number of retry attempts allowed to resume the stream in case of interruptions. Defaults to 5.

Returns:

`BufferedIOBase` - A read-only, non-seekable file-like object for streaming object content.

Raises:

- `ValueError` - If `max_resume` is invalid (must be a non-negative integer).
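A usage sketch, continuing the example above; the file-like wrapper transparently resumes the stream up to `max_resume` times:

```python
with obj.get_reader().as_file(max_resume=3) as f:
    header = f.read(1024)  # read a fixed number of bytes
    rest = f.read()        # read the remainder until EOF
```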
def __iter__() -> Generator[bytes, None, None]

Make a request to get a stream from the provided object and yield chunks of the stream content.
Returns:
Generator[bytes, None, None]: An iterator over each chunk of bytes in the object.
class ObjectFileReader(BufferedIOBase)

A sequential read-only file-like object extending BufferedIOBase for reading object data, with support for both
reading a fixed size of data and reading until the end of file (EOF).
When a read is requested, any remaining data from a previously fetched chunk is returned first. If the remaining
data is insufficient to satisfy the request, the read() method fetches additional chunks from the provided
iterator as needed, until the requested size is fulfilled or the end of the stream is reached.
In case of unexpected stream interruptions (e.g. ChunkedEncodingError, ConnectionError) or timeouts (e.g.
ReadTimeout), the read() method automatically retries and resumes fetching data from the last successfully
retrieved chunk. The max_resume parameter controls how many retry attempts are made before an error is raised.
Arguments:
- `content_provider` (ContentIterProvider) - A provider that creates iterators which can fetch object data from AIS in chunks.
- `max_resume` (int) - Maximum number of resumes allowed for an ObjectFileReader instance.
@override
def readable() -> bool

Return whether the file is readable.

@override
def read(size: Optional[int] = -1) -> bytes

Read up to 'size' bytes from the object. If size is -1, read until the end of the stream.

Arguments:

- `size` (int, optional) - The number of bytes to read. If -1, reads until EOF.

Returns:

`bytes` - The read data as a bytes object.

Raises:

- `ObjectFileReaderStreamError` - If a connection cannot be made.
- `ObjectFileReaderMaxResumeError` - If the stream is interrupted more than the allowed maximum.
- `ValueError` - I/O operation on a closed file.
- `Exception` - Any other errors while streaming and reading.

@override
def close() -> None

Close the file.
class ObjectFileWriter(BufferedWriter)

A file-like writer object for AIStore, extending BufferedWriter.

Arguments:

- `obj_writer` (ObjectWriter) - The ObjectWriter instance for handling write operations.
- `mode` (str) - Specifies the mode in which the file is opened.
  - `'w'`: Write mode. Opens the object for writing, truncating any existing content. Writing starts from the beginning of the object.
  - `'a'`: Append mode. Opens the object for appending. Existing content is preserved, and writing starts from the end of the object.
@override
def write(buffer: bytes) -> int

Write data to the object.

Arguments:

- `buffer` (bytes) - The data to write.

Returns:

`int` - Number of bytes written.

Raises:

- `ValueError` - I/O operation on a closed file.

@override
def flush() -> None

Flush the writer, ensuring the object is finalized.

This does not close the writer but makes the current state accessible.

Raises:

- `ValueError` - I/O operation on a closed file.

@override
def close() -> None

Close the writer and finalize the object.
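A sketch constructing the writer directly from the documented arguments; the import path below follows the module layout in the index at the top of this page (`obj.obj_file.object_file`) and should be treated as an assumption:

```python
# Import path inferred from the module index above -- verify against your SDK version
from aistore.sdk.obj.obj_file.object_file import ObjectFileWriter

with ObjectFileWriter(obj.get_writer(), mode="w") as f:  # 'w' truncates any existing content
    f.write(b"first line\n")
    f.write(b"second line\n")
# exiting the context calls close(), which finalizes the object
```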
class ObjectProps(ObjectAttributes)

Represents the attributes parsed from the response headers returned from an API call to get an object. Extends ObjectAttributes and is a superset of that class.

Arguments:

- `response_headers` (CaseInsensitiveDict, optional) - Response header dict containing object attributes.
@property
def bucket_name()

Name of the object's bucket.

@property
def bucket_provider()

Provider of the object's bucket.

@property
def name() -> str

Name of the object.

@property
def location() -> str

Location of the object.

@property
def mirror_paths() -> List[str]

List of mirror paths.

@property
def mirror_copies() -> int

Number of mirror copies.

@property
def present() -> bool

True if the object is present in the cluster.
class ObjectAttributes()

Represents the attributes parsed from the response headers returned from an API call to get an object.

Arguments:

- `response_headers` (CaseInsensitiveDict) - Response header dict containing object attributes.

@property
def size() -> int

Size of object content.

@property
def checksum_type() -> str

Type of checksum, e.g. xxhash or md5.

@property
def checksum_value() -> str

Checksum value.

@property
def access_time() -> str

Time this object was accessed.

@property
def obj_version() -> str

Object version.

@property
def custom_metadata() -> Dict[str, str]

Dictionary of custom metadata.

@property
def present() -> bool

Whether the object is present/cached.
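For example, inspecting a few of these attributes after a HEAD request, continuing the sketch above:

```python
props = obj.props  # issues a HEAD request and parses the response headers
print(props.name, props.size)
print(props.checksum_type, props.checksum_value)
print("present in cluster:", props.present)
```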