---
layout: post
title: PYTHON SDK
permalink: /docs/python-sdk
redirect_from:
---
AIStore Python SDK is a growing set of client-side objects and methods to access and utilize AIS clusters. This document contains API documentation for the AIStore Python SDK.
For our PyTorch integration, refer to the PyTorch Docs. For more information, see the AIS Python SDK package available via the Python Package Index (PyPI), or visit https://github.com/NVIDIA/aistore/tree/main/python/aistore.
- authn.authn_client
- authn.cluster_manager
- authn.role_manager
- authn.token_manager
- authn.user_manager
- authn.access_attr
- bucket
- client
- cluster
- etl
- job
- retry_config
- multiobj.object_group
- multiobj.object_names
- multiobj.object_range
- multiobj.object_template
- obj.object
- obj.object_reader
- obj.obj_file.object_file
- obj.object_props
- obj.object_attributes
class AuthNClient()

AuthN client for managing authentication.

This client provides methods to interact with the AuthN Server. For more info on the AuthN Server, see https://github.com/NVIDIA/aistore/blob/main/docs/authn.md

Arguments:

- `endpoint` (str): AuthN service endpoint URL.
- `skip_verify` (bool, optional): If True, skip SSL certificate verification. Defaults to False.
- `ca_cert` (str, optional): Path to a CA certificate file for SSL verification.
- `timeout` (Union[float, Tuple[float, float], None], optional): Request timeout in seconds; a single float for both connect/read timeouts (e.g., 5.0), a tuple for separate connect/read timeouts (e.g., (3.0, 10.0)), or None to disable timeout.
- `retry` (urllib3.Retry, optional): Retry configuration object from the urllib3 library.
- `token` (str, optional): Authorization token.
@property
def client() -> RequestClient

Get the request client.

Returns:

- `RequestClient`: The client this AuthN client uses to make requests.
def login(username: str,
password: str,
          expires_in: Optional[Union[int, float]] = None) -> str

Logs in to the AuthN Server and returns an authorization token.

Arguments:

- `username` (str): The username to log in with.
- `password` (str): The password to log in with.
- `expires_in` (Optional[Union[int, float]]): The expiration duration of the token in seconds.

Returns:

- `str`: An authorization token to use for future requests.

Raises:

- `ValueError`: If the password is empty or consists only of spaces.
- `AISError`: If the login request fails.
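A minimal sketch of the login flow is shown below. The endpoint URLs, credentials, and the exact import path (following the `authn.authn_client` module listed above) are assumptions for illustration, not a prescribed deployment:

```python
from aistore.sdk.authn import AuthNClient
from aistore.sdk import Client

# Assumed local AuthN server endpoint and placeholder credentials
authn_client = AuthNClient("http://localhost:52001")
token = authn_client.login("admin", "admin")

# Pass the token to the AIStore client so subsequent requests are authenticated
ais_client = Client("http://localhost:8080", token=token)
```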
def logout() -> None

Logs out and revokes the current token from the AuthN Server.

Raises:

- `AISError`: If the logout request fails.
def cluster_manager() -> ClusterManager

Factory method to create a ClusterManager instance.

Returns:

- `ClusterManager`: An instance to manage cluster operations.

def role_manager() -> RoleManager

Factory method to create a RoleManager instance.

Returns:

- `RoleManager`: An instance to manage role operations.

def user_manager() -> UserManager

Factory method to create a UserManager instance.

Returns:

- `UserManager`: An instance to manage user operations.

def token_manager() -> TokenManager

Factory method to create a TokenManager instance.

Returns:

- `TokenManager`: An instance to manage token operations.
class ClusterManager()

ClusterManager class for handling operations on clusters within the context of authentication.

This class provides methods to list, get, register, update, and delete clusters on the AuthN server.

Arguments:

- `client` (RequestClient): The request client to make HTTP requests.
@property
def client() -> RequestClient

The client this cluster manager uses to make requests.
def list() -> ClusterList

Retrieve all clusters.

Returns:

- `ClusterList`: A list of all clusters.

Raises:

- `AISError`: If an error occurs while listing clusters.
def get(cluster_id: Optional[str] = None,
        cluster_alias: Optional[str] = None) -> ClusterInfo

Retrieve a specific cluster by ID or alias.

Arguments:

- `cluster_id` (Optional[str]): The ID of the cluster. Defaults to None.
- `cluster_alias` (Optional[str]): The alias of the cluster. Defaults to None.

Returns:

- `ClusterInfo`: Information about the specified cluster.

Raises:

- `ValueError`: If neither cluster_id nor cluster_alias is provided.
- `RuntimeError`: If no cluster matches the provided ID or alias.
- `AISError`: If an error occurs while getting the cluster.
def register(cluster_alias: str, urls: List[str]) -> ClusterInfo

Register a new cluster.

Arguments:

- `cluster_alias` (str): The alias for the new cluster.
- `urls` (List[str]): A list of URLs for the new cluster.

Returns:

- `ClusterInfo`: Information about the registered cluster.

Raises:

- `ValueError`: If no URLs are provided or an invalid URL is provided.
- `AISError`: If an error occurs while registering the cluster.
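As a sketch, the cluster alias and URL below are placeholders, and `authn_client` is assumed from the login example above:

```python
cluster_manager = authn_client.cluster_manager()

# Register an AIS cluster with AuthN, then fetch it back by alias
cluster_info = cluster_manager.register("my-cluster", ["http://localhost:8080"])
same_cluster = cluster_manager.get(cluster_alias="my-cluster")
```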
def update(cluster_id: str,
cluster_alias: Optional[str] = None,
           urls: Optional[List[str]] = None) -> ClusterInfo

Update an existing cluster.

Arguments:

- `cluster_id` (str): The ID of the cluster to update.
- `cluster_alias` (Optional[str]): The new alias for the cluster. Defaults to None.
- `urls` (Optional[List[str]]): The new list of URLs for the cluster. Defaults to None.

Returns:

- `ClusterInfo`: Information about the updated cluster.

Raises:

- `ValueError`: If neither cluster_alias nor urls is provided.
- `AISError`: If an error occurs while updating the cluster.
def delete(cluster_id: Optional[str] = None,
           cluster_alias: Optional[str] = None)

Delete a specific cluster by ID or alias.

Arguments:

- `cluster_id` (Optional[str]): The ID of the cluster to delete. Defaults to None.
- `cluster_alias` (Optional[str]): The alias of the cluster to delete. Defaults to None.

Raises:

- `ValueError`: If neither cluster_id nor cluster_alias is provided.
- `AISError`: If an error occurs while deleting the cluster.
class RoleManager()

Manages role-related operations.

This class provides methods to interact with roles, including retrieving, creating, updating, and deleting role information.

Arguments:

- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient

Returns the RequestClient instance used by this RoleManager.
def list() -> RolesList

Retrieves information about all roles.

Returns:

- `RolesList`: A list containing information about all roles.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def get(role_name: str) -> RoleInfo

Retrieves information about a specific role.

Arguments:

- `role_name` (str): The name of the role to retrieve.

Returns:

- `RoleInfo`: Information about the specified role.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def create(name: str,
desc: str,
cluster_alias: str,
perms: List[AccessAttr],
           bucket_name: str = None) -> RoleInfo

Creates a new role.

Arguments:

- `name` (str): The name of the role.
- `desc` (str): A description of the role.
- `cluster_alias` (str): The alias of the cluster this role will have access to.
- `perms` (List[AccessAttr]): A list of permissions to be granted for this role.
- `bucket_name` (str, optional): The name of the bucket this role will have access to.

Returns:

- `RoleInfo`: Information about the newly created role.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def update(name: str,
desc: str = None,
cluster_alias: str = None,
perms: List[AccessAttr] = None,
           bucket_name: str = None) -> RoleInfo

Updates an existing role.

Arguments:

- `name` (str): The name of the role.
- `desc` (str, optional): An updated description of the role.
- `cluster_alias` (str, optional): The alias of the cluster this role will have access to.
- `perms` (List[AccessAttr], optional): A list of updated permissions to be granted for this role.
- `bucket_name` (str, optional): The name of the bucket this role will have access to.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
- `ValueError`: If the role does not exist or if invalid parameters are provided.
def delete(name: str, missing_ok: bool = False) -> None

Deletes a role.

Arguments:

- `name` (str): The name of the role to delete.
- `missing_ok` (bool): Ignore error if role does not exist. Defaults to False.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
- `ValueError`: If the role does not exist.
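A sketch of creating a read-only role scoped to one bucket. The AccessAttr member name (`GET`) and the import path (per the `authn.access_attr` module listed above) are assumptions; consult the AccessAttr flags for the exact permission names:

```python
from aistore.sdk.authn.access_attr import AccessAttr

role_manager = authn_client.role_manager()
role = role_manager.create(
    name="bucket-reader",
    desc="Read-only access to my-bck",
    cluster_alias="my-cluster",
    perms=[AccessAttr.GET],   # assumed flag name for read access
    bucket_name="my-bck",     # optional: restrict the role to a single bucket
)
```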
class TokenManager()

Manages token-related operations.

This class provides methods to interact with tokens in the AuthN server.

Arguments:

- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient

Returns the RequestClient instance used by this TokenManager.
def revoke(token: str) -> None

Revokes the specified authentication token.

Arguments:

- `token` (str): The token to be revoked.

Raises:

- `ValueError`: If the token is not provided.
- `AISError`: If the revoke token request fails.
class UserManager()

UserManager provides methods to manage users in the AuthN service.

Arguments:

- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient

Returns the RequestClient instance used by this UserManager.
def get(username: str) -> UserInfo

Retrieve user information from the AuthN Server.

Arguments:

- `username` (str): The username to retrieve.

Returns:

- `UserInfo`: The user's information.

Raises:

- `AISError`: If the user retrieval request fails.
def delete(username: str, missing_ok: bool = False) -> None

Delete an existing user from the AuthN Server.

Arguments:

- `username` (str): The username of the user to delete.
- `missing_ok` (bool): Ignore error if user does not exist. Defaults to False.

Raises:

- `AISError`: If the user deletion request fails.
def create(username: str, roles: List[str], password: str) -> UserInfo

Create a new user in the AuthN Server.

Arguments:

- `username` (str): The name or ID of the user to create.
- `roles` (List[str]): The list of names of roles to assign to the user.
- `password` (str): The password for the user.

Returns:

- `UserInfo`: The created user's information.

Raises:

- `AISError`: If the user creation request fails.
def list()

List all users in the AuthN Server.

Returns:

- `str`: The list of users in the AuthN Server.

Raises:

- `AISError`: If the user list request fails.
def update(username: str,
password: Optional[str] = None,
           roles: Optional[List[str]] = None) -> UserInfo

Update an existing user's information in the AuthN Server.

Arguments:

- `username` (str): The ID of the user to update.
- `password` (str, optional): The new password for the user.
- `roles` (List[str], optional): The list of names of roles to assign to the user.

Returns:

- `UserInfo`: The updated user's information.

Raises:

- `AISError`: If the user update request fails.
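A short sketch tying the pieces together, reusing `authn_client` and the hypothetical "bucket-reader" role from the earlier examples; the password is a placeholder:

```python
user_manager = authn_client.user_manager()
user = user_manager.create(
    username="reader",
    roles=["bucket-reader"],
    password="changeme",
)
```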
class AccessAttr(IntFlag)

AccessAttr defines permissions as bitwise flags for access control (for more details, refer to the Go API).

@staticmethod
def describe(perms: int) -> str

Returns a comma-separated string describing the permissions based on the provided bitwise flags.
class Bucket(AISSource)

A class representing a bucket that contains user data.

Arguments:

- `client` (RequestClient): Client for interfacing with the AIS cluster.
- `name` (str): Name of the bucket.
- `provider` (str or Provider, optional): Provider of the bucket (one of "ais", "aws", "gcp", ...). Defaults to "ais".
- `namespace` (Namespace, optional): Namespace of the bucket. Defaults to None.
@property
def client() -> RequestClient

The client used by this bucket.

@client.setter
def client(client)

Update the client used by this bucket.

@property
def qparam() -> Dict

Default query parameters to use with API calls from this bucket.

@property
def provider() -> Provider

The provider for this bucket.

@property
def name() -> str

The name of this bucket.

@property
def namespace() -> Namespace

The namespace for this bucket.
def list_urls(prefix: str = "",
              etl: Optional[ETLConfig] = None) -> Iterable[str]

Generates full URLs for all objects in the bucket that match the specified prefix.

Arguments:

- `prefix` (str, optional): A string prefix to filter objects. Only objects with names starting with this prefix will be included. Defaults to an empty string (no filtering).
- `etl` (Optional[ETLConfig], optional): An optional ETL configuration. If provided, the URLs will include ETL processing parameters. Defaults to None.

Returns:

- `Iterable[str]`: An iterator yielding full URLs of all objects matching the prefix.
def list_all_objects_iter(prefix: str = "",
props: str = "name,size") -> Iterable[Object]Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Arguments:
prefixstr, optional - Limit objects selected by a given string prefixpropsstr, optional - Comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "target_url", "copies".
Returns:
Iterator of all object URLs matching the prefix
def create(exist_ok=False)

Creates a bucket in the AIStore cluster. Can only create a bucket for the AIS provider on localized cluster. Remote cloud buckets do not support creation.

Arguments:

- `exist_ok` (bool, optional): Ignore error if the cluster already contains this bucket.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def delete(missing_ok=False)

Destroys the bucket in the AIStore cluster. In all cases, removes both the bucket's content and the bucket's metadata from the cluster. Note: AIS will not call the remote backend provider to delete the corresponding Cloud bucket (iff the bucket in question is, in fact, a Cloud bucket).

Arguments:

- `missing_ok` (bool, optional): Ignore error if bucket does not exist.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def rename(to_bck_name: str) -> str

Renames the bucket in the AIStore cluster. Only works on AIS buckets. Returns a job ID that can be used later to check the status of the asynchronous operation.

Arguments:

- `to_bck_name` (str): New bucket name for the bucket to be renamed as.

Returns:

Job ID (as str) that can be used to check the status of the operation.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def evict(keep_md: bool = False)

Evicts the bucket in the AIStore cluster. NOTE: only Cloud buckets can be evicted.

Arguments:

- `keep_md` (bool, optional): If True, evicts objects but keeps the bucket's metadata (i.e., the bucket's name and its properties).

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def head() -> Header

Requests bucket properties.

Returns:

Response header with the bucket properties.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def summary(uuid: str = "",
prefix: str = "",
cached: bool = True,
            present: bool = True)

Returns the bucket summary (starts an xaction job and polls for results).

Arguments:

- `uuid` (str): Identifier for the bucket summary. Defaults to an empty string.
- `prefix` (str): Prefix for objects to be included in the bucket summary. Defaults to an empty string (all objects).
- `cached` (bool): If True, summary entails cached entities. Defaults to True.
- `present` (bool): If True, summary entails present entities. Defaults to True.

Raises:

- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
def info(flt_presence: int = FLTPresence.FLT_EXISTS,
bsumm_remote: bool = True,
prefix: str = "")Returns bucket summary and information/properties.
Arguments:
flt_presenceFLTPresence - Describes the presence of buckets and objects with respect to their existence or non-existence in the AIS cluster using the enum FLTPresence. Defaults to value FLT_EXISTS and values are: FLT_EXISTS - object or bucket exists inside and/or outside cluster FLT_EXISTS_NO_PROPS - same as FLT_EXISTS but no need to return summary FLT_PRESENT - bucket is present or object is present and properly located FLT_PRESENT_NO_PROPS - same as FLT_PRESENT but no need to return summary FLT_PRESENT_CLUSTER - objects present anywhere/how in the cluster as replica, ec-slices, misplaced FLT_EXISTS_OUTSIDE - not present; exists outside clusterbsumm_remotebool - If True, returned bucket info will include remote objects as wellprefixstr - Only include objects with the given prefix in the bucket
Raises:
requests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStoreValueError-flt_presenceis not one of the expected valuesaistore.sdk.errors.AISError- All other types of errors with AIStore
def copy(to_bck: Bucket,
prefix_filter: str = "",
prepend: str = "",
ext: Optional[Dict[str, str]] = None,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
         num_workers: Optional[int] = 0) -> str

Returns a job ID that can be used later to check the status of the asynchronous operation.

Arguments:

- `to_bck` (Bucket): Destination bucket.
- `prefix_filter` (str, optional): Only copy objects with names starting with this prefix.
- `prepend` (str, optional): Value to prepend to the name of copied objects.
- `ext` (Dict[str, str], optional): Dict mapping each extension to the extension that will replace it (e.g. {"jpg": "txt"}).
- `dry_run` (bool, optional): Determines if the copy should actually happen or not.
- `force` (bool, optional): Override existing destination bucket.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers for the copy job per target:
  - 0 (default): number of mountpaths
  - -1: single thread, serial execution

Returns:

Job ID (as str) that can be used to check the status of the operation.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
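A minimal sketch of copying a prefix of objects and waiting on the returned job, assuming an existing `Client` instance `client` and a source bucket handle `bucket` (bucket names here are placeholders):

```python
dest = client.bucket("my-bck-copy")
dest.create(exist_ok=True)

copy_job_id = bucket.copy(dest, prefix_filter="train/")
client.job(copy_job_id).wait()  # block until the asynchronous copy completes
```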
def list_objects(prefix: str = "",
props: str = "",
page_size: int = 0,
uuid: str = "",
continuation_token: str = "",
flags: List[ListObjectFlag] = None,
target: str = "") -> BucketListReturns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if available).
Arguments:
prefixstr, optional - Return only objects that start with the prefixpropsstr, optional - Comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".page_sizeint, optional - Return at most "page_size" objects. The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page.NOTE- If "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number of objects.uuidstr, optional - Job ID, required to get the next page of objectscontinuation_tokenstr, optional - Marks the object to start reading the next pageflagsList[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request target(str, optional): Only list objects on this specific target node
Returns:
BucketList- the page of objects in the bucket and the continuation token to get the next page Empty continuation token marks the final page of the object list
Raises:
aistore.sdk.errors.AISError- All other types of errors with AIStorerequests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStore
def list_objects_iter(prefix: str = "",
props: str = "",
page_size: int = 0,
flags: List[ListObjectFlag] = None,
target: str = "") -> ObjectIteratorReturns an iterator for all objects in bucket
Arguments:
prefixstr, optional - Return only objects that start with the prefixpropsstr, optional - Comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".page_sizeint, optional - return at most "page_size" objects The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page.NOTE- If "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number objectsflagsList[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request target(str, optional): Only list objects on this specific target node
Returns:
ObjectIterator- object iterator
Raises:
aistore.sdk.errors.AISError- All other types of errors with AIStorerequests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStore
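A sketch of streaming the listing page by page rather than materializing the whole bucket in memory; the prefix is a placeholder, and the entry attributes follow the properties requested via `props`:

```python
for entry in bucket.list_objects_iter(prefix="train/", props="name,size"):
    print(entry.name, entry.size)
```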
def list_all_objects(prefix: str = "",
props: str = "",
page_size: int = 0,
flags: List[ListObjectFlag] = None,
target: str = "") -> List[BucketEntry]Returns a list of all objects in bucket
Arguments:
prefixstr, optional - return only objects that start with the prefixpropsstr, optional - comma-separated list of object properties to return. Default value is "name,size".Properties- "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".page_sizeint, optional - return at most "page_size" objects The maximum number of objects in response depends on the bucket backend. E.g, AWS bucket cannot return more than 5,000 objects in a single page.NOTE- If "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number objectsflagsList[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request target(str, optional): Only list objects on this specific target node
Returns:
List[BucketEntry]- list of objects in bucket
Raises:
aistore.sdk.errors.AISError- All other types of errors with AIStorerequests.ConnectionError- Connection errorrequests.ConnectionTimeout- Timed out connecting to AIStorerequests.exceptions.HTTPError- Service unavailablerequests.RequestException- "There was an ambiguous exception that occurred while handling..."requests.ReadTimeout- Timed out receiving response from AIStore
def list_archive(archive_obj_name: str,
include_archive_obj: bool = False,
props: str = "",
                 page_size: int = 0) -> List[BucketEntry]

List files contained in an archived object (*.tar, *.zip, *.tgz, etc.).

This is a convenience wrapper around list_all_objects that automatically enables the ARCH_DIR list-flag so the cluster opens the shard and returns its directory.

Arguments:

- `archive_obj_name` (str): Object key of the shard inside this bucket (e.g. "my-archive.tar"). Can include a prefix path.
- `include_archive_obj` (bool, optional): If True, the returned list includes the parent archive object itself. When False (default), only the entries inside the shard are returned.
- `props` (str, optional): Comma-separated list of object properties to request. Defaults to "" (no properties).
- `page_size` (int, optional): Same meaning as in list_all_objects - how many names per internal page.

Returns:

- `List[BucketEntry]`: Entries representing the shard (optionally) and every file stored inside it.
def transform(etl_name: str,
to_bck: Bucket,
timeout: str = DEFAULT_ETL_TIMEOUT,
prefix_filter: str = "",
prepend: str = "",
ext: Optional[Dict[str, str]] = None,
force: bool = False,
dry_run: bool = False,
latest: bool = False,
sync: bool = False,
num_workers: Optional[int] = 0,
              cont_on_err: bool = False) -> str

Visits all selected objects in the source bucket and, for each object, puts the transformed result to the destination bucket.

Arguments:

- `etl_name` (str): Name of the ETL to be used for transformations.
- `to_bck` (Bucket): Destination bucket for transformations.
- `timeout` (str, optional): Timeout of the ETL job (e.g. 5m for 5 minutes).
- `prefix_filter` (str, optional): Only transform objects with names starting with this prefix.
- `prepend` (str, optional): Value to prepend to the name of resulting transformed objects.
- `ext` (Dict[str, str], optional): Dict mapping each extension to the extension that will replace it (e.g. {"jpg": "txt"}).
- `dry_run` (bool, optional): Determines if the transform should actually happen or not.
- `force` (bool, optional): Override existing destination bucket.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers for the transformation job per target:
  - 0 (default): number of mountpaths
  - -1: single thread, serial execution
- `cont_on_err` (bool): If True, continue processing objects even if some of them fail.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def put_files(path: str,
prefix_filter: str = "",
pattern: str = "*",
basename: bool = False,
prepend: str = None,
recursive: bool = False,
dry_run: bool = False,
              verbose: bool = True) -> List[str]

Puts files found in a given filepath as objects to a bucket in AIS storage.

Arguments:

- `path` (str): Local filepath, can be relative or absolute.
- `prefix_filter` (str, optional): Only put files with names starting with this prefix.
- `pattern` (str, optional): Shell-style wildcard pattern to filter files.
- `basename` (bool, optional): Whether to use the file names only as object names and omit the path information.
- `prepend` (str, optional): Optional string to use as a prefix in the object name for all objects uploaded. No delimiter ("/", "-", etc.) is automatically applied between the prepend value and the object name.
- `recursive` (bool, optional): Whether to recurse through the provided path directories.
- `dry_run` (bool, optional): Option to only show expected behavior without an actual put operation.
- `verbose` (bool, optional): Whether to print upload info to standard output.

Returns:

List of object names put to a bucket in AIS.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `ValueError`: The path provided is not a valid directory.
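A sketch of a bulk upload; the local directory and prefix are placeholders, and `bucket` is assumed from earlier examples:

```python
uploaded = bucket.put_files(
    "/data/images",       # hypothetical local directory
    pattern="*.jpg",
    recursive=True,
    prepend="images/",    # note: no delimiter is inserted automatically
)
print(f"uploaded {len(uploaded)} objects")
```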
def object(obj_name: str, props: ObjectProps = None) -> Object

Factory constructor for an object in this bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.

Arguments:

- `obj_name` (str): Name of the object.
- `props` (ObjectProps, optional): Properties of the object, as updated by head(), optionally pre-initialized.

Returns:

The object created.
def objects(obj_names: List = None,
obj_range: ObjectRange = None,
            obj_template: str = None) -> ObjectGroup

Factory constructor for multiple objects belonging to this bucket.

Arguments:

- `obj_names` (list): Names of objects to include in the group.
- `obj_range` (ObjectRange): Range of objects to include in the group.
- `obj_template` (str): String template defining objects to include in the group.

Returns:

The ObjectGroup created.
def make_request(method: str,
action: str,
value: Dict = None,
                 params: Dict = None) -> requests.Response

Use the bucket's client to make a request to the bucket endpoint on the AIS server.

Arguments:

- `method` (str): HTTP method to use, e.g. POST/GET/DELETE.
- `action` (str): Action string used to create an ActionMsg to pass to the server.
- `value` (dict): Additional value parameter to pass in the ActionMsg.
- `params` (dict, optional): Optional parameters to pass in the request.

Returns:

Response from the server.
def verify_cloud_bucket()

Verify the bucket provider is a cloud provider.

def get_path() -> str

Get the path representation of this bucket.

def as_model() -> BucketModel

Return a data-model of the bucket.

Returns:

BucketModel representation.
def write_dataset(config: DatasetConfig, skip_missing: bool = True, **kwargs)

Write a dataset to a bucket in AIS in webdataset format using wds.ShardWriter. Logs the missing attributes.

Arguments:

- `config` (DatasetConfig): Configuration dict specifying how to process and store each part of the dataset item.
- `skip_missing` (bool, optional): Skip samples that are missing one or more attributes. Defaults to True.
- `**kwargs` (optional): Optional keyword arguments to pass to the ShardWriter.
class Client()

AIStore client for managing buckets, objects, and ETL jobs.

Arguments:

- `endpoint` (str): AIStore endpoint.
- `skip_verify` (bool, optional): If True, skip SSL certificate verification. Defaults to False.
- `ca_cert` (str, optional): Path to a CA certificate file for SSL verification. If not provided, the 'AIS_CLIENT_CA' environment variable will be used. Defaults to None.
- `client_cert` (Union[str, Tuple[str, str], None], optional): Path to a client certificate PEM file or a tuple (cert, key) for mTLS. If not provided, the 'AIS_CRT' and 'AIS_CRT_KEY' environment variables will be used. Defaults to None.
- `timeout` (Union[float, Tuple[float, float], None], optional): Timeout for HTTP requests:
  - Single float (e.g., 5.0): Applies to both connection and read timeouts.
  - Tuple (e.g., (3.0, 20.0)): First value is the connection timeout, second is the read timeout.
  - None: Disables timeouts (not recommended).
  Defaults to (3, 20).
- `retry_config` (RetryConfig, optional): Defines retry behavior for HTTP and network failures. If not provided, the default retry configuration (RetryConfig.default()) is used.
- `retry` (urllib3.Retry, optional): [Deprecated] Retry configuration from urllib3. Use `retry_config` instead.
- `token` (str, optional): Authorization token. If not provided, the 'AIS_AUTHN_TOKEN' environment variable will be used. Defaults to None.
- `max_pool_size` (int, optional): Maximum number of connections per host in the connection pool. Defaults to 10.
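A minimal sketch of constructing a client; "http://localhost:8080" is the conventional local AIS proxy endpoint and is an assumption here:

```python
from aistore.sdk import Client

client = Client("http://localhost:8080")
bucket = client.bucket("my-bck")  # no HTTP request is made here
bucket.create(exist_ok=True)      # this call actually creates the bucket
```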
def bucket(bck_name: str,
provider: Union[Provider, str] = Provider.AIS,
           namespace: Namespace = None)

Factory constructor for a bucket object. Does not make any HTTP request, only instantiates a bucket object.

Arguments:

- `bck_name` (str): Name of the bucket.
- `provider` (str or Provider, optional): Provider of the bucket, one of "ais", "aws", "gcp", ... Defaults to "ais".
- `namespace` (Namespace, optional): Namespace of the bucket. Defaults to None.

Returns:

The bucket object created.
def cluster()

Factory constructor for a cluster object. Does not make any HTTP request, only instantiates a cluster object.

Returns:

The cluster object created.
def job(job_id: str = "", job_kind: str = "")

Factory constructor for a job object, which contains job-related functions. Does not make any HTTP request, only instantiates a job object.

Arguments:

- `job_id` (str, optional): Optional ID for interacting with a specific job.
- `job_kind` (str, optional): Optional specific type of job; empty for all kinds.

Returns:

The job object created.
def etl(etl_name: str)

Factory constructor for an ETL object. Contains APIs related to AIStore ETL operations. Does not make any HTTP request, only instantiates an ETL object.

Arguments:

- `etl_name` (str): Name of the ETL.

Returns:

The ETL object created.
def dsort(dsort_id: str = "")

Factory constructor for a dSort object. Contains APIs related to AIStore dSort operations. Does not make any HTTP request, only instantiates a dSort object.

Arguments:

- `dsort_id`: ID of the dSort job.

Returns:

The dSort object created.
def batch_loader()

Factory constructor for a BatchLoader object. Contains APIs related to AIStore GetBatch operations. Does not make any HTTP requests, only creates a BatchLoader.

Returns:

- `BatchLoader`: The BatchLoader created.
def fetch_object_by_url(url: str) -> Object

Deprecated: Use get_object_from_url instead.

Creates an Object instance from a URL.

This method does not make any HTTP requests.

Arguments:

- `url` (str): Full URL of the object (e.g., "ais://bucket1/file.txt").

Returns:

- `Object`: The object constructed from the specified URL.
def get_object_from_url(url: str) -> Object

Creates an Object instance from a URL.

This method does not make any HTTP requests.

Arguments:

- `url` (str): Full URL of the object (e.g., "ais://bucket1/file.txt").

Returns:

- `Object`: The object constructed from the specified URL.

Raises:

- `InvalidURLException`: If the URL is invalid.
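For example, a sketch of building an object handle straight from a full URL (bucket and object names are placeholders; no HTTP request is made until the object is actually read):

```python
obj = client.get_object_from_url("ais://my-bck/train/0001.jpg")
print(obj.name)
```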
class Cluster()

A class representing a cluster bound to an AIS client.

@property
def client()

Client this cluster uses to make requests.

def get_info() -> Smap

Returns the state of the AIS cluster, including detailed information about its nodes.

Returns:

- `aistore.sdk.types.Smap`: Smap containing cluster information.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
def get_primary_url() -> str

Returns: URL of the primary proxy.

def list_buckets(provider: Union[str, Provider] = Provider.AIS)

Returns a list of buckets in the AIStore cluster.

Arguments:

- `provider` (str or Provider, optional): Provider of the bucket (one of "ais", "aws", "gcp", ...). Defaults to "ais". An empty provider returns buckets of all providers.

Returns:

- `List[BucketModel]`: A list of buckets.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
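A sketch of basic cluster introspection via the factory constructor, reusing the assumed `client` from earlier examples:

```python
cluster = client.cluster()
if cluster.is_ready():
    for bck in cluster.list_buckets():
        print(bck.name)
```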
def list_jobs_status(job_kind="", target_id="") -> List[JobStatus]

List the status of jobs on the cluster.

Arguments:

- `job_kind` (str, optional): Only show jobs of a particular type.
- `target_id` (str, optional): Limit to jobs on a specific target node.

Returns:

List of JobStatus objects.

def list_running_jobs(job_kind="", target_id="") -> List[str]

List the currently running jobs on the cluster.

Arguments:

- `job_kind` (str, optional): Only show jobs of a particular type.
- `target_id` (str, optional): Limit to jobs on a specific target node.

Returns:

List of jobs in the format job_kind[job_id].
def list_etls(stages: Optional[List[str]] = None) -> List[ETLInfo]

Lists ETLs filtered by their stages.

Arguments:

- `stages` (List[str], optional): List of stages to filter ETLs by. Defaults to ["running"].

Returns:

- `List[ETLInfo]`: A list of details on ETLs matching the specified stages.
def is_ready() -> bool

Checks if the cluster is ready or still setting up.

Returns:

- `bool`: True if the cluster is ready, or False if the cluster is still setting up.

def get_performance() -> Dict

Retrieves the raw performance and status data from each target node in the AIStore cluster.

Returns:

- `Dict`: A dictionary where each key is the ID of a target node and each value is the raw AIS performance/status JSON returned by that node (for more information, see https://aistore.nvidia.com/docs/monitoring-metrics#target-metrics).

Raises:

- `requests.RequestException`: If there's an ambiguous exception while processing the request.
- `requests.ConnectionError`: If there's a connection error with the cluster.
- `requests.ConnectionTimeout`: If the connection to the cluster times out.
- `requests.ReadTimeout`: If the timeout is reached while awaiting a response from the cluster.

def get_uuid() -> str

Returns: UUID of the AIStore cluster.
class Job()

A class containing job-related functions.

Arguments:

- `client` (RequestClient): Client for interfacing with the AIS cluster.
- `job_id` (str, optional): ID of a specific job; empty for all jobs.
- `job_kind` (str, optional): Specific kind of job; empty for all kinds.
@property
def job_id()

Return the job ID.

@property
def job_kind()

Return the job kind.

def status() -> JobStatus

Return the status of a job.

Returns:

The job status, including ID, finish time, and error info.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
def wait(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT, verbose: bool = True)

Wait for a job to finish.

Arguments:

- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:

None

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
def wait_for_idle(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                  verbose: bool = True)

Wait for a job to reach an idle state.

Arguments:

- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:

None

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
- `errors.JobInfoNotFound`: Raised when information on a job's status could not be found on the AIS cluster.
def wait_single_node(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                     verbose: bool = True)

Wait for a job running on a single node.

Arguments:

- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:

None

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
- `errors.JobInfoNotFound`: Raised when information on a job's status could not be found on the AIS cluster.
def start(daemon_id: str = "",
force: bool = False,
          buckets: List[Bucket] = None) -> str

Start a job and return its ID.

Arguments:

- `daemon_id` (str, optional): For running a job that must run on a specific target node (e.g. resilvering).
- `force` (bool, optional): Override existing restrictions for a bucket (e.g., run LRU eviction even if the bucket has LRU disabled).
- `buckets` (List[Bucket], optional): List of one or more buckets; applicable only for jobs that have bucket scope (for details on job types, see the Table in xact/api.go).

Returns:

The running job ID.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
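As a sketch, starting a bucket-scoped job and waiting on it; "lru" is used here as an assumed job kind, and the bucket name is a placeholder:

```python
job = client.job(job_kind="lru")
job_id = job.start(buckets=[client.bucket("my-bck")], force=True)
client.job(job_id).wait(timeout=300)
```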
def get_within_timeframe(start_time: datetime,
                         end_time: Optional[datetime] = None) -> List[JobSnap]

Retrieves jobs that started after a specified start_time and optionally ended before a specified end_time.

Arguments:

- `start_time` (datetime): The start of the timeframe for monitoring jobs.
- `end_time` (datetime, optional): The end of the timeframe for monitoring jobs.

Returns:

- `List[JobSnap]`: A list of jobs that meet the specified timeframe criteria.

Raises:

- `JobInfoNotFound`: Raised when no relevant job info is found.
def get_details() -> AggregatedJobSnap

Retrieve detailed job snapshot information across all targets.

Returns:

- `AggregatedJobSnap`: A snapshot containing detailed metrics for the job.

def get_total_time() -> Optional[timedelta]

Calculates the total job duration as the difference between the earliest start time and the latest end time among all job snapshots. If any snapshot is missing an end_time, returns None to indicate the job is incomplete.

Returns:

- `Optional[timedelta]`: The total duration of the job, or None if incomplete.
@dataclass
class ColdGetConf()

Configuration class for retrying HEAD requests to objects that are not present in the cluster when attempting a cold GET.

Attributes:

- `est_bandwidth_bps` (int): Estimated bandwidth in bytes per second from the AIS cluster to backend buckets. Used to determine retry intervals for fetching remote objects. Raising this will decrease the initial time we expect an object fetch to take. Defaults to 1 Gbps.
- `max_cold_wait` (int): Maximum total number of seconds to wait for an object to be present before re-raising a ReadTimeoutError to be handled by the top-level RetryConfig. Defaults to 3 minutes.
@staticmethod
def default() -> "ColdGetConf"

Returns the default cold get config options.
@dataclass
class RetryConfig()

Configuration class for managing both HTTP and network retries in AIStore.

AIStore implements two types of retries to ensure reliability and fault tolerance:

- HTTP Retry (urllib3.Retry) - Handles HTTP errors based on status codes (e.g., 429, 500, 502, 503, 504).
- Network Retry (tenacity) - Recovers from connection failures, timeouts, and unreachable targets.

Why two types of retries?

- AIStore uses redirects for GET/PUT operations.
- If a target node is down, we must retry the request via the proxy instead of the same failing target. `network_retry` ensures that the request is reattempted at the proxy level, preventing unnecessary failures.

Attributes:

- `http_retry` (urllib3.Retry): Defines retry behavior for transient HTTP errors.
- `network_retry` (tenacity.Retrying): Configured tenacity.Retrying instance managing retries for network-related issues, such as connection failures, timeouts, or unreachable targets.
- `cold_get_conf` (ColdGetConf): Configuration for retrying COLD GET requests; see the ColdGetConf class.
@staticmethod
def default() -> "RetryConfig"

Returns the default retry configuration for AIStore.
class ObjectGroup(AISSource)

A class representing multiple objects within the same bucket. Only one of obj_names, obj_range, or obj_template should be provided.

Arguments:

- `bck` (Bucket): Bucket the objects belong to.
- `obj_names` (list[str], optional): List of object names to include in this collection.
- `obj_range` (ObjectRange, optional): Range defining which object names in the bucket should be included.
- `obj_template` (str, optional): String argument to pass as the template value directly to the API.
@property
def client() -> RequestClient

The client bound to the bucket used by the ObjectGroup.

@client.setter
def client(client) -> RequestClient

Update the client bound to the bucket used by the ObjectGroup.
def list_urls(prefix: str = "",
              etl: Optional[ETLConfig] = None) -> Iterable[str]

Implementation of the abstract method from AISSource that provides an iterator of full URLs to every object in this bucket matching the specified prefix.

Arguments:

- `prefix` (str, optional): Limit objects selected by a given string prefix.
- `etl` (Optional[ETLConfig], optional): An optional ETL configuration. If provided, the URLs will include ETL processing parameters. Defaults to None.

Returns:

Iterator of all object URLs in the group.
def list_all_objects_iter(prefix: str = "",
props: str = "name,size") -> Iterable[Object]Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Arguments:
prefixstr, optional - Limit objects selected by a given string prefixpropsstr, optional - By default, will include all object properties. Pass in None to skip and avoid the extra API call.
Returns:
Iterator of all the objects in the group
def delete()

Deletes a list or range of objects in a bucket.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def evict()

Evicts a list or range of objects in a bucket so that they are no longer cached in AIS. NOTE: only Cloud buckets can be evicted.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def prefetch(blob_threshold: int = None,
num_workers: int = None,
latest: bool = False,
             continue_on_error: bool = False)

Prefetches a list or range of objects in a bucket so that they are cached in AIS. NOTE: only Cloud buckets can be prefetched.

Arguments:

- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `continue_on_error` (bool, optional): Whether to continue if there is an error prefetching a single object.
- `blob_threshold` (int, optional): Utilize the built-in blob-downloader for remote objects greater than the specified (threshold) size in bytes.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def copy(to_bck: "Bucket",
prepend: str = "",
continue_on_error: bool = False,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
         num_workers: int = None) -> List[str]

Copies a list or range of objects in a bucket.

Arguments:

- `to_bck` (Bucket): Destination bucket.
- `prepend` (str, optional): Value to prepend to the name of copied objects.
- `continue_on_error` (bool, optional): Whether to continue if there is an error copying a single object.
- `dry_run` (bool, optional): Skip performing the copy and just log the intended actions.
- `force` (bool, optional): Force this job to run over others in case it conflicts (see "limited coexistence" and xact/xreg/xreg.go).
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

- `List[str]`: List of job IDs that can be used to check the status of the operation.
def transform(to_bck: "Bucket",
etl_name: str,
timeout: str = DEFAULT_ETL_TIMEOUT,
prepend: str = "",
ext: Dict[str, str] = None,
continue_on_error: bool = False,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
              num_workers: int = None)

Performs an ETL operation on a list or range of objects in a bucket, placing the results in the destination bucket.

Arguments:

- `to_bck` (Bucket): Destination bucket.
- `etl_name` (str): Name of existing ETL to apply.
- `timeout` (str): Timeout of the ETL job (e.g. 5m for 5 minutes).
- `prepend` (str, optional): Value to prepend to the name of resulting transformed objects.
- `ext` (Dict[str, str], optional): Dict mapping each extension to the extension that will replace it (i.e. {"jpg": "txt"}).
- `continue_on_error` (bool, optional): Whether to continue if there is an error transforming a single object.
- `dry_run` (bool, optional): Skip performing the transform and just log the intended actions.
- `force` (bool, optional): Force this job to run over others in case it conflicts (see "limited coexistence" and xact/xreg/xreg.go).
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:

- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:

Job ID (as str) that can be used to check the status of the operation.
def archive(archive_name: str,
mime: str = "",
to_bck: "Bucket" = None,
include_source_name: bool = False,
allow_append: bool = False,
            continue_on_err: bool = False) -> List[str]

Create or append to an archive.

Arguments:

- `archive_name` (str): Name of the archive to create or append to.
- `mime` (str, optional): MIME type of the content.
- `to_bck` (Bucket, optional): Destination bucket. Defaults to the current bucket.
- `include_source_name` (bool, optional): Include the source bucket name in the archived objects' names.
- `allow_append` (bool, optional): Allow appending to an existing archive.
- `continue_on_err` (bool, optional): Whether to continue if there is an error archiving a single object.

Returns:

- `List[str]`: List of job IDs that can be used to check the status of the operation.
def list_names() -> List[str]

List all the object names included in this group of objects.

Returns:

List of object names.
class ObjectNames(ObjectCollection)

A collection of object names, provided as a list of strings.

Arguments:

- `names` (List[str]): A list of object names.
class ObjectRange(ObjectCollection)

Class representing a range of object names.

Arguments:

- `prefix` (str): Prefix contained in all names of objects.
- `min_index` (int): Starting index in the name of objects.
- `max_index` (int): Last index in the name of all objects.
- `pad_width` (int, optional): Left-pad indices with zeros up to the width provided, e.g. pad_width = 3 will transform 1 to 001.
- `step` (int, optional): Size of iterator steps between each item.
- `suffix` (str, optional): Suffix at the end of all object names.
@classmethod
def from_string(cls, range_string: str)

Construct an ObjectRange instance from a valid range string like 'input-{00..99..1}.txt'.

Arguments:

- `range_string` (str): The range string to parse.

Returns:

- `ObjectRange`: An instance of the ObjectRange class.
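A sketch of addressing objects shard-000.tar through shard-099.tar as one group and prefetching them (Cloud buckets only). The `aistore.sdk.multiobj` import path follows the multiobj modules in the table of contents and is an assumption:

```python
from aistore.sdk.multiobj import ObjectRange

obj_range = ObjectRange(prefix="shard-", min_index=0, max_index=99,
                        pad_width=3, suffix=".tar")
group = bucket.objects(obj_range=obj_range)
prefetch_job_id = group.prefetch()  # returns a job ID for the async operation
```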
class ObjectTemplate(ObjectCollection)

A collection of object names specified by a template in the bash brace expansion format.

Arguments:

- `template` (str): A string template that defines the names of objects to include in the collection.
@dataclass
class BucketDetails()

Metadata about a bucket, used by objects within that bucket.

class Object()

Provides methods for interacting with an object in AIS.

Arguments:

- `client` (RequestClient): Client used for all HTTP requests.
- `bck_details` (BucketDetails): Metadata about the bucket to which this object belongs.
- `name` (str): Name of the object.
- `props` (ObjectProps, optional): Properties of the object, as updated by head(), optionally pre-initialized.
@property
def bucket_name() -> str

Name of the bucket where this object resides.

@property
def bucket_provider() -> Provider

Provider of the bucket where this object resides (e.g. ais, s3, gcp).

@property
def query_params() -> Dict[str, str]

Query params used as a base for constructing all requests for this object.

@property
def name() -> str

Name of this object.

@property
def uname() -> str

Unified name (uname) of this object, which combines the bucket path and object name.

Returns:

- `str`: The unified name in the format bucket_path/object_name.
@property
def props() -> ObjectProps

Get the latest properties of the object.

This will make a HEAD request to the AIStore cluster to fetch up-to-date object headers and refresh the internal _props cache. Use this when you want to ensure you're accessing the most recent metadata for the object.

Returns:

- `ObjectProps`: The latest object properties from the server.
@property
def props_cached() -> Optional[ObjectProps]

Get the cached object properties (without making a network call).

This is useful when:

- You want to avoid a network request.
- You're sure the cached `_props` was already set via a previous call to `head()` or during object construction.

Returns:

ObjectProps or None: Cached object properties, or None if not set.
def head() -> CaseInsensitiveDict

Requests object properties and returns headers. Updates props.

Returns:

Response header with the object properties.

Raises:

- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting response from AIStore.
- `requests.exceptions.HTTPError(404)`: The object does not exist.
def get_reader(archive_config: Optional[ArchiveConfig] = None,
blob_download_config: Optional[BlobDownloadConfig] = None,
chunk_size: int = DEFAULT_CHUNK_SIZE,
etl: Optional[ETLConfig] = None,
writer: Optional[BufferedWriter] = None,
latest: bool = False,
byte_range: Optional[str] = None,
direct: bool = False) -> ObjectReader

Creates and returns an ObjectReader with access to object contents and optionally writes to a provided writer.

Arguments:

- `archive_config` (Optional[ArchiveConfig]) - Settings for archive extraction.
- `blob_download_config` (Optional[BlobDownloadConfig]) - Settings for using blob download.
- `chunk_size` (int, optional) - Chunk size to use while reading from stream.
- `etl` (Optional[ETLConfig]) - Settings for ETL-specific operations (name, args).
- `writer` (Optional[BufferedWriter]) - User-provided writer for writing content output. The user is responsible for closing the writer.
- `latest` (bool, optional) - GET the latest object version from the associated remote bucket.
- `byte_range` (Optional[str]) - Byte range in RFC 7233 format for single-range requests (e.g., "bytes=0-499", "bytes=500-", "bytes=-500"). See https://www.rfc-editor.org/rfc/rfc7233#section-2.1.
- `direct` (bool, optional) - If True, the object content is read directly from the target node, bypassing the proxy.

Returns:

`ObjectReader` - An iterator for streaming object content.

Raises:

- `ValueError` - If byte range is used with blob download.
- `requests.RequestException` - If an error occurs during the request.
- `requests.ConnectionError` - If there is a connection error.
- `requests.ConnectionTimeout` - If the connection times out.
- `requests.ReadTimeout` - If the read operation times out.
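Continuing the sketch above, streaming the object in chunks keeps memory bounded; the chunk size and byte range below are illustrative:

```python
reader = obj.get_reader(chunk_size=1024 * 1024)
total = 0
for chunk in reader:  # each iteration yields one chunk of bytes
    total += len(chunk)
print(f"streamed {total} bytes")

# Fetch only the first 500 bytes via an RFC 7233 single range
first_500 = obj.get_reader(byte_range="bytes=0-499").read_all()
```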
def get(archive_config: ArchiveConfig = None,
blob_download_config: BlobDownloadConfig = None,
chunk_size: int = DEFAULT_CHUNK_SIZE,
etl: ETLConfig = None,
writer: BufferedWriter = None,
latest: bool = False,
byte_range: str = None) -> ObjectReader

Deprecated: Use 'get_reader' instead.

Creates and returns an ObjectReader with access to object contents and optionally writes to a provided writer.

Arguments:

- `archive_config` (ArchiveConfig, optional) - Settings for archive extraction.
- `blob_download_config` (BlobDownloadConfig, optional) - Settings for using blob download.
- `chunk_size` (int, optional) - Chunk size to use while reading from stream.
- `etl` (ETLConfig, optional) - Settings for ETL-specific operations (name, meta).
- `writer` (BufferedWriter, optional) - User-provided writer for writing content output. The user is responsible for closing the writer.
- `latest` (bool, optional) - GET the latest object version from the associated remote bucket.
- `byte_range` (str, optional) - Byte range in RFC 7233 format for single-range requests (e.g., "bytes=0-499", "bytes=500-", "bytes=-500"). See https://www.rfc-editor.org/rfc/rfc7233#section-2.1.

Returns:

`ObjectReader` - An ObjectReader that can be iterated over to stream chunks of object content or used to read all content directly.

Raises:

- `ValueError` - If byte range is used with blob download.
- `requests.RequestException` - If an error occurs during the request.
- `requests.ConnectionError` - If there is a connection error.
- `requests.ConnectionTimeout` - If the connection times out.
- `requests.ReadTimeout` - If the read operation times out.
def get_semantic_url() -> str

Get the semantic URL to the object.

Returns:

Semantic URL to get the object.

def get_url(archpath: str = "", etl: ETLConfig = None) -> str

Get the full URL to the object, including the base URL and any query parameters.

Arguments:

- `archpath` (str, optional) - If the object is an archive, use `archpath` to extract a single file from the archive.
- `etl` (ETLConfig, optional) - Settings for ETL-specific operations (name, meta).

Returns:

Full URL to get the object.
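For illustration (exact output depends on the cluster endpoint and bucket provider):

```python
print(obj.get_semantic_url())  # e.g. "ais://my-bck/my-obj.txt"
print(obj.get_url())           # full HTTP(S) URL, including query parameters
```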
def put_content(content: bytes) -> Response

Deprecated: Use 'ObjectWriter.put_content' instead.

Puts bytes as an object to a bucket in AIS storage.

Arguments:

- `content` (bytes) - Bytes to put as an object.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.

def put_file(path: str or Path) -> Response

Deprecated: Use 'ObjectWriter.put_file' instead.

Puts a local file as an object to a bucket in AIS storage.

Arguments:

- `path` (str or Path) - Path to a local file.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `ValueError` - The path provided is not a valid file.
def get_writer() -> ObjectWriter

Create an ObjectWriter to write to object contents and attributes.
Returns:
An ObjectWriter which can be used to write to an object's contents and attributes.
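A sketch of the current (non-deprecated) write path, continuing the example above; `ObjectWriter.put_content` is the replacement named by the deprecated methods above:

```python
writer = obj.get_writer()
writer.put_content(b"hello, AIStore")  # uploads these bytes as the object's content
```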
def promote(path: str,
target_id: str = "",
recursive: bool = False,
overwrite_dest: bool = False,
delete_source: bool = False,
src_not_file_share: bool = False) -> str

Promotes a file or folder an AIS target can access to a bucket in AIS storage. These files can be either on the physical disk of an AIS target itself or on a network file system the cluster can access. See more info here: https://aiatscale.org/blog/2022/03/17/promote

Arguments:

- `path` (str) - Path to a file or folder the AIS cluster can reach.
- `target_id` (str, optional) - Promote files from a specific target node.
- `recursive` (bool, optional) - Recursively promote objects from files in directories inside the path.
- `overwrite_dest` (bool, optional) - Overwrite objects already on AIS.
- `delete_source` (bool, optional) - Delete the source files when done promoting.
- `src_not_file_share` (bool, optional) - Optimize if the source is guaranteed not to be on a file share.

Returns:

Job ID (as str) that can be used to check the status of the operation, or empty if the job completes synchronously.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `AISError` - Path does not exist on the AIS cluster storage.
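A hedged sketch: the path below is a placeholder and must be reachable from the AIS target nodes themselves, not merely from the client machine:

```python
# Promote a directory that already resides on cluster-accessible storage
job_id = obj.promote("/mnt/shared/data", recursive=True, overwrite_dest=True)
if job_id:
    print(f"promote running asynchronously as job {job_id}")
```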
def delete() -> Response

Delete an object from a bucket.

Returns:

None

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `requests.exceptions.HTTPError(404)` - The object does not exist.
def copy(to_obj: "Object", etl: Optional[ETLConfig] = None) -> Response

Copy this object to another object (which specifies the destination bucket and name), optionally with ETL transformation.

Arguments:

- `to_obj` (Object) - Destination object specifying both the target bucket and object name.
- `etl` (ETLConfig, optional) - ETL configuration for transforming the object during copy.

Returns:

`Response` - The response from the copy operation.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `requests.exceptions.HTTPError` - Service unavailable.
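For example, a copy across buckets, continuing the sketch above (the destination bucket `backup-bck` is a placeholder and assumed to exist):

```python
dst = client.bucket("backup-bck").object("my-obj.txt")
obj.copy(dst)  # the copy is performed by the cluster; content does not pass through the client
```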
def blob_download(chunk_size: int = None,
num_workers: int = None,
latest: bool = False) -> str

A special facility to download very large remote objects, a.k.a. BLOBs. Returns the job ID for the blob download operation.

Arguments:

- `chunk_size` (int) - Chunk size in bytes.
- `num_workers` (int) - Number of concurrent blob-downloading workers (readers).
- `latest` (bool) - GET the latest object version from the associated remote bucket.

Returns:

Job ID (as str) that can be used to check the status of the operation.

Raises:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore.
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.exceptions.HTTPError` - Service unavailable.
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
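A sketch that starts a blob download and waits on it via the SDK's job API (see the `job` module in the index above); the tuning values are illustrative:

```python
job_id = obj.blob_download(chunk_size=4 * 1024 * 1024, num_workers=4)
client.job(job_id=job_id).wait()  # block until the blob download job completes
```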
def append_content(content: bytes,
handle: str = "",
flush: bool = False) -> str

Deprecated: Use 'ObjectWriter.append_content' instead.

Append bytes as an object to a bucket in AIS storage.

Arguments:

- `content` (bytes) - Bytes to append to the object.
- `handle` (str) - Handle string to use for subsequent appends or flush (empty for the first append).
- `flush` (bool) - Whether to flush and finalize the append operation, making the object accessible.

Returns:

- `handle` (str) - Handle string to pass for subsequent appends or flush.

Raises:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error.
- `requests.ConnectionTimeout` - Timed out connecting to AIStore.
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore.
- `requests.exceptions.HTTPError(404)` - The object does not exist.
def set_custom_props(custom_metadata: Dict[str, str],
replace_existing: bool = False) -> Response

Deprecated: Use 'ObjectWriter.set_custom_props' instead.

Set custom properties for the object.

Arguments:

- `custom_metadata` (Dict[str, str]) - Custom metadata key-value pairs.
- `replace_existing` (bool, optional) - Whether to replace existing metadata. Defaults to False.

class ObjectReader()

Provide a way to read an object's contents and attributes, optionally iterating over a stream of content.

Arguments:

- `object_client` (ObjectClient) - Client for making requests to a specific object in AIS.
- `chunk_size` (int, optional) - Size of each data chunk to be fetched from the stream. Defaults to DEFAULT_CHUNK_SIZE.
def head() -> ObjectAttributes

Make a head request to AIS to update and return only object attributes.

Returns:

ObjectAttributes containing metadata for this object.

@property
def attributes() -> ObjectAttributes

Object metadata attributes.

Returns:

`ObjectAttributes` - Parsed object attributes from the headers returned by AIS.

def read_all() -> bytes

Read all byte data directly from the object response without using a stream.

This requires all object content to fit in memory at once and downloads all content before returning.

Returns:

`bytes` - Object content as bytes.

def raw() -> requests.Response

Return the raw byte stream of object content.

Returns:

`requests.Response` - Raw byte stream of the object content.
def as_file(buffer_size: Optional[int] = None,
max_resume: Optional[int] = 5) -> BufferedIOBase

Create a read-only, non-seekable ObjectFileReader instance for streaming object data in chunks.

This file-like object primarily implements the read() method to retrieve data sequentially,
with automatic retry/resumption in case of unexpected stream interruptions (e.g. ChunkedEncodingError,
ConnectionError) or timeouts (e.g. ReadTimeout).

Arguments:

- `buffer_size` (int, optional) - Currently unused; retained for backward compatibility and future enhancements.
- `max_resume` (int, optional) - Total number of retry attempts allowed to resume the stream in case of interruptions. Defaults to 5.

Returns:

`BufferedIOBase` - A read-only, non-seekable file-like object for streaming object content.

Raises:

- `ValueError` - If `max_resume` is invalid (must be a non-negative integer).
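A usage sketch, continuing the example above; the file-like wrapper transparently resumes the stream up to `max_resume` times:

```python
with obj.get_reader().as_file(max_resume=3) as f:
    header = f.read(1024)  # read a fixed number of bytes
    rest = f.read()        # read the remainder until EOF
```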
def __iter__() -> Generator[bytes, None, None]

Make a request to get a stream from the provided object and yield chunks of the stream content.
Returns:
Generator[bytes, None, None]: An iterator over each chunk of bytes in the object.
class ObjectFileReader(BufferedIOBase)

A sequential read-only file-like object extending BufferedIOBase for reading object data, with support for both
reading a fixed size of data and reading until the end of file (EOF).
When a read is requested, any remaining data from a previously fetched chunk is returned first. If the remaining
data is insufficient to satisfy the request, the read() method fetches additional chunks from the provided
iterator as needed, until the requested size is fulfilled or the end of the stream is reached.
In case of unexpected stream interruptions (e.g. ChunkedEncodingError, ConnectionError) or timeouts (e.g.
ReadTimeout), the read() method automatically retries and resumes fetching data from the last successfully
retrieved chunk. The max_resume parameter controls how many retry attempts are made before an error is raised.
Arguments:
- `content_provider` (ContentIterProvider) - A provider that creates iterators which can fetch object data from AIS in chunks.
- `max_resume` (int) - Maximum number of resumes allowed for an ObjectFileReader instance.
@override
def readable() -> bool

Return whether the file is readable.

@override
def read(size: Optional[int] = -1) -> bytes

Read up to 'size' bytes from the object. If size is -1, read until the end of the stream.

Arguments:

- `size` (int, optional) - The number of bytes to read. If -1, reads until EOF.

Returns:

`bytes` - The read data as a bytes object.

Raises:

- `ObjectFileReaderStreamError` - If a connection cannot be made.
- `ObjectFileReaderMaxResumeError` - If the stream is interrupted more than the allowed maximum.
- `ValueError` - I/O operation on a closed file.
- `Exception` - Any other errors while streaming and reading.

@override
def close() -> None

Close the file.
class ObjectFileWriter(BufferedWriter)

A file-like writer object for AIStore, extending BufferedWriter.

Arguments:

- `obj_writer` (ObjectWriter) - The ObjectWriter instance for handling write operations.
- `mode` (str) - Specifies the mode in which the file is opened.
  - `'w'`: Write mode. Opens the object for writing, truncating any existing content. Writing starts from the beginning of the object.
  - `'a'`: Append mode. Opens the object for appending. Existing content is preserved, and writing starts from the end of the object.
@override
def write(buffer: bytes) -> int

Write data to the object.

Arguments:

- `buffer` (bytes) - The data to write.

Returns:

`int` - Number of bytes written.

Raises:

- `ValueError` - I/O operation on a closed file.

@override
def flush() -> None

Flush the writer, ensuring the object is finalized.

This does not close the writer but makes the current state accessible.

Raises:

- `ValueError` - I/O operation on a closed file.

@override
def close() -> None

Close the writer and finalize the object.
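A sketch constructing the writer directly from the documented arguments; the import path below follows the module layout in the index at the top of this page (`obj.obj_file.object_file`) and should be treated as an assumption:

```python
# Import path inferred from the module index above -- verify against your SDK version
from aistore.sdk.obj.obj_file.object_file import ObjectFileWriter

with ObjectFileWriter(obj.get_writer(), mode="w") as f:  # 'w' truncates any existing content
    f.write(b"first line\n")
    f.write(b"second line\n")
# exiting the context calls close(), which finalizes the object
```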
class ObjectProps(ObjectAttributes)

Represents the attributes parsed from the response headers returned from an API call to get an object. Extends ObjectAttributes and is a superset of that class.

Arguments:

- `response_headers` (CaseInsensitiveDict, optional) - Response header dict containing object attributes.
@property
def bucket_name()

Name of the object's bucket.

@property
def bucket_provider()

Provider of the object's bucket.

@property
def name() -> str

Name of the object.

@property
def location() -> str

Location of the object.

@property
def mirror_paths() -> List[str]

List of mirror paths.

@property
def mirror_copies() -> int

Number of mirror copies.

@property
def present() -> bool

True if the object is present in the cluster.
class ObjectAttributes()

Represents the attributes parsed from the response headers returned from an API call to get an object.

Arguments:

- `response_headers` (CaseInsensitiveDict) - Response header dict containing object attributes.

@property
def size() -> int

Size of object content.

@property
def checksum_type() -> str

Type of checksum, e.g. xxhash or md5.

@property
def checksum_value() -> str

Checksum value.

@property
def access_time() -> str

Time this object was accessed.

@property
def obj_version() -> str

Object version.

@property
def custom_metadata() -> Dict[str, str]

Dictionary of custom metadata.

@property
def present() -> bool

Whether the object is present/cached.
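For example, inspecting a few of these attributes after a HEAD request, continuing the sketch above:

```python
props = obj.props  # issues a HEAD request and parses the response headers
print(props.name, props.size)
print(props.checksum_type, props.checksum_value)
print("present in cluster:", props.present)
```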