Buckets API Reference

Buckets

class neuro_sdk.Buckets

Blob storage buckets subsystem, available as Client.buckets.

The subsystem provides access to the basic Blob Storage functionality offered by different cloud providers: for AWS it is S3, for GCP - Cloud Storage, etc.

async-for list(cluster_name: Optional[str] = None) AsyncContextManager[AsyncIterator[Bucket]][source]

List user’s buckets, async iterator. Yields Bucket instances.
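
For example, enumerating all buckets in the current cluster (a sketch; client is assumed to be a connected neuro_sdk.Client, e.g. one obtained via neuro_sdk.get()):

async with client.buckets.list() as buckets_it:
    async for bucket in buckets_it:
        print(bucket.id, bucket.name)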

Parameters:

cluster_name (str) – cluster to list buckets. Default is current cluster.

coroutine create(name: Optional[str], cluster_name: Optional[str] = None, org_name: Optional[str] = None) Bucket[source]

Create a new bucket.
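
For example (a sketch; the bucket name is arbitrary):

bucket = await client.buckets.create(name="my-bucket")
print(bucket.id, bucket.uri)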

Parameters:
  • name (Optional[str]) – Name of the bucket. Should be unique among all user’s buckets.

  • cluster_name (str) – cluster to create a bucket. Default is current cluster.

  • org_name (str) – org to create a bucket. Default is current org.

Returns:

Newly created bucket info (Bucket)

coroutine import_external(provider: Bucket.Provider, provider_bucket_name: str, credentials: Mapping[str, str], name: Optional[str] = None, cluster_name: Optional[str] = None, org_name: Optional[str] = None) Bucket[source]

Import an existing bucket hosted by an external provider.
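
For example, importing an existing S3 bucket (a sketch; the credential keys shown are placeholders, as the exact keys expected depend on the provider):

bucket = await client.buckets.import_external(
    provider=Bucket.Provider.AWS,
    provider_bucket_name="my-s3-bucket",
    credentials={"access_key_id": "<key>", "secret_access_key": "<secret>"},
    name="imported-bucket",
)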

Parameters:
  • provider (Bucket.Provider) – Provider type of imported bucket.

  • provider_bucket_name (str) – Name of external bucket inside the provider.

  • credentials (Mapping[str, str]) – Raw credentials to access bucket provider.

  • name (Optional[str]) – Name of the bucket. Should be unique among all user’s buckets.

  • cluster_name (str) – cluster to import a bucket. Default is current cluster.

  • org_name (str) – org to import a bucket. Default is current org.

Returns:

Newly imported bucket info (Bucket)

coroutine get(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) Bucket[source]

Get a bucket with id or name bucket_id_or_name.

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

Bucket info (Bucket)

coroutine rm(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) None[source]

Delete a bucket with id or name bucket_id_or_name.

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine request_tmp_credentials(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) BucketCredentials[source]

Get temporary provider credentials for the bucket with id or name bucket_id_or_name.
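
For example (a sketch; the keys of the returned credentials mapping are provider-specific):

creds = await client.buckets.request_tmp_credentials("my-bucket")
print(creds.provider)
print(creds.credentials)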

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

Bucket credentials info (BucketCredentials)

coroutine set_public_access(bucket_id_or_name: str, public_access: bool, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) Bucket[source]

Enable or disable public (anonymous) read access to the bucket.
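
For example, making a bucket world-readable (a sketch):

bucket = await client.buckets.set_public_access("my-bucket", True)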

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • public_access (bool) – New public access setting.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

Bucket info (Bucket)

coroutine head_blob(bucket_id_or_name: str, key: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) BucketEntry[source]

Look up the blob and return its metadata.
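
For example (a sketch; ResourceNotFound is assumed to be importable from neuro_sdk, matching the Raises note below):

from neuro_sdk import ResourceNotFound

try:
    entry = await client.buckets.head_blob("my_bucket", key="file.txt")
    print(entry.size, entry.created_at)
except ResourceNotFound:
    print("blob does not exist")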

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – key of the blob.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

BucketEntry object.

Raises:

ResourceNotFound if key does not exist.

coroutine put_blob(bucket_id_or_name: str, key: str, body: Union[AsyncIterator[bytes], bytes], cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) None[source]

Create or replace the blob identified by key in the bucket, e.g.:

from pathlib import Path

large_file = Path("large_file.dat")

async def body_stream():
    # Read in binary mode and yield chunks of bytes.
    with large_file.open("rb") as f:
        while chunk := f.read(1024 * 1024):
            yield chunk

await client.buckets.put_blob(
    bucket_id_or_name="my_bucket",
    key="large_file.dat",
    body=body_stream(),  # pass the async iterator, not the function itself
)
Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – Key of the blob.

  • body (Union[AsyncIterator[bytes], bytes]) – Body of the blob. Can be passed either as bytes or as an AsyncIterator[bytes].

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine fetch_blob(bucket_id_or_name: str, key: str, offset: int = 0, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) AsyncIterator[bytes][source]

Look up the blob and return its body content only. The content will be streamed using an asynchronous iterator, e.g.:

async with client.buckets.fetch_blob("my_bucket", key="file.txt") as content:
    async for data in content:
        print("Next chunk of data:", data)
Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – Key of the blob.

  • offset (int) – Position in blob from which to read.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine delete_blob(bucket_id_or_name: str, key: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) None[source]

Remove a blob from the bucket.

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – key of the blob.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine list_blobs(uri: URL, recursive: bool = False, limit: int = 10000) AsyncContextManager[AsyncIterator[BucketEntry]][source]

List blobs in the bucket. You can filter by prefix and, with recursive=False, get results that resemble a folder structure.
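
For example, a non-recursive listing of one “folder” level (a sketch; the bucket name and prefix are placeholders):

from yarl import URL

async with client.buckets.list_blobs(
    uri=URL("blob:my_bucket/folder/"),
    recursive=False,
) as entries:
    async for entry in entries:
        print(entry.key)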

Parameters:
  • uri (URL) – URL that specifies bucket and prefix to list blobs, e.g. yarl.URL("blob:bucket_name/path/in/bucket").

  • recursive (bool) – If True, the listing will contain all keys that match the prefix; if False, only keys up to the next / are returned. Keys that share a common prefix beyond the next / are combined into a single entry and returned as a BlobCommonPrefix.

  • limit (int) – Maximum number of BucketEntry objects returned.

coroutine glob_blobs(uri: URL) AsyncContextManager[AsyncIterator[BucketEntry]][source]

Glob search for blobs in the bucket:

async with client.buckets.glob_blobs(
    uri=URL("blob:my_bucket/folder1/**/*.txt")
) as blobs:
    async for blob in blobs:
        print(blob.key)

Similar to Storage.glob(), the “**” pattern means “this directory and all sub-directories, recursively”.

Parameters:

uri (URL) – URL that specifies bucket and pattern to glob blobs, e.g. yarl.URL("blob:bucket_name/path/**/*.bin").

coroutine upload_file(src: URL, dst: URL, *, update: bool = False, progress: Optional[AbstractFileProgress] = None) None[source]

Similarly to Storage.upload_file(), uploads the local file src to the bucket URL dst.
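
For example (a sketch; the paths are placeholders):

from yarl import URL

await client.buckets.upload_file(
    src=URL("file:///home/user/file.txt"),
    dst=URL("blob:my_bucket/folder/file.txt"),
    update=True,  # skip the upload if the destination is already up to date
)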

Parameters:
  • src (URL) – path to uploaded file on local disk, e.g. yarl.URL("file:///home/andrew/folder/file.txt").

  • dst (URL) – URL that specifies bucket and key to upload file e.g. yarl.URL("blob:bucket_name/folder/file.txt").

  • update (bool) – if true, upload only when the source file is newer than the destination file or when the destination file is missing.

  • progress (AbstractFileProgress) – a callback interface for reporting uploading progress, None for no progress report (default).

coroutine download_file(src: URL, dst: URL, *, update: bool = False, continue_: bool = False, progress: Optional[AbstractFileProgress] = None) None[source]

Similarly to Storage.download_file(), downloads the remote file src to the local path dst.

Parameters:
  • src (URL) – URL that specifies bucket and blob key to download e.g. yarl.URL("blob:bucket_name/folder/file.bin").

  • dst (URL) – local path to save downloaded file, e.g. yarl.URL("file:///home/andrew/folder/file.bin").

  • update (bool) – if true, download only when the source file is newer than the destination file or when the destination file is missing.

  • continue_ (bool) – if true, download only the part of the source file past the end of the destination file and append it to the destination file, provided the destination file is newer and not longer than the source file. Otherwise, download and overwrite the whole file.

  • progress (AbstractFileProgress) – a callback interface for reporting downloading progress, None for no progress report (default).

coroutine upload_dir(src: URL, dst: URL, *, update: bool = False, filter: Optional[Callable[[str], Awaitable[bool]]] = None, ignore_file_names: AbstractSet[str] = frozenset(), progress: Optional[AbstractRecursiveFileProgress] = None) None[source]

Similarly to Storage.upload_dir(), recursively uploads the local directory src to the Blob Storage URL dst.
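
For example, uploading a directory while skipping hidden entries (a sketch; skip_hidden is a hypothetical helper):

from yarl import URL

async def skip_hidden(path: str) -> bool:
    # path is relative; keep entries whose last component is not hidden
    return not path.split("/")[-1].startswith(".")

await client.buckets.upload_dir(
    src=URL("file:///home/user/data"),
    dst=URL("blob:my_bucket/data/"),
    filter=skip_hidden,
)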

Parameters:
  • src (URL) – path to uploaded directory on local disk, e.g. yarl.URL("file:///home/andrew/folder").

  • dst (URL) – path on Blob Storage to save the uploaded directory to, e.g. yarl.URL("blob:bucket_name/folder/").

  • update (bool) – if true, upload only when the source file is newer than the destination file or when the destination file is missing.

  • filter (Callable[[str], Awaitable[bool]]) – a callback function for determining which files and subdirectories should be uploaded. It is called with the relative path of a file or directory, and if the result is false the file or directory is skipped.

  • ignore_file_names (AbstractSet[str]) – a set of names of files which specify filters for skipping files and subdirectories. The format of ignore files is the same as .gitignore.

  • progress (AbstractRecursiveFileProgress) – a callback interface for reporting uploading progress, None for no progress report (default).

coroutine download_dir(src: URL, dst: URL, *, update: bool = False, continue_: bool = False, filter: Optional[Callable[[str], Awaitable[bool]]] = None, progress: Optional[AbstractRecursiveFileProgress] = None) None[source]

Similarly to Storage.download_dir(), recursively downloads the remote directory src to the local path dst.

Parameters:
  • src (URL) – path on Blob Storage to download a directory from e.g. yarl.URL("blob:bucket_name/folder/").

  • dst (URL) – local path to save downloaded directory, e.g. yarl.URL("file:///home/andrew/folder").

  • update (bool) – if true, download only when the source file is newer than the destination file or when the destination file is missing.

  • continue_ (bool) – if true, download only the part of the source file past the end of the destination file and append it to the destination file, provided the destination file is newer and not longer than the source file. Otherwise, download and overwrite the whole file.

  • filter (Callable[[str], Awaitable[bool]]) – a callback function for determining which files and subdirectories should be downloaded. It is called with the relative path of a file or directory, and if the result is false the file or directory is skipped.

  • progress (AbstractRecursiveFileProgress) – a callback interface for reporting downloading progress, None for no progress report (default).

coroutine blob_is_dir(uri: URL) bool[source]

Check whether uri specifies a “folder” blob in a bucket.

Parameters:

uri (URL) – URL that specifies bucket and blob key, e.g. yarl.URL("blob:bucket_name/folder/sub_folder").

coroutine blob_rm(uri: URL, *, recursive: bool = False, progress: Optional[AbstractDeleteProgress] = None) None[source]

Remove blobs from the bucket.
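
For example, removing a “folder” with all nested blobs (a sketch):

from yarl import URL

await client.buckets.blob_rm(
    uri=URL("blob:my_bucket/folder/"),
    recursive=True,
)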

Parameters:
  • uri (URL) – URL that specifies bucket and blob key e.g. yarl.URL("blob:bucket_name/folder/sub_folder").

  • recursive (bool) – remove a directory recursively with all nested files and folders if True (False by default).

  • progress (AbstractDeleteProgress) – a callback interface for reporting delete progress, None for no progress report (default).

Raises:

IsADirectoryError if uri points to a directory and the recursive flag is not set.

coroutine make_signed_url(uri: URL, expires_in_seconds: int = 3600) URL[source]

Generate a signed URL that allows temporary access to a blob.
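
For example, generating a link valid for 30 minutes (a sketch):

from yarl import URL

url = await client.buckets.make_signed_url(
    uri=URL("blob:my_bucket/folder/file.bin"),
    expires_in_seconds=1800,
)
print(url)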

Parameters:
  • uri (URL) – URL that specifies bucket and blob key e.g. yarl.URL("blob:bucket_name/folder/file.bin").

  • expires_in_seconds (int) – Duration in seconds for which the generated URL will be valid.

Returns:

Signed url (yarl.URL)

coroutine get_disk_usage(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) AsyncContextManager[AsyncIterator[BucketUsage]][source]

Get the disk space usage of a given bucket. The iterator yields partial results, as the calculation for the whole bucket can take time.
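
For example (a sketch; partial totals are printed while the scan progresses):

async with client.buckets.get_disk_usage("my_bucket") as usage_it:
    async for usage in usage_it:
        print(usage.object_count, usage.total_bytes)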

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

async-for persistent_credentials_list(cluster_name: Optional[str] = None) AsyncContextManager[AsyncIterator[PersistentBucketCredentials]][source]

List user’s bucket persistent credentials, async iterator. Yields PersistentBucketCredentials instances.

Parameters:

cluster_name (str) – cluster to list persistent credentials. Default is current cluster.

coroutine persistent_credentials_create(bucket_ids: Iterable[str], name: Optional[str], read_only: Optional[bool] = False, cluster_name: Optional[str] = None) PersistentBucketCredentials[source]

Create new persistent credentials for a given set of buckets.
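
For example, creating read-only credentials for two buckets (a sketch; the bucket ids are placeholders):

creds = await client.buckets.persistent_credentials_create(
    bucket_ids=["bucket-id-1", "bucket-id-2"],
    name="my-readonly-creds",
    read_only=True,
)
for c in creds.credentials:
    print(c.bucket_id, c.provider)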

Parameters:
  • bucket_ids (Iterable[str]) – Iterable of bucket ids to create credentials for.

  • name (Optional[str]) – Name of the persistent credentials. Should be unique among all user’s bucket persistent credentials.

  • read_only (bool) – Allow only read-only access using the created credentials. False by default.

  • cluster_name (str) – cluster to create persistent credentials in. Default is current cluster.

Returns:

Newly created credentials info (PersistentBucketCredentials)

coroutine persistent_credentials_get(credential_id_or_name: str, cluster_name: Optional[str] = None) PersistentBucketCredentials[source]

Get persistent credentials with id or name credential_id_or_name.

Parameters:
  • credential_id_or_name (str) – persistent credentials’ id or name.

  • cluster_name (str) – cluster to look for persistent credentials in. Default is current cluster.

Returns:

Credentials info (PersistentBucketCredentials)

coroutine persistent_credentials_rm(credential_id_or_name: str, cluster_name: Optional[str] = None) None[source]

Delete persistent credentials with id or name credential_id_or_name.

Parameters:
  • credential_id_or_name (str) – persistent credentials’ id or name.

  • cluster_name (str) – cluster to look for persistent credentials in. Default is current cluster.

Bucket

class neuro_sdk.Bucket

Read-only dataclass for describing a single bucket.

id

The bucket id, str.

owner

The bucket owner username, str.

name

The bucket name set by the user, unique among all the user’s buckets; str or None if no name was set.

uri

URI of the bucket resource, yarl.URL.

cluster_name

Cluster this bucket belongs to, str.

org_name

Org this bucket belongs to, str or None if there is no such org.

created_at

Bucket creation timestamp, datetime.

provider

Blob storage provider this bucket belongs to, Bucket.Provider.

BucketCredentials

class neuro_sdk.BucketCredentials

Read-only dataclass for describing credentials to a single bucket.

bucket_id

The bucket id, str.

provider

Blob storage provider this bucket belongs to, Bucket.Provider.

credentials

Raw credentials to access a bucket inside the provider, Mapping[str, str]

Bucket.Provider

class Bucket.Provider[source]

Enumeration that describes bucket providers.

Can be one of the following values:

AWS

Amazon Web Services S3 bucket

MINIO

Minio S3 bucket

AZURE

Azure blob storage container

PersistentBucketCredentials

class neuro_sdk.PersistentBucketCredentials

Read-only dataclass for describing persistent credentials to some set of buckets, created at the user’s request.

id

The credentials id, str.

owner

The credentials owner username, str.

name

The credentials name set by the user, unique among all the user’s bucket credentials; str or None if no name was set.

read_only

The credentials provide read-only access to buckets, bool.

cluster_name

Cluster these credentials belong to, str.

credentials

List of per bucket credentials, List[BucketCredentials]

BucketEntry

class neuro_sdk.BucketEntry

An abstract dataclass for describing bucket content entries.

key

Key of the blob, str.

bucket

Containing bucket, Bucket.

size

Size of the data in bytes, int.

created_at

Blob creation timestamp, datetime or None if the underlying blob engine does not store such information.

modified_at

Blob modification timestamp, datetime or None if the underlying blob engine does not store such information.

uri

URI identifying the blob, URL, e.g. blob://cluster_name/username/my_bucket/file.txt.

name

Name of the blob: the part of the key after the last /, str.

is_dir(uri: URL) bool[source]

True if the entry is a directory blob object.

is_file(uri: URL) bool[source]

True if the entry is a file blob object.

BlobObject

class neuro_sdk.BlobObject

A subclass of BucketEntry used for keys that are present directly in the underlying blob storage.

BlobCommonPrefix

class neuro_sdk.BlobCommonPrefix

A subclass of BucketEntry describing common prefixes for blobs in a non-recursive listing. You can treat it as a kind of folder in Blob Storage.

BucketUsage

class neuro_sdk.BucketUsage

A dataclass for describing bucket disk space usage.

total_bytes

Total size of all objects in bytes, int.

object_count

Total number of objects, int.