Buckets API Reference

Buckets

class neuro_sdk.Buckets

Blob storage buckets subsystem, available as Client.buckets.

The subsystem provides access to the basic Blob Storage functionality offered by different cloud providers: for AWS it is S3, for GCP - Cloud Storage, etc.

async-for list(cluster_name: Optional[str] = None) AsyncContextManager[AsyncIterator[Bucket]][source]

List user’s buckets, async iterator. Yields Bucket instances.
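
For example, enumerating all buckets in the current cluster (a sketch; client is assumed to be a connected neuro_sdk.Client, e.g. one obtained via neuro_sdk.get()):

async with client.buckets.list() as buckets_it:
    async for bucket in buckets_it:
        print(bucket.id, bucket.name)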

Parameters:

cluster_name (str) – cluster to list buckets. Default is current cluster.

coroutine create(name: Optional[str], cluster_name: Optional[str] = None, org_name: Optional[str] = None) Bucket[source]

Create a new bucket.
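
For example (a sketch; the bucket name is arbitrary):

bucket = await client.buckets.create(name="my-bucket")
print(bucket.id, bucket.uri)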

Parameters:
  • name (Optional[str]) – Name of the bucket. Should be unique among all user’s buckets.

  • cluster_name (str) – cluster to create a bucket. Default is current cluster.

  • org_name (str) – org to create a bucket. Default is current org.

Returns:

Newly created bucket info (Bucket)

coroutine import_external(provider: Bucket.Provider, provider_bucket_name: str, credentials: Mapping[str, str], name: Optional[str] = None, cluster_name: Optional[str] = None, org_name: Optional[str] = None) Bucket[source]

Import an existing bucket hosted by an external provider.
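
For example, importing an existing S3 bucket (a sketch; the credential keys shown are placeholders, as the exact keys expected depend on the provider):

bucket = await client.buckets.import_external(
    provider=Bucket.Provider.AWS,
    provider_bucket_name="my-s3-bucket",
    credentials={"access_key_id": "<key>", "secret_access_key": "<secret>"},
    name="imported-bucket",
)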

Parameters:
  • provider (Bucket.Provider) – Provider type of imported bucket.

  • provider_bucket_name (str) – Name of external bucket inside the provider.

  • credentials (Mapping[str, str]) – Raw credentials to access bucket provider.

  • name (Optional[str]) – Name of the bucket. Should be unique among all user’s buckets.

  • cluster_name (str) – cluster to import a bucket. Default is current cluster.

  • org_name (str) – org to import a bucket. Default is current org.

Returns:

Newly imported bucket info (Bucket)

coroutine get(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) Bucket[source]

Get a bucket with id or name bucket_id_or_name.

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

Bucket info (Bucket)

coroutine rm(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) None[source]

Delete a bucket with id or name bucket_id_or_name.

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine request_tmp_credentials(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) BucketCredentials[source]

Get temporary provider credentials for the bucket with id or name bucket_id_or_name.
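
For example (a sketch; the keys of the returned credentials mapping are provider-specific):

creds = await client.buckets.request_tmp_credentials("my-bucket")
print(creds.provider)
print(creds.credentials)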

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

Bucket credentials info (BucketCredentials)

coroutine set_public_access(bucket_id_or_name: str, public_access: bool, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) Bucket[source]

Enable or disable public (anonymous) read access to the bucket.
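
For example, making a bucket world-readable (a sketch):

bucket = await client.buckets.set_public_access("my-bucket", True)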

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • public_access (bool) – New public access setting.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

Bucket info (Bucket)

coroutine head_blob(bucket_id_or_name: str, key: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) BucketEntry[source]

Look up the blob and return its metadata.
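
For example (a sketch; ResourceNotFound is assumed to be importable from neuro_sdk, matching the Raises note below):

from neuro_sdk import ResourceNotFound

try:
    entry = await client.buckets.head_blob("my_bucket", key="file.txt")
    print(entry.size, entry.created_at)
except ResourceNotFound:
    print("blob does not exist")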

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – key of the blob.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

Returns:

BucketEntry object.

Raises:

ResourceNotFound if key does not exist.

coroutine put_blob(bucket_id_or_name: str, key: str, body: Union[AsyncIterator[bytes], bytes], cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) None[source]

Create or replace the blob identified by key in the bucket, e.g.:

from pathlib import Path

large_file = Path("large_file.dat")

async def body_stream():
    # Read in binary mode and yield chunks of bytes.
    with large_file.open("rb") as f:
        while chunk := f.read(1024 * 1024):
            yield chunk

await client.buckets.put_blob(
    bucket_id_or_name="my_bucket",
    key="large_file.dat",
    body=body_stream(),  # pass the async iterator, not the function itself
)
Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – Key of the blob.

  • body (Union[AsyncIterator[bytes], bytes]) – Body of the blob. Can be passed either as bytes or as an AsyncIterator[bytes].

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine fetch_blob(bucket_id_or_name: str, key: str, offset: int = 0, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) AsyncIterator[bytes][source]

Look up the blob and return its body content only. The content will be streamed using an asynchronous iterator, e.g.:

async with client.buckets.fetch_blob("my_bucket", key="file.txt") as content:
    async for data in content:
        print("Next chunk of data:", data)
Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – Key of the blob.

  • offset (int) – Position in blob from which to read.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine delete_blob(bucket_id_or_name: str, key: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) None[source]

Remove a blob from the bucket.

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • key (str) – key of the blob.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

coroutine list_blobs(uri: URL, recursive: bool = False, limit: int = 10000) AsyncContextManager[AsyncIterator[BucketEntry]][source]

List blobs in the bucket. You can filter by prefix and, with recursive=False, get results that resemble a folder structure.
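
For example, a non-recursive listing of one “folder” level (a sketch; the bucket name and prefix are placeholders):

from yarl import URL

async with client.buckets.list_blobs(
    uri=URL("blob:my_bucket/folder/"),
    recursive=False,
) as entries:
    async for entry in entries:
        print(entry.key)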

Parameters:
  • uri (URL) – URL that specifies bucket and prefix to list blobs, e.g. yarl.URL("blob:bucket_name/path/in/bucket").

  • recursive (bool) – If True, the listing will contain all keys that match the prefix; if False, only keys up to the next / are returned. Keys that share a common prefix beyond the next / are combined into a single entry and returned as a BlobCommonPrefix.

  • limit (int) – Maximum number of BucketEntry objects returned.

coroutine glob_blobs(uri: URL) AsyncContextManager[AsyncIterator[BucketEntry]][source]

Glob search for blobs in the bucket:

async with client.buckets.glob_blobs(
    uri=URL("blob:my_bucket/folder1/**/*.txt")
) as blobs:
    async for blob in blobs:
        print(blob.key)

Similar to Storage.glob(), the “**” pattern means “this directory and all sub-directories, recursively”.

Parameters:

uri (URL) – URL that specifies bucket and pattern to glob blobs, e.g. yarl.URL("blob:bucket_name/path/**/*.bin").

coroutine upload_file(src: URL, dst: URL, *, update: bool = False, progress: Optional[AbstractFileProgress] = None) None[source]

Similarly to Storage.upload_file(), uploads the local file src to the bucket URL dst.
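
For example (a sketch; the paths are placeholders):

from yarl import URL

await client.buckets.upload_file(
    src=URL("file:///home/user/file.txt"),
    dst=URL("blob:my_bucket/folder/file.txt"),
    update=True,  # skip the upload if the destination is already up to date
)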

Parameters:
  • src (URL) – path to uploaded file on local disk, e.g. yarl.URL("file:///home/andrew/folder/file.txt").

  • dst (URL) – URL that specifies bucket and key to upload file e.g. yarl.URL("blob:bucket_name/folder/file.txt").

  • update (bool) – if true, upload only when the source file is newer than the destination file or when the destination file is missing.

  • progress (AbstractFileProgress) – a callback interface for reporting uploading progress, None for no progress report (default).

coroutine download_file(src: URL, dst: URL, *, update: bool = False, continue_: bool = False, progress: Optional[AbstractFileProgress] = None) None[source]

Similarly to Storage.download_file(), downloads the remote file src to the local path dst.

Parameters:
  • src (URL) – URL that specifies bucket and blob key to download e.g. yarl.URL("blob:bucket_name/folder/file.bin").

  • dst (URL) – local path to save downloaded file, e.g. yarl.URL("file:///home/andrew/folder/file.bin").

  • update (bool) – if true, download only when the source file is newer than the destination file or when the destination file is missing.

  • continue_ (bool) – if true, download only the part of the source file past the end of the destination file and append it to the destination file, provided the destination file is newer and not longer than the source file. Otherwise, download and overwrite the whole file.

  • progress (AbstractFileProgress) – a callback interface for reporting downloading progress, None for no progress report (default).

coroutine upload_dir(src: URL, dst: URL, *, update: bool = False, filter: Optional[Callable[[str], Awaitable[bool]]] = None, ignore_file_names: AbstractSet[str] = frozenset(), progress: Optional[AbstractRecursiveFileProgress] = None) None[source]

Similarly to Storage.upload_dir(), recursively uploads the local directory src to the Blob Storage URL dst.
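
For example, uploading a directory while skipping hidden entries (a sketch; skip_hidden is a hypothetical helper):

from yarl import URL

async def skip_hidden(path: str) -> bool:
    # path is relative; keep entries whose last component is not hidden
    return not path.split("/")[-1].startswith(".")

await client.buckets.upload_dir(
    src=URL("file:///home/user/data"),
    dst=URL("blob:my_bucket/data/"),
    filter=skip_hidden,
)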

Parameters:
  • src (URL) – path to uploaded directory on local disk, e.g. yarl.URL("file:///home/andrew/folder").

  • dst (URL) – path on Blob Storage to save the uploaded directory to, e.g. yarl.URL("blob:bucket_name/folder/").

  • update (bool) – if true, upload only when the source file is newer than the destination file or when the destination file is missing.

  • filter (Callable[[str], Awaitable[bool]]) – a callback function for determining which files and subdirectories should be uploaded. It is called with the relative path of a file or directory, and if the result is false the file or directory is skipped.

  • ignore_file_names (AbstractSet[str]) – a set of names of files which specify filters for skipping files and subdirectories. The format of ignore files is the same as .gitignore.

  • progress (AbstractRecursiveFileProgress) – a callback interface for reporting uploading progress, None for no progress report (default).

coroutine download_dir(src: URL, dst: URL, *, update: bool = False, continue_: bool = False, filter: Optional[Callable[[str], Awaitable[bool]]] = None, progress: Optional[AbstractRecursiveFileProgress] = None) None[source]

Similarly to Storage.download_dir(), recursively downloads the remote directory src to the local path dst.

Parameters:
  • src (URL) – path on Blob Storage to download a directory from e.g. yarl.URL("blob:bucket_name/folder/").

  • dst (URL) – local path to save downloaded directory, e.g. yarl.URL("file:///home/andrew/folder").

  • update (bool) – if true, download only when the source file is newer than the destination file or when the destination file is missing.

  • continue_ (bool) – if true, download only the part of the source file past the end of the destination file and append it to the destination file, provided the destination file is newer and not longer than the source file. Otherwise, download and overwrite the whole file.

  • filter (Callable[[str], Awaitable[bool]]) – a callback function for determining which files and subdirectories should be downloaded. It is called with the relative path of a file or directory, and if the result is false the file or directory is skipped.

  • progress (AbstractRecursiveFileProgress) – a callback interface for reporting downloading progress, None for no progress report (default).

coroutine blob_is_dir(uri: URL) bool[source]

Check whether uri specifies a “folder” blob in a bucket.

Parameters:

uri (URL) – URL that specifies bucket and blob key, e.g. yarl.URL("blob:bucket_name/folder/sub_folder").

coroutine blob_rm(uri: URL, *, recursive: bool = False, progress: Optional[AbstractDeleteProgress] = None) None[source]

Remove blobs from the bucket.
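
For example, removing a “folder” with all nested blobs (a sketch):

from yarl import URL

await client.buckets.blob_rm(
    uri=URL("blob:my_bucket/folder/"),
    recursive=True,
)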

Parameters:
  • uri (URL) – URL that specifies bucket and blob key e.g. yarl.URL("blob:bucket_name/folder/sub_folder").

  • recursive (bool) – remove a directory recursively with all nested files and folders if True (False by default).

  • progress (AbstractDeleteProgress) – a callback interface for reporting delete progress, None for no progress report (default).

Raises:

IsADirectoryError if uri points to a directory and the recursive flag is not set.

coroutine make_signed_url(uri: URL, expires_in_seconds: int = 3600) URL[source]

Generate a signed URL that allows temporary access to a blob.
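
For example, generating a link valid for 30 minutes (a sketch):

from yarl import URL

url = await client.buckets.make_signed_url(
    uri=URL("blob:my_bucket/folder/file.bin"),
    expires_in_seconds=1800,
)
print(url)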

Parameters:
  • uri (URL) – URL that specifies bucket and blob key e.g. yarl.URL("blob:bucket_name/folder/file.bin").

  • expires_in_seconds (int) – Duration in seconds for which the generated URL will be valid.

Returns:

Signed url (yarl.URL)

coroutine get_disk_usage(bucket_id_or_name: str, cluster_name: Optional[str] = None, bucket_owner: Optional[str] = None) AsyncContextManager[AsyncIterator[BucketUsage]][source]

Get the disk space usage of a given bucket. The iterator yields partial results, as the calculation for the whole bucket can take time.
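
For example (a sketch; partial totals are printed while the scan progresses):

async with client.buckets.get_disk_usage("my_bucket") as usage_it:
    async for usage in usage_it:
        print(usage.object_count, usage.total_bytes)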

Parameters:
  • bucket_id_or_name (str) – bucket’s id or name.

  • cluster_name (str) – cluster to look for a bucket. Default is current cluster.

  • bucket_owner (str) – bucket owner’s username. Used only when looking up a bucket by its name. Default is current user.

async-for persistent_credentials_list(cluster_name: Optional[str] = None) AsyncContextManager[AsyncIterator[PersistentBucketCredentials]][source]

List user’s bucket persistent credentials, async iterator. Yields PersistentBucketCredentials instances.

Parameters:

cluster_name (str) – cluster to list persistent credentials. Default is current cluster.

coroutine persistent_credentials_create(bucket_ids: Iterable[str], name: Optional[str], read_only: Optional[bool] = False, cluster_name: Optional[str] = None) PersistentBucketCredentials[source]

Create new persistent credentials for a given set of buckets.
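
For example, creating read-only credentials for two buckets (a sketch; the bucket ids are placeholders):

creds = await client.buckets.persistent_credentials_create(
    bucket_ids=["bucket-id-1", "bucket-id-2"],
    name="my-readonly-creds",
    read_only=True,
)
for c in creds.credentials:
    print(c.bucket_id, c.provider)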

Parameters:
  • bucket_ids (Iterable[str]) – Iterable of bucket ids to create credentials for.

  • name (Optional[str]) – Name of the persistent credentials. Should be unique among all user’s bucket persistent credentials.

  • read_only (bool) – Allow only read-only access using the created credentials. False by default.

  • cluster_name (str) – cluster to create persistent credentials in. Default is current cluster.

Returns:

Newly created credentials info (PersistentBucketCredentials)

coroutine persistent_credentials_get(credential_id_or_name: str, cluster_name: Optional[str] = None) PersistentBucketCredentials[source]

Get persistent credentials with id or name credential_id_or_name.

Parameters:
  • credential_id_or_name (str) – persistent credentials’ id or name.

  • cluster_name (str) – cluster to look for persistent credentials in. Default is current cluster.

Returns:

Credentials info (PersistentBucketCredentials)

coroutine persistent_credentials_rm(credential_id_or_name: str, cluster_name: Optional[str] = None) None[source]

Delete persistent credentials with id or name credential_id_or_name.

Parameters:
  • credential_id_or_name (str) – persistent credentials’ id or name.

  • cluster_name (str) – cluster to look for persistent credentials in. Default is current cluster.

Bucket

class neuro_sdk.Bucket

Read-only dataclass for describing a single bucket.

id

The bucket id, str.

owner

The bucket owner username, str.

name

The bucket name set by the user, unique among all the user’s buckets; str or None if no name was set.

uri

URI of the bucket resource, yarl.URL.

cluster_name

Cluster this bucket belongs to, str.

org_name

Org this bucket belongs to, str or None if there is no such org.

created_at

Bucket creation timestamp, datetime.

provider

Blob storage provider this bucket belongs to, Bucket.Provider.

BucketCredentials

class neuro_sdk.BucketCredentials

Read-only dataclass for describing credentials to a single bucket.

bucket_id

The bucket id, str.

provider

Blob storage provider this bucket belongs to, Bucket.Provider.

credentials

Raw credentials to access a bucket inside the provider, Mapping[str, str]

Bucket.Provider

class Bucket.Provider[source]

Enumeration that describes bucket providers.

Can be one of the following values:

AWS

Amazon Web Services S3 bucket

MINIO

Minio S3 bucket

AZURE

Azure blob storage container

PersistentBucketCredentials

class neuro_sdk.PersistentBucketCredentials

Read-only dataclass for describing persistent credentials to some set of buckets, created at the user’s request.

id

The credentials id, str.

owner

The credentials owner username, str.

name

The credentials name set by the user, unique among all the user’s bucket credentials; str or None if no name was set.

read_only

The credentials provide read-only access to buckets, bool.

cluster_name

Cluster these credentials belong to, str.

credentials

List of per bucket credentials, List[BucketCredentials]

BucketEntry

class neuro_sdk.BucketEntry

An abstract dataclass for describing bucket content entries.

key

Key of the blob, str.

bucket

Containing bucket, Bucket.

size

Size of the data in bytes, int.

created_at

Blob creation timestamp, datetime or None if the underlying blob engine does not store such information.

modified_at

Blob modification timestamp, datetime or None if the underlying blob engine does not store such information.

uri

URI identifying the blob, URL, e.g. blob://cluster_name/username/my_bucket/file.txt.

name

Name of the blob: the part of the key after the last /, str.

is_dir(uri: URL) bool[source]

True if the entry is a directory blob object.

is_file(uri: URL) bool[source]

True if the entry is a file blob object.

BlobObject

class neuro_sdk.BlobObject

A subclass of BucketEntry used for keys that are present directly in the underlying blob storage.

BlobCommonPrefix

class neuro_sdk.BlobCommonPrefix

A subclass of BucketEntry describing common prefixes for blobs in a non-recursive listing. You can treat it as a kind of folder in Blob Storage.

BucketUsage

class neuro_sdk.BucketUsage

A dataclass for describing bucket disk space usage.

total_bytes

Total size of all objects in bytes, int.

object_count

Total number of objects, int.