Jobs API Reference

Jobs

class neuro_sdk.Jobs

Jobs subsystem, available as Client.jobs.

Users can start new jobs, terminate them, get statuses, list running jobs, etc.

async-with attach(id: str, *, tty: bool = False, stdin: bool = False, stdout: bool = False, stderr: bool = False, cluster_name: Optional[str] = None) AsyncContextManager[StdStream][source]

Get access to standard input, output, and error streams of a running job.

Parameters:
  • id (str) – job id to attach to.

  • tty (bool) – True if tty mode is requested, default is False.

  • stdin (bool) – True to attach stdin, default is False.

  • stdout (bool) – True to attach stdout, default is False.

  • stderr (bool) – True to attach stderr, default is False.

  • cluster_name (str) –

    cluster on which the job is running.

    None means the current cluster (default).

Returns:

Asynchronous context manager which can be used to access stdin/stdout/stderr, see StdStream for details.
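For example, an attached job's output can be drained like this (a minimal sketch: it assumes neuro-sdk is installed and configured and that `neuro_sdk.get()` is used as the client factory; the job id is a placeholder):

```python
import asyncio

def render_chunk(fileno: int, data: bytes) -> str:
    """Label a stream chunk by its file descriptor (pure helper)."""
    label = {1: "stdout", 2: "stderr"}.get(fileno, "unknown")
    return f"[{label}] {data.decode('utf-8', errors='replace')}"

async def tail_job(job_id: str) -> None:
    # neuro_sdk is assumed installed and configured; imported lazily
    # so the helper above stays importable without the platform client.
    import neuro_sdk
    async with neuro_sdk.get() as client:
        async with client.jobs.attach(job_id, stdout=True, stderr=True) as stream:
            while True:
                msg = await stream.read_out()
                if msg is None:  # EOF or stream closed
                    break
                print(render_chunk(msg.fileno, msg.data), end="")

# Run with: asyncio.run(tail_job("job-..."))
```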

async-with exec(id: str, cmd: str, *, tty: bool = False, stdin: bool = False, stdout: bool = False, stderr: bool = False, cluster_name: Optional[str] = None) AsyncContextManager[StdStream][source]

Start an exec session, get access to session’s standard input, output, and error streams.

Parameters:
  • id (str) – job id to use for command execution.

  • cmd (str) – the command to execute.

  • tty (bool) – True if tty mode is requested, default is False.

  • stdin (bool) – True to attach stdin, default is False.

  • stdout (bool) – True to attach stdout, default is False.

  • stderr (bool) – True to attach stderr, default is False.

  • cluster_name (str) –

    cluster on which the job is running.

    None means the current cluster (default).

Returns:

Asynchronous context manager which can be used to access stdin/stdout/stderr, see StdStream for details.
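An exec session can be used to run an ad-hoc command and collect its output (a sketch under the same assumptions: neuro-sdk installed and configured, `neuro_sdk.get()` as the client factory; the job id and command are placeholders):

```python
import asyncio

async def run_in_job(job_id: str, cmd: str) -> bytes:
    """Execute cmd inside a running job and collect its combined output."""
    # neuro_sdk is assumed installed and configured.
    import neuro_sdk
    chunks = []
    async with neuro_sdk.get() as client:
        async with client.jobs.exec(job_id, cmd, stdout=True, stderr=True) as stream:
            while True:
                msg = await stream.read_out()
                if msg is None:  # EOF or stream closed
                    break
                chunks.append(msg.data)
    return b"".join(chunks)

# Example: output = asyncio.run(run_in_job("job-...", "ls -l /"))
```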

coroutine get_capacity(*, cluster_name: Optional[str] = None) Mapping[str, int][source]

Get the number of jobs that can be started on the specified cluster for each available preset.

The returned numbers reflect the remaining cluster capacity. In other words, it displays how many concurrent jobs for each preset can be started at the moment of the method call.

The returned capacity is an approximation; the real value can differ if already running jobs finish or another user starts their own jobs at the same time.

Parameters:

cluster_name (str) –

cluster for which the request is performed.

None means the current cluster (default).

Returns:

A mapping of preset_name to count, where count is a number of concurrent jobs that can be executed using preset_name.
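The returned mapping makes it easy to pick a preset that still has room (a minimal sketch; neuro-sdk and `neuro_sdk.get()` are assumed, and the preset names are hypothetical):

```python
import asyncio
from typing import List, Mapping, Optional

def pick_preset(capacity: Mapping[str, int], preferred: List[str]) -> Optional[str]:
    """Return the first preferred preset that still has free capacity (pure helper)."""
    for name in preferred:
        if capacity.get(name, 0) > 0:
            return name
    return None

async def choose_preset() -> Optional[str]:
    # neuro_sdk is assumed installed and configured.
    import neuro_sdk
    async with neuro_sdk.get() as client:
        capacity = await client.jobs.get_capacity()
    # "gpu-small" / "cpu-large" are hypothetical preset names.
    return pick_preset(capacity, ["gpu-small", "cpu-large"])
```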

coroutine kill(id: str) None[source]

Kill a job.

Parameters:

id (str) – job id to kill.

async-with async-for list(*, statuses: Iterable[JobStatus] = (), name: Optional[str] = None, tags: Sequence[str] = (), owners: Iterable[str] = (), since: Optional[datetime] = None, until: Optional[datetime] = None, reverse: bool = False, limit: Optional[int] = None, cluster_name: Optional[str] = None) AsyncContextManager[AsyncIterator[JobDescription]][source]

List user jobs; by default all scheduled, running, and finished jobs are returned.

Parameters:
  • statuses (Iterable[JobStatus]) –

    filter jobs by their statuses.

    The parameter can be a set or list of requested statuses, e.g. {JobStatus.PENDING, JobStatus.RUNNING} can be used to request only scheduled and running jobs, skipping finished and failed ones.

    Empty sequence means that jobs with all statuses are returned (default behavior). The list can be pretty huge though.

  • name (str) –

    Filter jobs by name (exact match).

    Empty string or None means that no filter is applied (default).

  • tags (Sequence[str]) –

    filter jobs by tags.

    Retrieves only jobs submitted with all tags from the specified list.

    Empty list means that no filter is applied (default).

  • owners (Iterable[str]) –

    filter jobs by their owners.

    The parameter can be a set or list of owner usernames (see JobDescription.owner for details).

    No owners filter is applied if the iterable is empty.

  • since (datetime) –

    filter jobs by their creation date.

    Retrieves only jobs submitted after the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time.

    None means that no filter is applied (default).

  • until (datetime) –

    filter jobs by their creation date.

    Retrieves only jobs submitted before the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time.

    None means that no filter is applied (default).

  • reverse (bool) –

    iterate jobs in the reverse order.

    If reverse is false (default) the jobs are iterated in the order of their creation date, from earlier to later. If reverse is true, they are iterated in the reverse order, from later to earlier.

  • limit (int) –

    limit the number of jobs.

    None means no limit (default).

  • cluster_name (str) –

    list jobs on specified cluster.

    None means the current cluster (default).

Returns:

asynchronous iterator which emits JobDescription objects.
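Combining the filters, recently failed jobs can be collected like this (a sketch assuming neuro-sdk is installed and `neuro_sdk.get()` is the client factory; the 24-hour window and limit are illustrative):

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import List, Optional

def since_window(hours: int, now: Optional[datetime] = None) -> datetime:
    """Timezone-aware lower bound for the `since` filter (pure helper)."""
    base = now if now is not None else datetime.now(timezone.utc)
    return base - timedelta(hours=hours)

async def recent_failures() -> List[str]:
    # neuro_sdk is assumed installed and configured; imported lazily so
    # the helper above stays importable without the platform client.
    import neuro_sdk
    from neuro_sdk import JobStatus

    ids: List[str] = []
    async with neuro_sdk.get() as client:
        async with client.jobs.list(
            statuses={JobStatus.FAILED},
            since=since_window(24),  # failures from the last 24 hours
            limit=50,
        ) as it:
            async for job in it:
                ids.append(job.id)
    return ids

# Run with: asyncio.run(recent_failures())
```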

async-with async-for monitor(id: str, *, cluster_name: Optional[str] = None, since: Optional[datetime] = None, timestamps: bool = False, separator: Optional[str] = None) AsyncContextManager[AsyncIterator[bytes]][source]

Get job logs as a sequence of data chunks, e.g.:

async with client.jobs.monitor(job_id) as it:
    async for chunk in it:
        print(chunk.decode('utf-8', errors='replace'))
Parameters:
  • id (str) – job id to retrieve logs for.

  • cluster_name (str) –

    cluster on which the job is running.

    None means the current cluster (default).

  • since (datetime) –

    Retrieves only logs after the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time.

    None means that no filter is applied (default).

  • timestamps (bool) – if true, include timestamps on each line in the log output.

  • separator (str) –

    string which will separate archive and live logs (if both parts are present).

    By default a string containing random characters is used. An empty separator suppresses the separator output.

Returns:

AsyncIterator over bytes log chunks.

async-with port_forward(id: str, local_port: int, job_port: int, *, no_key_check: bool = False, cluster_name: Optional[str] = None) None[source]

Forward local port to job, e.g.:

async with client.jobs.port_forward(job_id, 8080, 80):
    # port forwarding is available inside the with-block
    ...
Parameters:
  • id (str) – job id.

  • local_port (int) – local TCP port to forward.

  • job_port (int) – remote TCP port in a job to forward.

  • cluster_name (str) –

    cluster on which the job is running.

    None means the current cluster (default).

coroutine run(container: Container, *, name: Optional[str] = None, tags: Sequence[str] = (), description: Optional[str] = None, scheduler_enabled: bool = False, pass_config: bool = False, wait_for_jobs_quota: bool = False, schedule_timeout: Optional[float] = None, life_span: Optional[float] = None, priority: Optional[JobPriority] = None) JobDescription[source]

Start a new job.

Deprecated since version 20.11.25: Please use start() instead.

Parameters:
  • container (Container) – container description to start.

  • name (str) – optional job name.

  • tags (Sequence[str]) – optional job tags.

  • description (str) – optional container description.

  • scheduler_enabled (bool) – a flag that specifies whether the job should participate in round-robin scheduling.

  • pass_config (bool) – a flag that specifies that the platform should pass config data to the job. This allows using the API and CLI from inside the job. See Factory.login_with_passed_config() for details.

  • wait_for_jobs_quota (bool) – when this flag is set, the job will wait for another job to stop instead of failing immediately because of the total running jobs quota.

  • schedule_timeout (float) – minimal timeout to wait before reporting that the job cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc.). This option is not allowed when is_preemptible is set to True.

  • life_span (float) – job run-time limit in seconds. Pass None to disable.

  • priority (JobPriority) – priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority. Priority should be supported by cluster.

Returns:

JobDescription instance with information about started job.

coroutine start(*, image: RemoteImage, preset_name: str, cluster_name: Optional[str] = None, org_name: Optional[str] = None, entrypoint: Optional[str] = None, command: Optional[str] = None, working_dir: Optional[str] = None, http: Optional[HTTPPort] = None, env: Optional[Mapping[str, str]] = None, volumes: Sequence[Volume] = (), secret_env: Optional[Mapping[str, URL]] = None, secret_files: Sequence[SecretFile] = (), disk_volumes: Sequence[DiskVolume] = (), tty: bool = False, shm: bool = False, name: Optional[str] = None, tags: Sequence[str] = (), description: Optional[str] = None, pass_config: bool = False, wait_for_jobs_quota: bool = False, schedule_timeout: Optional[float] = None, restart_policy: JobRestartPolicy = JobRestartPolicy.NEVER, life_span: Optional[float] = None, privileged: bool = False, priority: Optional[JobPriority] = None) JobDescription[source]

Start a new job.

Parameters:
  • image (RemoteImage) – image used for starting a container.

  • preset_name (str) – name of the preset of resources given to a container on a node.

  • cluster_name (str) – cluster to start a job. Default is current cluster.

  • org_name (str) – org to start a job on behalf of. Default is current org.

  • entrypoint (str) – optional Docker ENTRYPOINT used for overriding image entry-point (str), default None is used to pick entry-point from image’s Dockerfile.

  • command (str) – optional command line to execute inside a container (str), None for picking command line from image’s Dockerfile.

  • working_dir (str) – optional working directory inside a container (str), None for picking working directory from image’s Dockerfile.

  • http (HTTPPort) – optional parameters of HTTP server exposed by container, None if the container doesn’t provide HTTP access.

  • env (Mapping[str,str]) – optional custom environment variables for pushing into container’s task. A Mapping where keys are environment variable names and values are variable values, both str. None by default.

  • volumes (Sequence[Volume]) – optional Docker volumes to mount into container, a Sequence of Volume objects. Empty tuple by default.

  • secret_env (Mapping[str,yarl.URL]) – optional secrets pushed as custom environment variables into container’s task. A Mapping where keys are environment variable names (str) and values are secret URIs (yarl.URL). None by default.

  • secret_files (Sequence[SecretFile]) – optional secrets mounted as files in a container, a Sequence of SecretFile objects. Empty tuple by default.

  • disk_volumes (Sequence[DiskVolume]) – optional disk volumes used to mount into container, a Sequence of DiskVolume objects. Empty tuple by default.

  • tty (bool) – Allocate a TTY or not. False by default.

  • shm (bool) – Use Linux shared memory or not. False by default.

  • name (str) – optional job name.

  • tags (Sequence[str]) – optional job tags.

  • description (str) – optional container description.

  • pass_config (bool) – a flag that specifies that the platform should pass config data to the job. This allows using the API and CLI from inside the job. See Factory.login_with_passed_config() for details.

  • wait_for_jobs_quota (bool) – when this flag is set, the job will wait for another job to stop instead of failing immediately because of the total running jobs quota.

  • schedule_timeout (float) – minimal timeout to wait before reporting that the job cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc.).

  • life_span (float) – job run-time limit in seconds. Pass None to disable.

  • restart_policy (JobRestartPolicy) – job restart behavior. JobRestartPolicy.NEVER by default.

  • privileged (bool) – Run job in privileged mode. This mode should be supported by cluster.

  • priority (JobPriority) – priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority. Priority should be supported by cluster. None by default.

Returns:

JobDescription instance with information about started job.
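A typical start() call looks like this (a minimal sketch: neuro-sdk and `neuro_sdk.get()` are assumed, `client.parse.remote_image()` is assumed as the image-reference parser, and the image, command, env, and preset values are illustrative):

```python
import asyncio

async def start_training(image_ref: str, preset: str) -> str:
    """Start a job and return its id (all concrete values are illustrative)."""
    # neuro_sdk is assumed installed and configured; client.parse.remote_image()
    # is assumed as the image-reference parser.
    import neuro_sdk
    async with neuro_sdk.get() as client:
        image = client.parse.remote_image(image_ref)
        job = await client.jobs.start(
            image=image,
            preset_name=preset,
            command="python train.py",  # hypothetical command
            env={"EPOCHS": "10"},       # hypothetical environment
            life_span=3600.0,           # stop the job after one hour
        )
        return job.id

# Run with: asyncio.run(start_training("image:myproject:v1", "cpu-small"))
```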

coroutine send_signal(id: str, *, cluster_name: Optional[str] = None) None[source]

Send SIGKILL signal to a job.

Parameters:
  • id (str) – job id.

  • cluster_name (str) –

    cluster on which the job is running.

    None means the current cluster (default).

coroutine status(id: str) JobDescription[source]

Get information about a job.

Parameters:

id (str) – job id to get its status.

Returns:

JobDescription instance with job status details.
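status() can be polled until the job reaches a final state (a sketch assuming neuro-sdk is installed and `neuro_sdk.get()` is the client factory; the poll interval is arbitrary):

```python
import asyncio

async def wait_for_finish(job_id: str, poll_seconds: float = 5.0) -> str:
    """Poll status() until the job reaches a final state; return the status name."""
    # neuro_sdk is assumed installed and configured.
    import neuro_sdk
    async with neuro_sdk.get() as client:
        while True:
            descr = await client.jobs.status(job_id)
            if descr.status.is_finished:  # SUCCEEDED, CANCELLED or FAILED
                return descr.status.name
            await asyncio.sleep(poll_seconds)

# Run with: asyncio.run(wait_for_finish("job-..."))
```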

async-with async-for top(id: str, *, cluster_name: Optional[str] = None) AsyncContextManager[AsyncIterator[JobTelemetry]][source]

Get job usage statistics, e.g.:

async with client.jobs.top(job_id) as top:
    async for data in top:
        print(data.cpu, data.memory)
Parameters:
  • id (str) – job id to get telemetry data.

  • cluster_name (str) –

    cluster on which the job is running.

    None means the current cluster (default).

Returns:

asynchronous iterator which emits JobTelemetry objects periodically.

coroutine bump_life_span(id: str, additional_life_span: float) None[source]

Increase life span of a job.

Parameters:
  • id (str) – job id to increase life span.

  • additional_life_span (float) – number of seconds to add to the job’s run-time limit.

Container

class neuro_sdk.Container

Read-only dataclass for describing Docker image and environment to run a job.

image

RemoteImage used for starting a container.

resources

Resources which are used to schedule a container.

entrypoint

Docker ENTRYPOINT used for overriding image entry-point (str), default None is used to pick entry-point from image’s Dockerfile.

command

Command line to execute inside a container (str), None for picking command line from image’s Dockerfile.

http

HTTPPort for describing parameters of HTTP server exposed by container, None if the container doesn’t provide HTTP access.

env

Custom environment variables for pushing into container’s task.

A Mapping where keys are environment variable names and values are variable values, both str. Empty dict by default.

volumes

Docker volumes to mount into container, a Sequence of Volume objects. Empty list by default.

secret_env

Secrets pushed as custom environment variables into container’s task.

A Mapping where keys are environment variable names (str) and values are secret URIs (yarl.URL). Empty dict by default.

secret_files

Secrets mounted as files in a container, a Sequence of SecretFile objects. Empty list by default.

disk_volumes

Disk volumes used to mount into container, a Sequence of DiskVolume objects. Empty list by default.

HTTPPort

class neuro_sdk.HTTPPort

Read-only dataclass for exposing HTTP server started in a job.

To access this server from a remote machine, use JobDescription.http_url.

port

Open port number in container’s port namespace, int.

requires_auth

If True, authentication on the Neuro Platform is required to access the exposed HTTP server; otherwise the port is open publicly.

JobDescription

class neuro_sdk.JobDescription

Read-only dataclass for describing a job.

id

Job ID, str.

owner

The name of the user who created the job, str.

cluster_name

The name of the cluster where the job was scheduled, str.

New in version 19.9.11.

status

Current status of job, JobStatus enumeration.

history

Additional information about job, e.g. creation time and process exit code. JobStatusHistory instance.

container

Description of the container used to start the job, Container instance.

scheduler_enabled

Whether the job participates in round-robin scheduling.

preemptible_node

Whether the job requires a preemptible node. If set to True, the job only allows execution on preemptible nodes. If set to False, the job only allows execution on non-preemptible nodes.

pass_config

Whether config data is passed to the job by the platform, see Factory.login_with_passed_config() for details.

privileged

Whether the job is running in privileged mode; refer to the Docker documentation for details.

name

Job name provided by user at creation time, str or None if name is omitted.

tags

List of job tags provided by user at creation time, Sequence[str] or () if tags are omitted.

description

Job description text provided by user at creation time, str or None if description is omitted.

http_url

yarl.URL for HTTP server exposed by job, empty URL if the job doesn’t expose HTTP server.

ssh_server

yarl.URL to access the running job via SSH. Internal field, don’t access it from custom code. Use Jobs.exec() and Jobs.port_forward() as the official API for accessing a running job.

internal_hostname

DNS name to access the running job from other jobs.

internal_hostname_named

DNS name to access the running job from other jobs, based on the job’s name instead of its id. Produces the same value for jobs with the same name and owner in the same cluster.

life_span

Job run-time limit in seconds, float

schedule_timeout

Minimal timeout in seconds the job will wait before reporting that it cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc.), float

priority

Priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority, JobPriority

_internal

Some internal info about job used by platform. Should not be used.

JobRestartPolicy

class neuro_sdk.JobRestartPolicy

Enumeration that describes job restart behavior.

Can be one of the following values:

NEVER

Job will never be restarted.

ON_FAILURE

Job will be restarted only in case of job failure.

ALWAYS

Job will always be restarted after success or failure.

JobPriority

class neuro_sdk.JobPriority

Enumeration that describes job priority.

Can be one of the following values:

LOW

Jobs with LOW priority will start after all other jobs.

NORMAL

Default job priority.

HIGH

Jobs with HIGH priority will start before all other jobs.

JobStatus

class neuro_sdk.JobStatus

Enumeration that describes job state.

Can be one of the following statuses:

PENDING

Job is scheduled for execution but not started yet.

RUNNING

Job is running now.

SUSPENDED

Scheduled job is paused to allow other jobs to run.

SUCCEEDED

Job is finished successfully.

CANCELLED

Job was canceled while it was running.

FAILED

Job execution is failed.

UNKNOWN

Invalid (or unknown) status code; should never be returned by the server.

Also some shortcuts are available:

items() Set[JobStatus][source]

Returns all statuses except UNKNOWN.

active_items() Set[JobStatus][source]

Returns all statuses that are not final: PENDING, SUSPENDED and RUNNING.

finished_items() Set[JobStatus][source]

Returns all statuses that are final: SUCCEEDED, CANCELLED and FAILED.

Each enum value has the following bool properties:

is_pending

Job is waiting to become running. True for PENDING and SUSPENDED states.

is_running

Job is running now. True for RUNNING state.

is_finished

Job completed execution. True for SUCCEEDED, CANCELLED and FAILED

JobStatusItem

class neuro_sdk.JobStatusItem

Read-only dataclass for describing job status transition details.

transition_time

Status transition timestamp, datetime.

status

Status of job after this transition, JobStatus enumeration.

reason

Additional information for job status, str.

description

Extended description for short abbreviation described by reason, empty str if no additional information is provided.

exit_code

Exit code for container’s process (int) or None if the job was not started or was still running when this transition occurred.

JobStatusHistory

class neuro_sdk.JobStatusHistory

Read-only dataclass for describing job status details, e.g. creation and finishing time, exit code etc.

status

Current status of job, JobStatus enumeration.

The same as JobDescription.status.

reason

Additional information for current status, str.

description

Extended description for short abbreviation described by reason, empty str if no additional information is provided.

exit_code

Exit code for container’s process (int) or None if the job was not started or is still running.

restarts

Number of container’s restarts, int.

created_at

Job creation timestamp, datetime or None.

started_at

Job starting timestamp, datetime or None if job not started.

finished_at

Job finishing timestamp, datetime or None if job not finished.

transitions

List of job status transitions, Sequence of JobStatusItem.

JobTelemetry

class neuro_sdk.JobTelemetry

Read-only dataclass for job telemetry (statistics), e.g. consumed CPU load, memory footprint etc.

See also

Jobs.top().

timestamp

Date and time of telemetry report (float), time in seconds since the epoch, like the value returned from time.time().

See time and datetime for more information on how to handle the timestamp.

cpu

CPU load, float. 1 means fully loaded one CPU unit, 0.5 is for half-utilized CPU.

memory

Consumed memory in megabytes, float.

gpu_duty_cycle

Percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. int between 1 and 100, None if the job has no GPU available.

gpu_memory

Percentage of used GPU memory, float between 0 and 1.

Message

class neuro_sdk.Message

Read-only dataclass for representing job’s stdout/stderr stream chunks, returned from StdStream.read_out().

fileno

Stream number, 1 for stdout and 2 for stderr.

data

A chunk of stdout/stderr data, bytes.

Resources

class neuro_sdk.Resources

Read-only dataclass for describing resources (memory, CPU/GPU etc.) available for container, see also Container.resources attribute.

memory_mb

Requested memory amount in megabytes, int.

cpu

Requested number of CPUs, float. Please note, Docker supports fractions here, e.g. 0.5 CPU means half of a CPU on the target node.

gpu

The number of requested GPUs, int. Use None for jobs that don’t require GPU.

gpu_model

The name of requested GPU model, str (or None for job without GPUs).

shm

Use Linux shared memory or not, bool. Provide True if you don’t know what the /dev/shm device means.

tpu_type

Requested TPU type, see also https://en.wikipedia.org/wiki/Tensor_processing_unit

tpu_software_version

Requested TPU software version.

StdStream

class neuro_sdk.StdStream

A class for communicating with an attached job (Jobs.attach()) or an exec session (Jobs.exec()). Use read_out() for reading from stdout/stderr and write_in() for writing into stdin.

coroutine close() None[source]

Close StdStream instance.

coroutine read_out() Optional[Message][source]

Read next chunk from stdout/stderr.

Returns:

Message instance for read data chunk or None if EOF is reached or StdStream was closed.

coroutine write_in(data: bytes) None[source]

Write data to stdin.

Parameters:

data (bytes) – data to send.
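Putting write_in() and read_out() together, a line can be fed to a job's stdin and one reply chunk read back (a sketch under the usual assumptions: neuro-sdk installed and configured, `neuro_sdk.get()` as the client factory, a job id placeholder):

```python
import asyncio

def is_stderr(fileno: int) -> bool:
    """Pure helper: Message.fileno is 1 for stdout and 2 for stderr."""
    return fileno == 2

async def send_line(job_id: str, line: bytes) -> None:
    # neuro_sdk is assumed installed and configured.
    import neuro_sdk
    async with neuro_sdk.get() as client:
        async with client.jobs.attach(
            job_id, stdin=True, stdout=True, stderr=True
        ) as stream:
            await stream.write_in(line)    # feed the job's stdin
            msg = await stream.read_out()  # read one reply chunk (None on EOF)
            if msg is not None:
                target = "stderr" if is_stderr(msg.fileno) else "stdout"
                print(f"[{target}]", msg.data.decode("utf-8", errors="replace"))

# Run with: asyncio.run(send_line("job-...", b"hello\n"))
```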

Volume

class neuro_sdk.Volume

Read-only dataclass for describing mounted volumes of a container.

storage_uri

A URL on remote storage, yarl.URL.

container_path

A path on container filesystem, str.

read_only

True if the volume is mounted in read-only mode, False for read-write (default).

SecretFile

class neuro_sdk.SecretFile

Read-only dataclass for describing secrets mounted as files in a container.

secret_uri

A URI of a secret, yarl.URL.

container_path

A path on container filesystem, str.

DiskVolume

class neuro_sdk.DiskVolume

Read-only dataclass for describing mounted disk volumes of a container.

disk_uri

A URI of a disk, yarl.URL.

container_path

A path on container filesystem, str.

read_only

True if the volume is mounted in read-only mode, False for read-write (default).