
Job Configuration Reference

Fields you set when you create a GPU Container Job or a Managed Inference Job.

You set these fields when you create a job, either in the web UI or with the cosmicac jobs create command. The job type determines which parameters apply. For the create flow, see Create a GPU Container Job and Create a Managed Inference Job.

Most fields appear in the web UI create form. Fields marked "CLI only" have no web UI control; set them from the CLI.

Common fields

These fields apply to every job type.

| Field | Required | Description |
| --- | --- | --- |
| Job name (Basics) | Yes | Job name. |
| Tags (Basics) | Yes | One or more labels for the job. The CLI accepts a comma-separated list. |
| Location (Basics) | Yes | Location code for where the job runs, for example us or IN. |
| Cost limit | No | Maximum cost for the job. CLI only. |
| Alerts | No | Alert settings for the job. CLI only. |
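
The CLI takes tags as a comma-separated list, while the example configuration on this page stores them as a JSON array. The following is a minimal sketch of how one form could map to the other. The helper name parse_tags is hypothetical, and the whitespace and empty-entry handling are assumptions; the CLI's actual parsing rules are not documented here.

```python
def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a list of non-empty tags.

    Assumption: surrounding whitespace is ignored and empty entries are
    dropped. The real CLI may behave differently.
    """
    tags = [part.strip() for part in raw.split(",")]
    tags = [tag for tag in tags if tag]
    if not tags:
        # Tags is a required field, so at least one label is expected.
        raise ValueError("at least one tag is required")
    return tags


print(parse_tags("experiment, training"))
```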

GPU configuration

These fields select the hardware for the job. In the web UI, they appear in the Hardware section.

| Field | Required | Description |
| --- | --- | --- |
| GPU | Yes | GPU model, for example GH100_H100_SXM5_80GB. |
| GPU count | Yes | Number of GPUs. Must be a positive number. |
| CUDA / driver | No | GPU driver version, for example CUDA 12.8. |
| CPU cores per GPU | No | CPU cores allocated per GPU. CLI only. |
| Memory GB per GPU | No | Memory in GB allocated per GPU. CLI only. |

GPU Container parameters

These fields apply to a GPU Container Job. In the web UI, they appear in the Image & access section.

| Field | Required | Description |
| --- | --- | --- |
| Base OS image | Yes | Base OS image for the container, for example Ubuntu22.04/CUDA12.8.1. |
| Disk (GB) | Yes | Root disk size in GB. Must be a positive number. |

Managed Inference parameters

These fields apply to a Managed Inference Job. In the web UI, they appear in the Model to serve, Serving configuration, and Endpoint sections.

| Field | Required | Description |
| --- | --- | --- |
| Model | Yes | Hugging Face model ID to serve, for example Qwen/Qwen3-32B. |
| Runtime image (CUDA) | Yes | Serving runtime image, for example vLLM 0.8.5. |
| Data type | Yes | Model data type, for example BF16 or Auto. |
| Quantisation | Yes | Quantisation scheme, for example FP8 or INT8. |
| Tensor parallel | Yes | Tensor parallel size. Must be a positive number. |
| GPU memory utilisation | Yes | Fraction of GPU memory to use, between 0 and 1. |
| Max concurrent sequences | Yes | Maximum number of concurrent sequences. Must be a positive number. |
| Max model length | Yes | Maximum model context length. Must be a positive number. |
| Reasoning parser | Yes | Reasoning parser to apply. |
| Video & image input | Yes | Whether the model accepts multimodal input. true or false. |
| Endpoint name | Yes | Name of the inference endpoint. |
| Replicas | Yes | Number of endpoint replicas. Must be a positive number. |
| Require Authorization header | Yes | Whether callers must send an authorization header. true or false. |
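
The value constraints in this table can be checked locally before you create a job. The following is a rough sketch, not the service's own validation: the function name validate_inference_params is hypothetical, the key names are taken from the example configuration on this page, and the treatment of the range ends for GPU memory utilisation (0 excluded, 1 allowed) is an assumption, because the table only says "between 0 and 1".

```python
from numbers import Real

# Key names follow the example configuration on this page.
POSITIVE_FIELDS = (
    "tensor_parallel",
    "max_concurrent_sequences",
    "max_model_length",
    "replicas",
)
BOOLEAN_FIELDS = ("multimodal", "require_auth_header")


def validate_inference_params(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in POSITIVE_FIELDS:
        value = params.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real) or value <= 0:
            problems.append(f"{key} must be a positive number")

    util = params.get("gpu_memory_utilization")
    # Assumption: 0 is excluded and 1 is allowed.
    if isinstance(util, bool) or not isinstance(util, Real) or not 0 < util <= 1:
        problems.append("gpu_memory_utilization must be between 0 and 1")

    for key in BOOLEAN_FIELDS:
        if not isinstance(params.get(key), bool):
            problems.append(f"{key} must be true or false")
    return problems
```

A missing key is reported the same way as an invalid value, which keeps the sketch short but does not distinguish the two cases.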

Example configuration

A complete job configuration has the following structure for each job type.

GPU Container Job

{
  "name": "my-gpu-job",
  "tags": ["experiment"],
  "location": "us",
  "type": "GPU_CONTAINER",
  "gpu": {
    "type": "GH100_H100_SXM5_80GB",
    "count": 1
  },
  "params": {
    "base_image": "Ubuntu22.04/CUDA12.8.1",
    "root_disk_size_gb": 100
  }
}
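
If you generate configurations from a script, a structure like the one above can be assembled and serialised with the Python standard library. This is a minimal sketch: the function build_gpu_container_job is hypothetical, the key names mirror the example above, and how the resulting JSON is passed to the CLI or web UI is not covered here.

```python
import json


def build_gpu_container_job(name, tags, location, gpu_type, gpu_count,
                            base_image, disk_gb):
    """Assemble a GPU Container Job configuration as a plain dict."""
    # The field tables require GPU count and disk size to be positive.
    if gpu_count <= 0:
        raise ValueError("gpu_count must be a positive number")
    if disk_gb <= 0:
        raise ValueError("disk_gb must be a positive number")
    return {
        "name": name,
        "tags": list(tags),
        "location": location,
        "type": "GPU_CONTAINER",
        "gpu": {"type": gpu_type, "count": gpu_count},
        "params": {"base_image": base_image, "root_disk_size_gb": disk_gb},
    }


config = build_gpu_container_job(
    name="my-gpu-job",
    tags=["experiment"],
    location="us",
    gpu_type="GH100_H100_SXM5_80GB",
    gpu_count=1,
    base_image="Ubuntu22.04/CUDA12.8.1",
    disk_gb=100,
)
print(json.dumps(config, indent=2))
```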

Managed Inference Job

{
  "name": "my-inference-job",
  "tags": ["inference"],
  "location": "us",
  "type": "MANAGED_INFERENCE",
  "gpu": {
    "type": "GH100_H100_SXM5_80GB",
    "count": 1
  },
  "params": {
    "model": "Qwen/Qwen3-32B",
    "runtime_image": "vLLM:0.6.3 + CUDA:13.0",
    "data_type": "BF16",
    "quantisation": "INT8",
    "tensor_parallel": 1,
    "gpu_memory_utilization": 0.9,
    "max_concurrent_sequences": 512,
    "max_model_length": 8000,
    "reasoning_parser": "default",
    "multimodal": false,
    "endpoint_name": "my-endpoint",
    "replicas": 1,
    "require_auth_header": true
  }
}
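
Both examples share the same top-level shape and differ only in the contents of params. As an illustration, the sketch below reports which keys a configuration is missing for its job type. The helper missing_keys is hypothetical, and the key sets are read off the two examples on this page rather than taken from a published schema, so treat them as assumptions.

```python
import json

# Key sets inferred from the example configurations on this page.
TOP_LEVEL_KEYS = {"name", "tags", "location", "type", "gpu", "params"}
GPU_KEYS = {"type", "count"}
PARAM_KEYS = {
    "GPU_CONTAINER": {"base_image", "root_disk_size_gb"},
    "MANAGED_INFERENCE": {
        "model", "runtime_image", "data_type", "quantisation",
        "tensor_parallel", "gpu_memory_utilization",
        "max_concurrent_sequences", "max_model_length", "reasoning_parser",
        "multimodal", "endpoint_name", "replicas", "require_auth_header",
    },
}


def missing_keys(config_text: str) -> list[str]:
    """Parse a JSON configuration and list the keys it lacks."""
    config = json.loads(config_text)
    job_type = config.get("type")
    if job_type not in PARAM_KEYS:
        raise ValueError(f"unknown or missing job type: {job_type!r}")
    missing = sorted(TOP_LEVEL_KEYS.difference(config))
    missing += sorted(
        f"gpu.{key}" for key in GPU_KEYS.difference(config.get("gpu", {}))
    )
    missing += sorted(
        f"params.{key}"
        for key in PARAM_KEYS[job_type].difference(config.get("params", {}))
    )
    return missing


example = json.dumps({
    "name": "my-gpu-job",
    "tags": ["experiment"],
    "location": "us",
    "type": "GPU_CONTAINER",
    "gpu": {"type": "GH100_H100_SXM5_80GB", "count": 1},
    "params": {"base_image": "Ubuntu22.04/CUDA12.8.1"},
})
print(missing_keys(example))  # root_disk_size_gb is absent from params
```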
