Azure Container Instances: Serverless Containers for Short-Lived Tasks and Burst Workloads
Not every containerised workload needs a Kubernetes cluster. Kubernetes adds power (scheduling, health checks, service meshes, rolling updates), but it also adds operational overhead, a minimum cluster cost, and a learning curve. When the workload is short-lived, stateless, or runs on an irregular schedule, Azure Container Instances (ACI) is a simpler answer.
ACI starts containers in seconds, bills per second for the vCPUs and memory (in GB) allocated to the container group, and terminates cleanly when the work is done. There is no cluster to size, no node pool to manage, and no idle capacity to pay for during quiet periods.
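To make the billing model concrete, here is a minimal shell sketch that computes the billable CPU-seconds and GB-seconds for a single run. The function name `aci_usage` is hypothetical, and it reports usage only: per-unit prices vary by region and are deliberately left out.

```shell
# Hypothetical helper: billable usage for one container group run.
# Arguments: vCPU count, memory in GB, run duration in seconds.
aci_usage() {
  awk -v cpu="$1" -v mem="$2" -v secs="$3" \
    'BEGIN { printf "%g CPU-seconds, %g GB-seconds\n", cpu * secs, mem * secs }'
}

# A 1-vCPU, 2 GB container that runs for 3 minutes:
aci_usage 1 2 180
```

Multiplying each figure by the regional unit price gives the cost of the run.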
Real-World Scenario
A data science team produces Jupyter notebooks that need to be rendered to HTML and published. The rendering process takes 2–5 minutes per notebook, runs on demand when a researcher commits to a repository, and requires a Python environment with specific library versions. Rather than keeping a VM running around the clock waiting for commits, a GitHub Actions workflow triggers an ACI container on each push. The container runs the render, uploads the HTML to Blob Storage, and exits. Billing covers only the time the container runs: about 3 minutes of CPU and memory per notebook for a typical render.
How ACI Works
    ACI Request Flow
    ----------------
    az container create (or ARM/API call)
            |
    Azure allocates CPU + memory on multi-tenant physical host
            |
    Container image pulled from registry
            |
    Container starts (typically < 30 seconds)
            |
    Work executes
            |
    Container exits (or runs until deleted)
            |
    Billing stops at container stop/delete

The underlying host is managed by Microsoft: no node, no VM, no OS to patch. You interact only with the container abstraction.
Container Groups
A container group is ACI’s equivalent of a Kubernetes pod: one or more containers that share a network namespace, localhost communication, and a lifecycle. All containers in a group are scheduled on the same host and start together.
    Container Group: "report-generator"
    +------------------------------------+
    | Container: renderer                |
    |   Image: myacr.azurecr.io/render   |
    |   CPU: 1.0  Memory: 2 GB           |
    |   Port: 8080                       |
    |                                    |
    | Container: exporter                |
    |   Image: myacr.azurecr.io/export   |
    |   CPU: 0.5  Memory: 1 GB           |
    |                                    |
    | Shared volume: emptyDir at /tmp    |
    +------------------------------------+
    Public IP: 20.x.x.x:8080
    DNS label: reportgen.eastus.azurecontainer.io

Sidecar patterns that work well with groups: logging agents that collect stdout from the main container, proxy containers that handle TLS termination, or init containers that seed a shared volume before the main process starts.
Restart Policies
ACI has three restart policies governing what happens when all containers in a group exit:
Restart Policy | Behaviour                     | Use Case
---------------|-------------------------------|-------------------------
Always         | Restart on any exit           | Long-running services
OnFailure      | Restart only on non-zero exit | Batch jobs that may fail
Never          | Do not restart                | One-shot tasks

For batch jobs that must complete exactly once, Never is the safest policy. Combine it with Azure Monitor alerts on the container’s termination code to detect failures.
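With a Never policy, the termination code is the main signal that a job failed. The sketch below reads it from the CLI, assuming the first container in the group is the one of interest. The helper names `aci_exit_code` and `aci_job_succeeded` are hypothetical. The query path follows the JSON returned by `az container show`.

```shell
# Hypothetical helper: print the exit code of the first container in a group.
# Arguments: resource group, container group name.
aci_exit_code() {
  az container show \
    --resource-group "$1" \
    --name "$2" \
    --query "containers[0].instanceView.currentState.exitCode" \
    --output tsv
}

# Hypothetical helper: succeed only if the job exited with code 0.
# Any other value, including an empty one for a container that is
# still running, is treated as "not successful".
aci_job_succeeded() {
  code="$(aci_exit_code "$1" "$2")"
  if [ "$code" = "0" ]; then
    echo "job succeeded"
  else
    echo "job failed or has not finished (exit code: ${code:-unknown})" >&2
    return 1
  fi
}
```

A pipeline step could call `aci_job_succeeded batch-rg nightly-etl` after the run and fail the build when it returns non-zero.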
Persistent Volumes
ACI containers are ephemeral by default. For jobs that need to read input data or write output, mount an Azure Files share or an emptyDir volume:
    volumes:
    - name: inputdata
      azureFile:
        shareName: job-input
        storageAccountName: myaccount
        storageAccountKey: "<key or use secret>"
    - name: scratch
      emptyDir: {}

emptyDir lives only for the container group lifetime. Azure Files mounts survive container restarts because the data lives in Azure Storage outside the container.
Deploying ACI via CLI
    # Run a one-shot Python data processing container
    az container create \
      --resource-group batch-rg \
      --name nightly-etl \
      --image myacr.azurecr.io/etl:v3 \
      --registry-login-server myacr.azurecr.io \
      --registry-username myacr \
      --registry-password "$ACR_PASS" \
      --cpu 2 \
      --memory 4 \
      --restart-policy Never \
      --environment-variables \
        SOURCE_CONTAINER=raw-data \
        DEST_CONTAINER=processed \
      --azure-file-volume-share-name etl-scratch \
      --azure-file-volume-account-name mystorageacct \
      --azure-file-volume-account-key "$STORAGE_KEY" \
      --azure-file-volume-mount-path /mnt/scratch

    # Wait for the container to finish and print logs
    az container logs --resource-group batch-rg --name nightly-etl --follow
ACI as AKS Virtual Nodes
AKS supports a virtual node add-on powered by ACI. When the Cluster Autoscaler would normally spin up a new node VM (which takes 3–5 minutes), virtual nodes schedule pods directly on ACI in seconds. The pods appear in kubectl get pods like normal pods and can use Services and ConfigMaps.
    AKS Cluster with Virtual Nodes
    ------------------------------
    Regular Node Pool (Standard_D4s_v5)
      [Pod A] [Pod B] [Pod C]   <- normal pods

    Virtual Node (ACI-backed)
      [Pod D] [Pod E]           <- burst pods on ACI
      No VM to provision; starts in ~10 seconds
      Billed per-second while running

Virtual nodes are suitable for stateless burst pods (web frontends, batch workers). Pods with persistent volume claims, hostPath mounts, or DaemonSets do not work on virtual nodes.
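A pod lands on a virtual node only if it selects that node and tolerates its taints. The sketch below prints a Pod manifest that does both. The helper name `virtual_node_pod` and the pod and container names are hypothetical. The `type: virtual-kubelet` label and the two tolerations are the values commonly documented for the AKS add-on, but treat them as assumptions and confirm them against your own cluster.

```shell
# Hypothetical helper: print a Pod manifest aimed at an ACI-backed virtual node.
# Argument: container image reference.
virtual_node_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: burst-worker
spec:
  containers:
  - name: worker
    image: $1
  # Assumed label and taints; confirm with: kubectl describe node <virtual-node>
  nodeSelector:
    type: virtual-kubelet
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
  - key: azure.com/aci
    effect: NoSchedule
EOF
}
```

To schedule the pod, pipe the output to kubectl: `virtual_node_pod myacr.azurecr.io/etl:v3 | kubectl apply -f -`.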
ACI vs. AKS: Choosing Between Them
Characteristic      | ACI                  | AKS
--------------------|----------------------|---------------------------
Setup time          | Seconds              | 5–10 minutes (cluster)
Orchestration       | None                 | Full Kubernetes
Cost model          | Per second           | Per node (VM always on)
Persistent workloads| Possible but unusual | Natural fit
Complex scheduling  | Not supported        | Taints, affinities, etc.
Networking          | Basic / VNet inject  | CNI, service mesh, etc.
Best for            | Batch, CI, burst     | Long-running microservices

A common pattern: run AKS for steady-state services and use ACI (directly or via virtual nodes) for overnight batch jobs or CI pipelines.
Key Interview Points
- Billing granularity: ACI bills per CPU-second and GB-second. A 2-CPU, 4 GB container running for 300 seconds consumes 600 CPU-seconds (2 × 300) and 1200 GB-seconds (4 × 300).
- Container group vs. pod: Both are the scheduling unit with shared network and storage. Key difference: AKS pods have Kubernetes scheduling intelligence; ACI groups are simply allocated to the next available host.
- VNet injection: ACI can be deployed into a VNet subnet, giving containers private IPs and access to VNet resources without a public endpoint.
- Cold start vs. VM cold start: ACI typically starts in 10–30 seconds. An AKS node pool scale-out takes 3–5 minutes. For burst latency, ACI wins decisively.
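- GPU support: ACI has offered NVIDIA GPU-based container instances (K80, P100, V100) for short ML inference jobs without reserving GPU VMs permanently. GPU availability depends on region and SKU, so check the current ACI documentation before relying on it.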
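The VNet injection point above can be sketched with the CLI. This is a minimal example that assumes the virtual network and subnet already exist in the same resource group. The helper name `aci_create_private` and the sizing values are hypothetical. `--vnet` and `--subnet` are the `az container create` flags that place the group in a subnet.

```shell
# Hypothetical helper: create a container group with a private IP in an
# existing subnet. No public endpoint is requested.
# Arguments: resource group, group name, image, VNet name, subnet name.
aci_create_private() {
  az container create \
    --resource-group "$1" \
    --name "$2" \
    --image "$3" \
    --os-type Linux \
    --cpu 1 \
    --memory 1.5 \
    --vnet "$4" \
    --subnet "$5" \
    --restart-policy Never
}
```

For example: `aci_create_private batch-rg private-etl myacr.azurecr.io/etl:v3 my-vnet aci-subnet`, where `my-vnet` and `aci-subnet` are placeholder names.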
Best Practices
- Set CPU and memory limits appropriately — over-provisioning wastes money; under-provisioning causes OOM kills with no automatic increase.
- Pull images from Azure Container Registry with managed identity authentication rather than storing registry passwords in environment variables.
- Use --restart-policy Never for batch jobs and alert on non-zero exit codes via Azure Monitor container logs.
- For sensitive data, pass values as secure environment variables or mount them as a secret volume rather than passing them as plaintext. Where the secrets live in Azure Key Vault, have the container read them at runtime with a managed identity.
- Clean up containers after completion with az container delete, or automate deletion of short-lived container groups with Logic Apps or Azure Functions, to avoid stale resource accumulation.
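Several of these practices can be combined in one pair of helpers. The sketch below assumes a user-assigned managed identity that already holds the AcrPull role on the registry. The helper names `aci_run_job` and `aci_cleanup`, the variable name `API_TOKEN`, and the sizing values are hypothetical.

```shell
# Hypothetical helper: run a one-shot job that pulls its image with a
# managed identity and receives a secret as a secure environment variable.
# Arguments: resource group, group name, image, identity resource ID, secret.
aci_run_job() {
  az container create \
    --resource-group "$1" \
    --name "$2" \
    --image "$3" \
    --os-type Linux \
    --cpu 1 \
    --memory 2 \
    --assign-identity "$4" \
    --acr-identity "$4" \
    --secure-environment-variables "API_TOKEN=$5" \
    --restart-policy Never
}

# Hypothetical helper: delete a finished container group without prompting.
# Arguments: resource group, group name.
aci_cleanup() {
  az container delete --resource-group "$1" --name "$2" --yes
}
```

Secure environment variables are not shown in the container group's properties, which is why the secret is passed that way instead of with `--environment-variables`.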