Azure Container Instances: Serverless Containers for Short-Lived Tasks and Burst Workloads

Not every containerised workload needs a Kubernetes cluster. Kubernetes adds power (scheduling, health checks, service meshes, rolling updates), but it also adds operational overhead, a minimum cluster cost, and a learning curve. When the workload is short-lived, stateless, or runs on an irregular schedule, Azure Container Instances (ACI) is a simpler answer.

ACI starts containers in seconds, bills per vCPU-second and GB-second of memory allocated to the container group, and terminates cleanly when the work is done. There is no cluster to size, no node pool to manage, and no idle capacity to pay for during quiet periods.

Real-World Scenario

A data science team produces Jupyter notebooks that need to be rendered to HTML and published. The rendering process takes 2–5 minutes per notebook, runs on demand when a researcher commits to a repository, and requires a Python environment with specific library versions. Rather than keeping a VM running 24 hours a day waiting for commits, a GitHub Actions workflow triggers an ACI container on each push. The container runs the render, uploads the HTML to Blob Storage, and exits. Total billing: only the few minutes of vCPU and memory time each render uses, rather than a full day of VM time.

How ACI Works

ACI Request Flow
-----------------
az container create (or ARM/API call)
         |
  Azure allocates CPU + memory on
  multi-tenant physical host
         |
  Container image pulled from registry
         |
  Container starts (typically < 30 seconds)
         |
  Work executes
         |
  Container exits (or runs until deleted)
         |
  Billing stops at container stop/delete

The underlying host is managed by Microsoft — no node, no VM, no OS to patch. You interact only with the container abstraction.

Container Groups

A container group is ACI’s equivalent of a Kubernetes pod: one or more containers that share a network namespace, localhost communication, and a lifecycle. All containers in a group are scheduled on the same host and start together.

Container Group: "report-generator"
+------------------------------------+
|  Container: renderer               |
|    Image: myacr.azurecr.io/render  |
|    CPU: 1.0  Memory: 2 GB          |
|    Port: 8080                      |
|                                    |
|  Container: exporter               |
|    Image: myacr.azurecr.io/export  |
|    CPU: 0.5  Memory: 1 GB          |
|                                    |
|  Shared volume: emptyDir at /tmp   |
+------------------------------------+
Public IP: 20.x.x.x:8080
DNS label: reportgen.eastus.azurecontainer.io

Patterns that work well with groups: sidecar logging agents that ship the main container’s log files from a shared volume, proxy containers that handle TLS termination, and init containers that seed a shared volume before the main process starts.
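Because the group is the unit of allocation, its footprint is the sum of its containers' resource requests. A minimal Python sketch of that arithmetic for the "report-generator" group shown above; the ContainerSpec class and group_totals helper are hypothetical names used only for illustration, not part of any Azure SDK:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical stand-in for one container's resource request."""
    name: str
    cpu: float        # vCPU cores requested
    memory_gb: float  # memory requested, in GB


def group_totals(containers):
    """Sum the per-container requests to get the group's total request."""
    total_cpu = sum(c.cpu for c in containers)
    total_memory = sum(c.memory_gb for c in containers)
    return total_cpu, total_memory


# Values copied from the report-generator diagram.
report_generator = [
    ContainerSpec("renderer", cpu=1.0, memory_gb=2.0),
    ContainerSpec("exporter", cpu=0.5, memory_gb=1.0),
]

cpu, memory = group_totals(report_generator)
print(f"report-generator requests {cpu} vCPU and {memory} GB in total")
```

The totals matter for cost as well as capacity: adding a sidecar raises the amount billed for every second the group runs.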

Restart Policies

ACI has three restart policies governing what happens when a container in a group exits:

Restart Policy   | Behaviour                          | Use Case
-----------------|------------------------------------|---------------------------
Always (default) | Restart on any exit                | Long-running services
OnFailure        | Restart only on non-zero exit code | Batch jobs that may fail
Never            | Do not restart                     | One-shot tasks

For batch jobs that must not run more than once, Never is the safest policy: the container runs at most once, so a failed run is surfaced rather than silently retried. Combine it with Azure Monitor alerts on the container’s exit code to detect failures.
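The three policies reduce to a small decision rule on the exit code. A minimal Python sketch of that rule, useful for reasoning about a job's behaviour before deploying it; should_restart is a hypothetical helper, not an Azure API:

```python
def should_restart(policy: str, exit_code: int) -> bool:
    """Return True if a container that exited with exit_code would be restarted."""
    if policy == "Always":
        return True
    if policy == "OnFailure":
        # Only a non-zero exit code counts as a failure.
        return exit_code != 0
    if policy == "Never":
        return False
    raise ValueError(f"unknown restart policy: {policy!r}")


for policy in ("Always", "OnFailure", "Never"):
    for code in (0, 1):
        print(f"{policy:<10} exit={code} -> restart={should_restart(policy, code)}")
```

Note that OnFailure can rerun a job that failed partway through, so it only suits jobs that are safe to repeat.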

Persistent Volumes

ACI containers are ephemeral by default. For jobs that need to read input data or write output, mount an Azure Files share or an emptyDir volume:

volumes:
  - name: inputdata
    azureFile:
      shareName: job-input
      storageAccountName: myaccount
      storageAccountKey: "<key or use secret>"
  - name: scratch
    emptyDir: {}

emptyDir lives only for the lifetime of the container group and is lost when the group is stopped or deleted. Data on an Azure Files mount survives container restarts and outlives the group itself, because it is stored in Azure Storage outside the container.

Deploying ACI via CLI

# Run a one-shot Python data processing container
az container create \
  --resource-group batch-rg \
  --name nightly-etl \
  --image myacr.azurecr.io/etl:v3 \
  --registry-login-server myacr.azurecr.io \
  --registry-username myacr \
  --registry-password "$ACR_PASS" \
  --cpu 2 \
  --memory 4 \
  --restart-policy Never \
  --environment-variables \
      SOURCE_CONTAINER=raw-data \
      DEST_CONTAINER=processed \
  --azure-file-volume-share-name etl-scratch \
  --azure-file-volume-account-name mystorageacct \
  --azure-file-volume-account-key "$STORAGE_KEY" \
  --azure-file-volume-mount-path /mnt/scratch

# Wait for the container to finish and print logs
az container logs --resource-group batch-rg --name nightly-etl --follow
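When the same job is launched from a script or pipeline, it can help to assemble the command programmatically. The following Python sketch builds the argument list for the command above using only flags that appear in it (registry and volume flags are left out for brevity); build_create_command is a hypothetical helper, and the sketch only prints the command rather than running it:

```python
import shlex


def build_create_command(resource_group, name, image, cpu, memory_gb, env):
    """Assemble an `az container create` argument list for a one-shot job."""
    cmd = [
        "az", "container", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--image", image,
        "--cpu", str(cpu),
        "--memory", str(memory_gb),
        "--restart-policy", "Never",
    ]
    if env:
        # --environment-variables takes space-separated KEY=VALUE pairs.
        cmd.append("--environment-variables")
        cmd.extend(f"{key}={value}" for key, value in env.items())
    return cmd


command = build_create_command(
    resource_group="batch-rg",
    name="nightly-etl",
    image="myacr.azurecr.io/etl:v3",
    cpu=2,
    memory_gb=4,
    env={"SOURCE_CONTAINER": "raw-data", "DEST_CONTAINER": "processed"},
)
# Print a shell-quoted line; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Keeping the arguments as a list avoids shell-quoting mistakes when values contain spaces or special characters.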

ACI as AKS Virtual Nodes

AKS supports a virtual node add-on powered by ACI. When the Cluster Autoscaler would normally spin up a new node VM (which takes 3–5 minutes), virtual nodes schedule pods directly on ACI in seconds. The pods appear in kubectl get pods like normal pods and can use Services and ConfigMaps.

AKS Cluster with Virtual Nodes
--------------------------------
Regular Node Pool (Standard_D4s_v5)
  [Pod A] [Pod B] [Pod C]   <- normal pods

Virtual Node (ACI-backed)
  [Pod D] [Pod E]           <- burst pods on ACI
  No VM to provision; starts in ~10 seconds
  Billed per-second while running

Virtual nodes are suitable for stateless burst pods (web frontends, batch workers). Pods that use persistent volume claims or hostPath mounts, and pods managed by DaemonSets, do not run on virtual nodes.
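A quick pre-flight check can flag pods that would hit these restrictions. The Python sketch below inspects a pod manifest (as a plain dict) for the three cases named above; virtual_node_blockers is a hypothetical helper, and it covers only those three cases, not every virtual node limitation:

```python
def virtual_node_blockers(pod: dict) -> list:
    """Return the reasons (if any) a pod manifest cannot run on a virtual node."""
    reasons = []
    for volume in pod.get("spec", {}).get("volumes", []):
        if "persistentVolumeClaim" in volume:
            reasons.append(f"volume {volume['name']!r} uses a persistent volume claim")
        if "hostPath" in volume:
            reasons.append(f"volume {volume['name']!r} uses a hostPath mount")
    for owner in pod.get("metadata", {}).get("ownerReferences", []):
        if owner.get("kind") == "DaemonSet":
            reasons.append("pod is managed by a DaemonSet")
    return reasons


# Example manifests (illustrative names only).
burst_worker = {
    "metadata": {"name": "worker-1"},
    "spec": {"volumes": [{"name": "scratch", "emptyDir": {}}]},
}
stateful_pod = {
    "metadata": {"name": "db-0"},
    "spec": {"volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": "db-data"}}]},
}

print(virtual_node_blockers(burst_worker))
print(virtual_node_blockers(stateful_pod))
```

An empty list means none of the three restrictions applies; any other result lists what would need to change before the pod can burst to ACI.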

ACI vs. AKS: Choosing Between Them

Characteristic        | ACI                   | AKS
----------------------|-----------------------|--------------------------
Setup time            | Seconds               | 5–10 minutes (cluster)
Orchestration         | None                  | Full Kubernetes
Cost model            | Per second            | Per node (VM always on)
Persistent workloads  | Possible but unusual  | Natural fit
Complex scheduling    | Not supported         | Taints, affinities, etc.
Networking            | Basic / VNet inject   | CNI, service mesh, etc.
Best for              | Batch, CI, burst      | Long-running microservices

A common pattern: run AKS for steady-state services and use ACI (directly or via virtual nodes) for overnight batch jobs or CI pipelines.
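Whether per-second billing beats an always-on node depends on how many hours the workload actually runs. A rough Python sketch of that break-even follows; every price in it is a placeholder assumption chosen for round numbers, not a real Azure rate, and the function names are hypothetical:

```python
SECONDS_PER_HOUR = 3600


def aci_cost(vcpu, memory_gb, seconds, vcpu_rate, gb_rate):
    """Cost of one ACI run: allocated resources x duration x per-second rates."""
    return seconds * (vcpu * vcpu_rate + memory_gb * gb_rate)


def break_even_hours(vcpu, memory_gb, vcpu_rate, gb_rate, vm_monthly_cost):
    """Hours of ACI runtime per month at which cost equals an always-on VM."""
    hourly = aci_cost(vcpu, memory_gb, SECONDS_PER_HOUR, vcpu_rate, gb_rate)
    return vm_monthly_cost / hourly


# Placeholder prices, NOT real Azure rates. Substitute current figures.
VCPU_RATE = 0.00001     # currency units per vCPU-second (assumed)
GB_RATE = 0.000001      # currency units per GB-second (assumed)
VM_MONTHLY_COST = 30.0  # currency units per month for an always-on node (assumed)

hours = break_even_hours(2, 4, VCPU_RATE, GB_RATE, VM_MONTHLY_COST)
print(f"ACI is cheaper below roughly {hours:.0f} hours of runtime per month")
```

With real prices plugged in, a workload that runs well below the break-even figure is a candidate for ACI, while one that runs most of the month usually belongs on a node pool.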

Key Interview Points

Billing granularity: ACI bills per vCPU-second and GB-second of allocated resources. A 2-vCPU, 4 GB container group running for 300 seconds is billed for 2 × 300 = 600 vCPU-seconds and 4 × 300 = 1,200 GB-seconds.
Container group vs. pod: Both are the scheduling unit with shared network and storage. Key difference: AKS pods have Kubernetes scheduling intelligence; ACI groups are simply allocated to the next available host.
VNet injection: ACI can be deployed into a VNet subnet, giving containers private IPs and access to VNet resources without a public endpoint.
Cold start vs. VM cold start: ACI typically starts in 10–30 seconds. An AKS node pool scale-out takes 3–5 minutes. For burst latency, ACI wins decisively.
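GPU support: ACI has offered NVIDIA GPU-backed container instances (K80, P100, and V100 SKUs) as a preview feature with limited regional availability, aimed at short ML inference jobs that do not justify reserving GPU VMs permanently. Check which SKUs are currently available before relying on it.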
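The billing arithmetic in the first point above is simple enough to check in a few lines. A minimal Python sketch that reproduces it; billable_units is a hypothetical helper name:

```python
def billable_units(vcpu, memory_gb, duration_seconds):
    """Return (vCPU-seconds, GB-seconds) for one container group run."""
    return vcpu * duration_seconds, memory_gb * duration_seconds


cpu_seconds, gb_seconds = billable_units(vcpu=2, memory_gb=4, duration_seconds=300)
print(f"{cpu_seconds} vCPU-seconds, {gb_seconds} GB-seconds")
# prints: 600 vCPU-seconds, 1200 GB-seconds
```

Multiplying each figure by the current per-second rate for the region gives the cost of the run.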

Best Practices

Size CPU and memory requests carefully: over-provisioning wastes money, and under-provisioning causes OOM kills because ACI does not scale a running container up automatically.
Pull images from Azure Container Registry with managed identity authentication rather than storing registry passwords in environment variables.
Use --restart-policy Never for batch jobs and alert on non-zero exit codes via Azure Monitor container logs.
For sensitive data, use secure environment variables (--secure-environment-variables) or a secret volume, or have the container fetch secrets from Azure Key Vault with a managed identity, rather than passing them as plaintext environment variables.
Clean up container groups after completion with az container delete, or automate deletion of short-lived groups with Logic Apps or Azure Functions, so that stale resources do not accumulate.