EC2 Instance Families: Which Type Fits Your CPU, Memory, Storage, or GPU Workload

Choosing the wrong instance type costs money in two directions. A memory-optimised instance for a CPU-bound application wastes RAM you are paying for. A compute-optimised instance for a caching layer runs out of memory and starts swapping. AWS offers dozens of instance families specifically because different workloads have different bottlenecks.

The naming convention is consistent: family + generation + processor-suffix + . + size. For example, r7g.4xlarge is memory-optimised (r), seventh generation, Graviton ARM (g), 4xlarge.
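
As a rough illustration of the naming convention, the shell function below splits an instance type into its parts. It is a sketch only: the name parse_instance_type is made up for this example, and the pattern assumes the common family, generation, suffix, size shape, so names such as u-6tb1.112xlarge are deliberately left unmatched.

```shell
# Hypothetical helper: split an EC2 instance type into its naming parts.
# Assumes the <family><generation><attributes>.<size> shape; anything else
# (for example u-6tb1.112xlarge) produces no output.
parse_instance_type() {
  printf '%s\n' "$1" | sed -n -E \
    's/^([a-z]+)([0-9]+)([a-z-]*)\.([a-z0-9]+)$/family=\1 generation=\2 attributes=\3 size=\4/p'
}

parse_instance_type r7g.4xlarge
parse_instance_type inf2.xlarge
```

For r7g.4xlarge this prints family=r generation=7 attributes=g size=4xlarge. The attributes field can be empty (t3) or hold several letters (x2idn).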

┌──────────────────────────────────────────────────────────────────────┐
│                    EC2 Instance Family Map                           │
│                                                                      │
│  PRIMARY BOTTLENECK    FAMILY      CURRENT GEN    BEST FOR           │
│  ──────────────────────────────────────────────────────────────────  │
│  Balanced (dev/test)   T-series    t3, t4g        Burstable VMs      │
│  Balanced (prod)       M-series    m6i, m7g       APIs, app servers  │
│  CPU-bound             C-series    c6i, c7g       Encoding, HPC      │
│  Memory-bound          R-series    r6i, r7g       DB buffer pools    │
│  Extreme memory        X-series    x2gd, x2idn    SAP HANA, Redis    │
│  NVMe local storage    I-series    i3, i4i        NoSQL, Elastic     │
│  Spinning HDD storage  D-series    d3             Hadoop HDFS        │
│  GPU training          P-series    p4d, p5        LLM training       │
│  GPU inference         G-series    g5, g4dn       Model serving      │
│  AWS Inferentia        Inf-series  inf2           Low-cost inference │
└──────────────────────────────────────────────────────────────────────┘

Burstable Instances: T-Series

T instances earn CPU credits when running below a baseline and spend them when bursting above it. A t3.small has a baseline of 20% per vCPU. A lightly loaded web server accumulates credits overnight and spends them during the morning traffic peak.

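CPU credit mechanics (one credit = one vCPU at 100% for one minute):
  t3.micro baseline = 10% per vCPU (2 vCPUs)
  Credits earned per hour: 6 per vCPU (12 for the instance)
  Credits spent per hour @ 100% CPU: 60 per vCPU (120 for the instance)
  A full balance of 288 credits drains in about 2.7 hours at 100% CPU on both vCPUs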
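
The credit arithmetic can be sketched in a few lines of shell. This assumes AWS's definition that one credit is one vCPU at 100% for one minute, and the 288-credit maximum balance that AWS publishes for t3.micro (a figure not stated above). The function names are invented for this example.

```shell
# credits_per_hour VCPUS PCT
# Credits earned (PCT = baseline) or spent (PCT = utilisation) in one hour.
credits_per_hour() {
  echo $(( $1 * $2 * 60 / 100 ))
}

# hours_until_empty BALANCE VCPUS BASELINE_PCT UTILISATION_PCT
# Time for a credit balance to reach zero under sustained load.
hours_until_empty() {
  awk -v bal="$1" -v n="$2" -v base="$3" -v util="$4" 'BEGIN {
    net = n * (util - base) * 60 / 100   # net credits drained per hour
    if (net <= 0) { print "never"; exit }
    printf "%.2f\n", bal / net
  }'
}

credits_per_hour 2 10            # t3.micro earning at baseline
credits_per_hour 2 100           # t3.micro with both vCPUs at 100%
hours_until_empty 288 2 10 100   # full t3.micro balance under full load
```

The three calls print 12, 120, and 2.67: the instance keeps earning 12 credits an hour while it spends 120, so the net drain is 108 credits an hour.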

Standard mode (the default for T2): when credits run out, the instance is throttled to baseline. Users see degraded performance rather than a surprise on the bill.

Unlimited mode (the default for T3, T3a, and T4g): when credits run out, the instance keeps bursting at full speed. If average CPU over a rolling 24-hour period stays above baseline, AWS bills the surplus at $0.05 per vCPU-hour for Linux T3 instances. Useful if you occasionally need full CPU but cannot predict when.
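
To get a feel for what unlimited mode can cost, here is a minimal estimate in shell. It is a simplification: it treats the load as steady and ignores any banked credits, so it gives an upper bound rather than the actual bill. The function name surplus_charge is made up for this example, and the default rate is the $0.05 per vCPU-hour quoted above.

```shell
# surplus_charge VCPUS BASELINE_PCT AVG_UTILISATION_PCT HOURS [RATE_PER_VCPU_HOUR]
# Rough upper bound on the unlimited-mode charge for sustained load.
surplus_charge() {
  awk -v n="$1" -v base="$2" -v util="$3" -v hours="$4" -v rate="${5:-0.05}" 'BEGIN {
    excess = (util - base) / 100         # fraction of each vCPU above baseline
    if (excess < 0) excess = 0           # at or below baseline nothing is billed
    printf "%.2f\n", n * excess * hours * rate
  }'
}

surplus_charge 2 20 60 24   # 2 vCPUs, 20% baseline, 60% average load, one day
surplus_charge 2 20 10 24   # same instance running below baseline
```

The first call prints 0.96 (2 vCPUs × 0.4 surplus × 24 hours × $0.05) and the second prints 0.00.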

# Launch t3.medium with unlimited CPU mode
aws ec2 run-instances \
  --instance-type t3.medium \
  --image-id ami-0c02fb55956c7d316 \
  --credit-specification '{"CpuCredits":"unlimited"}'

Good for: dev environments, internal tools, microservices with irregular traffic, small databases. Avoid for: consistently CPU-heavy workloads, where a T instance bursting in unlimited mode can cost more than the equivalent M-series instance.

General Purpose: M-Series

M instances deliver predictable, consistent CPU with no bursting mechanics. The ratio is roughly 1 vCPU to 4 GB RAM, which suits most application server workloads.

Instance     vCPU    RAM      Network bandwidth
m6i.large    2       8 GB     Up to 12.5 Gbps
m6i.xlarge   4       16 GB    Up to 12.5 Gbps
m6i.4xlarge  16      64 GB    Up to 12.5 Gbps
m6i.8xlarge  32      128 GB   12.5 Gbps

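The m7g Graviton3 versions list for roughly 15% less than the equivalent m6i Intel instances, and AWS claims better price-performance for most web and API workloads. If your runtime is Python, Java, Go, or Node.js, migration to Graviton is usually straightforward, provided any native dependencies ship arm64 builds.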

Use M-series for: application servers, CI/CD build agents, medium-sized relational databases, backend services.

Compute Optimised: C-Series

C instances provide more vCPUs per dollar by giving you a tighter memory ratio — roughly 1 vCPU to 2 GB RAM.

Comparison at xlarge (us-east-1 on-demand, Linux):
  m5.xlarge:  4 vCPU,  16 GB RAM, $0.192/hr
  c5.xlarge:  4 vCPU,   8 GB RAM, $0.170/hr
  (same vCPU count, half the RAM, about 11% cheaper)
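
The saving can be checked with a one-line calculation. This is only a sketch: the helper name percent_cheaper is invented here, and the prices are the hourly figures quoted above.

```shell
# percent_cheaper BASE_PRICE CHEAPER_PRICE
# Percentage saved by choosing the cheaper hourly price.
percent_cheaper() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) / a * 100 }'
}

percent_cheaper 0.192 0.170   # m5.xlarge versus c5.xlarge
```

This prints 11.46, the percentage saved per hour for the same vCPU count.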

For workloads where RAM is not the constraint, C instances deliver more compute per dollar. Use for:

Video transcoding (ffmpeg is CPU-bound)
Scientific simulations and numerical computing
Ad serving and recommendation engines
Game servers handling many concurrent connections
ML inference when the model fits in available RAM

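# C7g is the Graviton3 compute-optimised instance.
# The arm64 Amazon Linux 2023 AMI is resolved from its public SSM parameter.
aws ec2 run-instances \
  --instance-type c7g.2xlarge \
  --image-id resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64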

Memory Optimised: R, X, U-Series

Memory-optimised families provide large amounts of RAM relative to vCPUs. Use them when your working data set needs to fit in memory rather than spilling to disk.

R-series (8 GB per vCPU):
  r7g.large:   2 vCPU,   16 GB RAM
  r7g.4xlarge: 16 vCPU,  128 GB RAM
  r7g.16xlarge: 64 vCPU,  512 GB RAM

X-series (higher RAM density):
  x2gd.large:  2 vCPU,   32 GB RAM + 118 GB local NVMe
  x2idn.32xlarge: 128 vCPU, 2 TB RAM

U-series (ultra-high memory):
  u-6tb1.112xlarge: 448 vCPU, 6 TB RAM

When you need memory-optimised instances:

PostgreSQL or MySQL where the buffer pool should hold the full working set
Redis or Memcached running on EC2 (though ElastiCache is usually preferable)
Apache Spark jobs that load large data sets into memory for joins
SAP HANA in-memory database (large production systems typically run on X or U-series)
Feature engineering pipelines in ML where large matrices are computed in memory

Storage Optimised: I and D-Series

Storage-optimised instances have local NVMe SSDs (I-series) or high-capacity HDDs (D-series) physically attached to the host machine. The performance ceiling is dramatically higher than EBS for random I/O.

I-series local NVMe performance:
  i3.large:     475 GB NVMe    ~100,000 random read IOPS
  i3.8xlarge:   7.6 TB NVMe  ~1,650,000 random read IOPS
  i4i.8xlarge:  7.5 TB NVMe    ~800,000 random read IOPS

D-series local HDD (dense storage):
  d3.8xlarge: 48 TB HDD        ~4,600 MiB/s sequential throughput

The critical tradeoff: local storage is not persistent. Data survives a reboot but is gone when the instance stops, hibernates, terminates, or the host hardware fails. Use storage-optimised instances for data that is replicated or can be rebuilt:

Elasticsearch / OpenSearch clusters (index can be rebuilt from source data)
Cassandra nodes (RF=3 protects against single-node failure)
Kafka brokers (replicated partitions survive broker loss)
Hadoop HDFS (replication factor 3)
Temporary large-scale ETL staging areas

GPU and Accelerated Instances: P, G, Inf Series

P-series (training): The p4d.24xlarge has 8× NVIDIA A100 GPUs connected via NVLink and 400 Gbps networking, a long-standing choice for large model training. The p5.48xlarge has 8× NVIDIA H100 GPUs and targets transformer model training at larger scale.

G-series (inference and light training): g4dn.xlarge has one NVIDIA T4 GPU and is the entry point for model inference, video transcoding, and light training. g5 instances use NVIDIA A10G GPUs for more demanding inference workloads.

Inf2 (AWS Inferentia): AWS Inferentia2 chips are custom inference accelerators. Sizes range from inf2.xlarge (1 chip) to inf2.48xlarge (12 chips). Models must be compiled with the AWS Neuron SDK; where the model supports it, Inferentia2 can deliver better throughput per dollar than comparable G-series instances.

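# Placeholders: set these to real AMI IDs for your region before running.
DLAMI_ID="ami-xxxxxxxxxxxxxxxxx"        # a Deep Learning AMI with CUDA drivers
NEURON_AMI_ID="ami-xxxxxxxxxxxxxxxxx"   # a Deep Learning AMI with the Neuron SDK

# g4dn.xlarge for ML inference
aws ec2 run-instances \
  --instance-type g4dn.xlarge \
  --image-id "$DLAMI_ID"

# inf2.xlarge for Inferentia
aws ec2 run-instances \
  --instance-type inf2.xlarge \
  --image-id "$NEURON_AMI_ID"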

Graviton: ARM at Lower Cost

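AWS builds its own ARM processors under the Graviton brand. A g after the generation number marks a Graviton instance, and the generation number indicates the chip: in the M, C, and R families, 6g is Graviton2, 7g is Graviton3, and 8g is Graviton4.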

Most families have Graviton equivalents:

t4g — burstable general purpose
m7g — standard general purpose
c7g — compute optimised
r7g — memory optimised
x2gd — memory optimised with local NVMe

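Graviton instances typically list for 10 to 20% less than the equivalent Intel instances, and AWS claims Graviton3 uses up to 60% less energy for the same performance. For software on a common runtime or toolchain (Java on the JVM, Python, Go, Node.js, Ruby), Graviton migration usually requires no code changes, though Go binaries and native extensions must be built for arm64. Native code compiled for x86 needs recompilation.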
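
Deployment scripts often need to know which architecture an instance type implies, for example to pick an arm64 or x86_64 AMI. The heuristic below is a sketch, not an official rule: it assumes that a g among the letters after the generation number means Graviton, and the function name arch_for_type is invented for this example. Note that a leading g (as in g4dn) is the GPU family letter, not the Graviton marker.

```shell
# arch_for_type INSTANCE_TYPE
# Heuristic: a "g" in the attribute letters after the generation digit
# means Graviton (arm64); everything else is treated as x86_64.
arch_for_type() {
  attrs=$(printf '%s\n' "$1" | sed -n -E 's/^[a-z]+[0-9]+([a-z-]*)\..*$/\1/p')
  case "$attrs" in
    *g*) echo arm64 ;;
    *)   echo x86_64 ;;
  esac
}

arch_for_type m7g.large     # Graviton general purpose
arch_for_type g4dn.xlarge   # GPU family on Intel
```

These print arm64 and x86_64 respectively. For anything authoritative, aws ec2 describe-instance-types reports the supported architectures for each type.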

How Instance Sizes Scale

Resources roughly double as you move up the size ladder within a family. In the t3 family RAM doubles at each step, while the vCPU count stays at 2 until xlarge:

t3 family:
  t3.nano:    2 vCPU,  0.5 GB RAM
  t3.micro:   2 vCPU,  1 GB RAM
  t3.small:   2 vCPU,  2 GB RAM
  t3.medium:  2 vCPU,  4 GB RAM
  t3.large:   2 vCPU,  8 GB RAM
  t3.xlarge:  4 vCPU,  16 GB RAM
  t3.2xlarge: 8 vCPU,  32 GB RAM

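For M, C, and R families, both vCPU count and RAM double with each size step up to 8xlarge. Larger sizes step unevenly (12xlarge, 16xlarge, 24xlarge), but the multiplier in the name still tracks resources: an Nxlarge has 4 × N vCPUs. Network bandwidth increases at the larger sizes.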
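
The size ladder for M, C, and R families can be sketched as a lookup. This assumes the naming rule that an Nxlarge size has 4 × N vCPUs, plus the memory ratios given earlier (4, 2, and 8 GB per vCPU). The function names are hypothetical, and the result is an estimate rather than a substitute for the published specifications.

```shell
# size_to_vcpus SIZE  (for example: large, xlarge, 4xlarge)
size_to_vcpus() {
  case "$1" in
    medium) echo 1 ;;
    large)  echo 2 ;;
    xlarge) echo 4 ;;
    *xlarge)
      n=${1%xlarge}                      # numeric multiplier, e.g. 4 in 4xlarge
      case "$n" in
        *[!0-9]*) echo "unknown size: $1" >&2; return 1 ;;
      esac
      echo $(( n * 4 ))
      ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# ram_gb FAMILY_LETTER SIZE  (family is m, c, or r)
ram_gb() {
  vcpus=$(size_to_vcpus "$2") || return 1
  case "$1" in
    m) ratio=4 ;;
    c) ratio=2 ;;
    r) ratio=8 ;;
    *) echo "unknown family: $1" >&2; return 1 ;;
  esac
  echo $(( vcpus * ratio ))
}

size_to_vcpus 4xlarge   # 16
ram_gb r 4xlarge        # 128
```

The two calls match the r7g.4xlarge figures listed earlier: 16 vCPUs and 128 GB of RAM.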

Decision Framework

Profile first. Check CloudWatch CPU metrics on existing instances, plus memory metrics from the CloudWatch agent (EC2 does not publish memory utilisation by default). If CPU averages 15% but memory is at 80%, RAM is the constraint, so move to R-series.
Try Graviton. For any new workload on Python, Java, or Node.js, start with a Graviton instance unless you have a specific reason not to.
Use current generation. Older generations (m4, c4, r4) are still available but cost more per unit of performance than current (m7, c7, r7).
Right-size after a week. Launch with your best guess, monitor actual utilisation for 7 days, and adjust. AWS Compute Optimizer automates this analysis.
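
The profiling step above can be turned into a first-pass rule of thumb. The sketch below is illustrative only: the function name suggest_family and every threshold in it are assumptions chosen for the example, not AWS guidance, and it takes whole-number average utilisation percentages as input.

```shell
# suggest_family AVG_CPU_PCT AVG_MEM_PCT
# First-pass family suggestion from average utilisation (integer percentages).
# All thresholds are illustrative assumptions.
suggest_family() {
  cpu=$1
  mem=$2
  if [ "$mem" -ge 80 ] && [ "$cpu" -lt 40 ]; then
    echo "R-series"     # memory is the constraint
  elif [ "$cpu" -ge 80 ] && [ "$mem" -lt 40 ]; then
    echo "C-series"     # CPU is the constraint
  elif [ "$cpu" -lt 20 ] && [ "$mem" -lt 40 ]; then
    echo "T-series"     # mostly idle, burstable is enough
  else
    echo "M-series"     # balanced default
  fi
}

suggest_family 15 80   # the example from the text
```

With the figures from the text (CPU 15%, memory 80%) this prints R-series. Treat the output as a starting point for the one-week right-sizing check, not a final answer.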

Common Interview Questions

Q: When would you choose c5 over m5? When the workload is CPU-bound and RAM is not the limiting factor. C5 gives the same vCPU count as M5 at a tighter memory ratio, making it cheaper per compute unit for encoding, scientific computing, or stateless API work.

Q: What happens to data on an i4i instance when you terminate it? All local NVMe data is destroyed. Storage-optimised instances should only hold data that is replicated elsewhere or can be rebuilt — cache, distributed database replicas, or scratch data.

Q: What is the CPU credit balance on a t3.micro after 24 hours of idle? A t3.micro earns 6 credits per vCPU per hour at its 10% baseline, so its 2 vCPUs earn 12 credits per hour, or 288 credits in 24 hours. That equals the maximum balance for a t3.micro (288 credits, one day of earnings), so a truly idle instance fills the bucket in about 24 hours, less whatever its background CPU use spends.

Q: Why might Graviton be faster than Intel in some benchmarks? Each Graviton vCPU is a full physical core rather than a hyperthread, and Graviton3 adds DDR5 memory bandwidth, wider SIMD units than Graviton2, and improved cryptographic acceleration. It is not universally faster: workloads with x86-specific assembly paths or AVX tuning may perform better on Intel.