EC2 Instance Families: Which Type Fits Your CPU, Memory, Storage, or GPU Workload
Choosing the wrong instance type costs money in two directions. A memory-optimised instance for a CPU-bound application wastes RAM you are paying for. A compute-optimised instance for a caching layer runs out of memory and starts swapping. AWS offers dozens of instance families specifically because different workloads have different bottlenecks.
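The naming convention is consistent: family + generation + attribute suffix + . + size. For example, r7g.4xlarge is memory-optimised (r), seventh generation, Graviton ARM (g), 4xlarge. The suffix can combine several letters: in g4dn, d means local NVMe storage and n means enhanced networking.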
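The naming scheme can be sketched as a small parser. This is only an illustration: the function name `parse_instance_type` is hypothetical, and the pattern assumes the simple letters-digits-letters form, so high-memory names such as u-6tb1.112xlarge are deliberately rejected.

```shell
# Split an EC2 instance type such as "r7g.4xlarge" into its parts.
# Assumes the pattern <letters><digits><optional letters>.<size>;
# names that do not follow it produce no output.
parse_instance_type() {
    printf '%s\n' "$1" | sed -n -E \
        's/^([a-z]+)([0-9]+)([a-z-]*)\.([a-z0-9]+)$/family=\1 generation=\2 suffix=\3 size=\4/p'
}

parse_instance_type r7g.4xlarge
parse_instance_type g4dn.xlarge
```

The first call prints `family=r generation=7 suffix=g size=4xlarge`; the second shows a two-letter suffix (`dn`).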
EC2 Instance Family Map

PRIMARY BOTTLENECK      FAMILY       CURRENT GEN    BEST FOR
──────────────────────────────────────────────────────────────────────
Balanced (dev/test)     T-series     t3, t4g        Burstable VMs
Balanced (prod)         M-series     m6i, m7g       APIs, app servers
CPU-bound               C-series     c6i, c7g       Encoding, HPC
Memory-bound            R-series     r6i, r7g       DB buffer pools
Extreme memory          X-series     x2gd, x2idn    SAP HANA, Redis
NVMe local storage      I-series     i3, i4i        NoSQL, Elastic
Spinning HDD storage    D-series     d3             Hadoop HDFS
GPU training            P-series     p4d, p5        LLM training
GPU inference           G-series     g5, g4dn       Model serving
AWS Inferentia          Inf-series   inf2           Low-cost inference

Burstable Instances: T-Series
T instances earn CPU credits when running below a baseline and spend them when bursting above it. A t3.small has a baseline of 20% per vCPU across its 2 vCPUs. An idle web server accumulates credits all night and spends them during the morning traffic peak.
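CPU credit mechanics (one credit = one vCPU at 100% for one minute):

    t3.micro baseline:                       10% per vCPU (2 vCPUs)
    Credits earned per hour:                 12
    Credits spent per hour @ 100% CPU:       120 (both vCPUs busy)
    Maximum credit balance:                  288 (24 hours of earnings)
    Full balance drains at 100% CPU in:      about 2 h 40 min

Standard mode: when credits run out, the instance is throttled to baseline. The user notices degraded performance rather than a bill surprise. T2 instances launch in standard mode by default.
Unlimited mode: when credits run out, the instance continues at full speed and AWS charges for the surplus credits at $0.05 per vCPU-hour (T3 on Linux). The charge applies only when average CPU utilisation over a rolling 24-hour period exceeds the baseline. T3, T3a, and T4g instances launch in unlimited mode by default. Useful if you occasionally need full CPU but cannot predict when.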
    # Launch a t3.medium with unlimited CPU credit mode
    aws ec2 run-instances \
        --instance-type t3.medium \
        --image-id ami-0c02fb55956c7d316 \
        --credit-specification '{"CpuCredits":"unlimited"}'

Good for: dev environments, internal tools, microservices with irregular traffic, small databases.
Avoid for: consistently CPU-heavy workloads. A T instance that bursts all day costs more than the equivalent M-series instance.
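The credit arithmetic for a burstable instance can be worked through in a few lines of shell. This is a sketch, not an AWS tool: the function names are hypothetical, and the example assumes a t3.micro with 2 vCPUs, a 10% per-vCPU baseline, and a 288-credit maximum balance, which are the figures in the AWS credit table at the time of writing and worth rechecking.

```shell
# CPU credit arithmetic (integer math only).
# One credit = one vCPU running at 100% for one minute.
credits_earned_per_hour() {   # args: vcpus baseline_percent_per_vcpu
    echo $(( $1 * $2 * 60 / 100 ))
}

credits_spent_per_hour() {    # args: vcpus utilisation_percent
    echo $(( $1 * $2 * 60 / 100 ))
}

minutes_until_empty() {       # args: balance earned_per_hour spent_per_hour
    net=$(( $3 - $2 ))
    if [ "$net" -le 0 ]; then
        echo "never"          # at or below baseline the balance does not drain
    else
        echo $(( $1 * 60 / net ))
    fi
}

earned=$(credits_earned_per_hour 2 10)   # assumed t3.micro figures
spent=$(credits_spent_per_hour 2 100)    # both vCPUs pinned at 100%
echo "earned/hour: $earned, spent/hour: $spent"
echo "minutes to drain 288 credits: $(minutes_until_empty 288 "$earned" "$spent")"
```

Under those assumptions the instance earns 12 credits and spends 120 per hour, so a full balance lasts 160 minutes of sustained full load.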
General Purpose: M-Series
M instances deliver predictable, consistent CPU with no bursting mechanics. The ratio is roughly 1 vCPU to 4 GB RAM, which suits most application server workloads.
Instance       vCPU   RAM      Network bandwidth
m6i.large      2      8 GB     Up to 12.5 Gbps
m6i.xlarge     4      16 GB    Up to 12.5 Gbps
m6i.4xlarge    16     64 GB    Up to 12.5 Gbps
m6i.8xlarge    32     128 GB   12.5 Gbps

The m7g Graviton3 versions are typically 15 to 20% cheaper per hour than the equivalent m6i Intel instances. If your runtime is Python, Java, Go, or Node.js, migration to Graviton is usually straightforward, provided any native dependencies ship arm64 builds.
Use M-series for: application servers, CI/CD build agents, medium-sized relational databases, backend services.
Compute Optimised: C-Series
C instances provide more vCPUs per dollar by giving you a tighter memory ratio — roughly 1 vCPU to 2 GB RAM.
Comparison at xlarge (us-east-1 on-demand, Linux):

    m5.xlarge: 4 vCPU, 16 GB RAM, $0.192/hr
    c5.xlarge: 4 vCPU, 8 GB RAM, $0.170/hr (same CPU, half the RAM, about 11% cheaper)

For workloads where RAM is not the constraint, C instances deliver more compute per dollar. Use for:
- Video transcoding (ffmpeg is CPU-bound)
- Scientific simulations and numerical computing
- Ad serving and recommendation engines
- Game servers handling many concurrent connections
- ML inference when the model fits in available RAM
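The price comparison between m5.xlarge and c5.xlarge can be reproduced with a short calculation. The helper names below are hypothetical, and the prices are the on-demand figures quoted in this section, which vary by region and over time.

```shell
# Percentage saved by the cheaper of two hourly prices.
percent_cheaper() {           # args: reference_price cheaper_price
    awk -v ref="$1" -v alt="$2" 'BEGIN { printf "%.1f\n", (ref - alt) / ref * 100 }'
}

# Hourly price per vCPU, the figure that matters for CPU-bound work.
price_per_vcpu() {            # args: hourly_price vcpus
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", p / n }'
}

percent_cheaper 0.192 0.170   # m5.xlarge vs c5.xlarge
price_per_vcpu 0.192 4        # m5.xlarge
price_per_vcpu 0.170 4        # c5.xlarge
```

The saving comes entirely from the RAM you give up, so it only pays off when memory utilisation on the M instance is well under half.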
    # C7g is the Graviton3 compute-optimised family
    # (the AMI ID is a placeholder: substitute an arm64 Amazon Linux 2023 AMI ID for your region)
    aws ec2 run-instances \
        --instance-type c7g.2xlarge \
        --image-id ami-arm64-amazon-linux-2023

Memory Optimised: R, X, U-Series
Memory-optimised families provide large amounts of RAM relative to vCPUs. Use them when your working data set needs to fit in memory rather than spilling to disk.
R-series (8 GB per vCPU):

    r7g.large:    2 vCPU, 16 GB RAM
    r7g.4xlarge:  16 vCPU, 128 GB RAM
    r7g.16xlarge: 64 vCPU, 512 GB RAM

X-series (higher RAM density):

    x2gd.large:     2 vCPU, 32 GB RAM + 118 GB local NVMe
    x2idn.32xlarge: 128 vCPU, 2 TB RAM

U-series (ultra-high memory):

    u-6tb1.112xlarge: 448 vCPU, 6 TB RAM

When you need memory-optimised instances:
- PostgreSQL or MySQL where the buffer pool should hold the full working set
- Redis or Memcached running on EC2 (though ElastiCache is usually preferable)
- Apache Spark jobs that load large data sets into memory for joins
- SAP HANA in-memory database (large production systems typically run on X or U-series)
- Feature engineering pipelines in ML where large matrices are computed in memory
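One way to turn the memory ratios into a first guess is to compute the GB-per-vCPU a workload needs and map it to a family. The function name and the cut-offs are assumptions taken from the approximate ratios in this article (C about 2, M about 4, R about 8 GB per vCPU), not an AWS rule.

```shell
# Suggest a family from the memory-to-vCPU ratio a workload needs.
suggest_family() {            # args: vcpus_needed ram_gb_needed
    ratio=$(( ($2 + $1 - 1) / $1 ))   # GB per vCPU, rounded up
    if   [ "$ratio" -le 2 ]; then echo "C-series"
    elif [ "$ratio" -le 4 ]; then echo "M-series"
    elif [ "$ratio" -le 8 ]; then echo "R-series"
    else                          echo "X-series or U-series"
    fi
}

suggest_family 8 16     # e.g. a transcoding worker
suggest_family 8 60     # e.g. PostgreSQL with a large buffer pool
suggest_family 16 512   # e.g. in-memory analytics
```

Treat the result as a starting point to validate against measured utilisation rather than a final answer.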
Storage Optimised: I and D-Series
Storage-optimised instances have local NVMe SSDs (I-series) or high-capacity HDDs (D-series) physically attached to the host machine. The performance ceiling is dramatically higher than EBS for random I/O.
I-series local NVMe performance:

    i3.large:    475 GB NVMe   ~100,000 random read IOPS
    i3.8xlarge:  7.6 TB NVMe   ~1,650,000 random read IOPS
    i4i.8xlarge: 7.5 TB NVMe   ~800,000 random read IOPS
D-series local HDD (dense storage):

    d3.8xlarge: 48 TB HDD   ~900 MB/s sequential throughput

The critical tradeoff: local storage is not persistent. Data is gone when the instance stops, terminates, or the host hardware fails. Use storage-optimised instances for data that is replicated or can be rebuilt:
- Elasticsearch / OpenSearch clusters (index can be rebuilt from source data)
- Cassandra nodes (RF=3 protects against single-node failure)
- Kafka brokers (replicated partitions survive broker loss)
- Hadoop HDFS (replication factor 3)
- Temporary large-scale ETL staging areas
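Because these systems rely on replication to survive the loss of local disks, usable capacity is a fraction of the raw NVMe total. A minimal sketch of that sizing step, with a hypothetical function name and an assumed cluster of six nodes holding 7,500 GB each:

```shell
# Usable capacity of a cluster on instance-store volumes after replication.
usable_capacity_gb() {        # args: node_count local_gb_per_node replication_factor
    echo $(( $1 * $2 / $3 ))
}

usable_capacity_gb 6 7500 3   # six nodes at 7.5 TB each, replication factor 3
```

With a replication factor of 3, the 45 TB of raw storage in this example holds 15,000 GB of unique data, before any headroom for compaction or rebalancing.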
GPU and Accelerated Instances: P, G, Inf Series
P-series (training):
The p4d.24xlarge has 8× NVIDIA A100 GPUs connected via NVLink and 400 Gbps networking — the standard for large model training. The p5.48xlarge uses H100 GPUs and is the current generation for transformer model training.
G-series (inference and light training):
g4dn.xlarge has one NVIDIA T4 GPU and is the entry point for model inference, video transcoding, and light training. g5 instances use NVIDIA A10G GPUs for more demanding inference workloads.
Inf2 (AWS Inferentia):
AWS Inferentia2 chips are custom inference accelerators. Sizes range from inf2.xlarge (1 chip) to inf2.48xlarge (12 chips). For inference workloads where the model can be compiled with the AWS Neuron SDK, Inferentia2 can deliver better throughput per dollar than comparable G-series instances.
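    # g4dn.xlarge for ML inference
    # (the AMI ID is a placeholder: substitute a Deep Learning AMI ID for your region)
    aws ec2 run-instances \
        --instance-type g4dn.xlarge \
        --image-id ami-deep-learning-ami-cuda

    # inf2.xlarge for Inferentia
    # (placeholder AMI ID: use an AMI that includes the AWS Neuron drivers)
    aws ec2 run-instances \
        --instance-type inf2.xlarge \
        --image-id ami-inf2-compatible

Graviton: ARM at Lower Cost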
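AWS builds its own ARM processors under the Graviton brand, marked by a g in the instance name. The seventh-generation families (m7g, c7g, r7g) run on Graviton3; t4g and x2gd run on Graviton2, and the newer eighth-generation families (m8g, c8g, r8g) run on Graviton4.
Most families have Graviton equivalents:
- t4g: burstable general purpose
- m7g: standard general purpose
- c7g: compute optimised
- r7g: memory optimised
- x2gd: memory optimised with local NVMe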
Graviton instances are typically 15 to 20% cheaper per hour than the equivalent Intel instances, and AWS states that Graviton3 uses up to 60% less energy for the same performance. Code running on a common runtime (Java on the JVM, Python, Go, Node.js, Ruby) usually needs no source changes, although Go binaries and any native extensions must be built for arm64. Native code compiled for x86 needs recompilation.
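A build or deploy script often needs to know which architecture it is targeting before it picks an AMI or a package. The sketch below maps `uname -m` output to the architecture names EC2 uses for AMIs; the function name `ami_architecture` is hypothetical.

```shell
# Map a machine string (default: this host's `uname -m`) to an AMI architecture name.
ami_architecture() {
    machine=${1:-$(uname -m)}
    case "$machine" in
        aarch64|arm64) echo "arm64" ;;     # Graviton
        x86_64|amd64)  echo "x86_64" ;;    # Intel and AMD
        *)             echo "unknown" ;;
    esac
}

ami_architecture aarch64
ami_architecture x86_64
```

Called with no argument, the function reports the architecture of the machine it runs on, which is a quick way to confirm that a CI runner is actually building arm64 artefacts.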
How Instance Sizes Scale
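Within most families, resources double with each step up the size ladder. The T family is the exception at the small end: memory doubles from nano to large while the vCPU count stays at 2.

t3 family:

    t3.nano:    2 vCPU, 0.5 GB RAM
    t3.micro:   2 vCPU, 1 GB RAM
    t3.small:   2 vCPU, 2 GB RAM
    t3.medium:  2 vCPU, 4 GB RAM
    t3.large:   2 vCPU, 8 GB RAM
    t3.xlarge:  4 vCPU, 16 GB RAM
    t3.2xlarge: 8 vCPU, 32 GB RAM

For M, C, and R families, both vCPU count and RAM double with each size step from large through 4xlarge. Larger sizes do not always double (12xlarge and 24xlarge, for example). Network bandwidth increases at the larger sizes.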
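For the M, C, and R families, the sizes quoted in this article follow a simple pattern: large is 2 vCPUs, xlarge is 4, and Nxlarge is 4 × N, with RAM following from the family's GB-per-vCPU ratio. The sketch below encodes that pattern; the function names are hypothetical, the ratios are the approximate ones used here, and metal sizes and the T family are out of scope.

```shell
# vCPU count implied by a size name in the M, C, and R families.
size_to_vcpus() {             # arg: size name such as large, xlarge, 4xlarge
    case "$1" in
        large)  echo 2 ;;
        xlarge) echo 4 ;;
        *xlarge)
            n=${1%xlarge}                      # "12xlarge" -> "12"
            case "$n" in
                *[!0-9]*) echo "unsupported size: $1" >&2; return 1 ;;
            esac
            echo $(( $n * 4 )) ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# RAM in GB implied by family letter and size, using 2, 4, and 8 GB per vCPU.
family_ram_gb() {             # args: family letter (c, m, or r), size name
    case "$1" in
        c) per_vcpu=2 ;;
        m) per_vcpu=4 ;;
        r) per_vcpu=8 ;;
        *) echo "unsupported family: $1" >&2; return 1 ;;
    esac
    vcpus=$(size_to_vcpus "$2") || return 1
    echo $(( vcpus * per_vcpu ))
}

size_to_vcpus 4xlarge      # vCPUs for any M, C, or R 4xlarge
family_ram_gb r 4xlarge    # RAM for an R-series 4xlarge
family_ram_gb m 8xlarge    # RAM for an M-series 8xlarge
```

The results line up with the r7g.4xlarge (16 vCPU, 128 GB) and m6i.8xlarge (32 vCPU, 128 GB) figures quoted earlier.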
Decision Framework
- Profile first. Check CloudWatch CPU metrics on existing instances, and install the CloudWatch agent to collect memory metrics, which EC2 does not publish by default. If CPU averages 15% but memory is at 80%, RAM is the constraint, so move to R-series.
- Try Graviton. For any new workload on Python, Java, or Node.js, start with a Graviton instance unless you have a specific reason not to.
- Use current generation. Older generations (m4, c4, r4) are still available but cost more per unit of performance than current (m7, c7, r7).
- Right-size after a week. Launch with your best guess, monitor actual utilisation for 7 days, and adjust. AWS Compute Optimizer automates this analysis.
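The profiling step can be reduced to a rough rule of thumb once you have average CPU and memory utilisation. The thresholds and the function name below are illustrative assumptions, not AWS guidance; AWS Compute Optimizer applies a far more detailed analysis.

```shell
# Rough family recommendation from average utilisation percentages (integers).
recommend_family() {          # args: avg_cpu_percent avg_memory_percent
    cpu=$1; mem=$2
    if   [ "$mem" -ge 80 ] && [ "$cpu" -lt 40 ]; then
        echo "memory-bound: consider R-series"
    elif [ "$cpu" -ge 80 ] && [ "$mem" -lt 40 ]; then
        echo "CPU-bound: consider C-series"
    elif [ "$cpu" -lt 20 ] && [ "$mem" -lt 40 ]; then
        echo "mostly idle: consider a smaller size or T-series"
    else
        echo "balanced: stay on M-series"
    fi
}

recommend_family 15 80   # the example from the first bullet
recommend_family 90 30
```

Feeding it the 15% CPU and 80% memory case from the list above yields the R-series suggestion.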
Common Interview Questions
Q: When would you choose c5 over m5? When the workload is CPU-bound and RAM is not the limiting factor. C5 gives the same vCPU count as M5 at a tighter memory ratio, making it cheaper per compute unit for encoding, scientific computing, or stateless API work.
Q: What happens to data on an i4i instance when you terminate it? All local NVMe data is destroyed. Storage-optimised instances should only hold data that is replicated elsewhere or can be rebuilt — cache, distributed database replicas, or scratch data.
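Q: What is the CPU credit balance on a t3.micro after 24 hours of idle? A t3.micro has 2 vCPUs with a 10% baseline each, so it earns 6 credits per vCPU per hour, or 12 credits per hour. The maximum balance is 288 credits, which is exactly 24 hours of earnings, so after 24 hours at near-zero utilisation the balance sits at or just below the 288-credit cap. T3 instances in standard mode receive no launch credits, so the balance starts from zero.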
Q: Why might Graviton be faster than Intel in some benchmarks? Each Graviton vCPU is a full physical core rather than a hyperthread, and Graviton3 adds wider SIMD units, higher memory bandwidth (DDR5), and improved cryptographic acceleration over Graviton2. It is not universally faster: workloads with x86-specific assembly paths or SIMD tuning may perform better on Intel.