AWS EC2: Virtual Servers That Let You Control the Machine Layer in the Cloud
EC2 hands you a virtual machine in the cloud. You choose the operating system, the hardware profile, the network placement, and the storage type. Nothing is hidden from you — you can SSH in, run any software, configure the kernel, and manage every process. That level of control is what separates EC2 from higher-level services like Lambda or App Runner.
Understanding EC2 well is worth the effort because its concepts — IAM roles, VPC placement, security groups, AMIs — appear across every other compute service on AWS. Fargate, Beanstalk, and Lambda all build on the same foundations.
What Happens When You Launch an Instance
Current-generation EC2 instances run on the AWS Nitro System, which pairs a lightweight hypervisor with dedicated hardware cards that take over most networking, storage, and management work. The result is near-bare-metal network and storage performance.
When you launch, you specify six things:
- AMI — a snapshot of an OS and any pre-installed software; your starting point
- Instance type — the hardware profile (vCPUs, memory, network bandwidth, storage throughput)
- Key pair — RSA or ED25519 key for SSH access to Linux instances
- Security group — the stateful firewall that controls which traffic can reach the instance
- Subnet — which VPC segment and Availability Zone the instance joins
- IAM instance profile — the AWS permissions the code running on the instance can exercise
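As a rough illustration, the six launch-time choices can be modelled as one small record. The key names below follow the EC2 RunInstances request parameters; the `LaunchSpec` class and every ID are hypothetical placeholders, and nothing here calls AWS.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LaunchSpec:
    """The six launch-time choices (all values are placeholders)."""
    ami_id: str
    instance_type: str
    key_name: str
    security_group_ids: Tuple[str, ...]
    subnet_id: str
    instance_profile_name: str

    def to_run_instances_kwargs(self) -> dict:
        """Return keyword arguments shaped like a RunInstances request."""
        return {
            "ImageId": self.ami_id,
            "InstanceType": self.instance_type,
            "KeyName": self.key_name,
            "SecurityGroupIds": list(self.security_group_ids),
            "SubnetId": self.subnet_id,
            "IamInstanceProfile": {"Name": self.instance_profile_name},
            "MinCount": 1,  # launch exactly one instance
            "MaxCount": 1,
        }


spec = LaunchSpec(
    ami_id="ami-0123456789abcdef0",          # hypothetical AMI ID
    instance_type="t3.medium",
    key_name="my-keypair",
    security_group_ids=("sg-0abc123def456",),
    subnet_id="subnet-0a1b2c3d",
    instance_profile_name="AppServerProfile",
)
kwargs = spec.to_run_instances_kwargs()
print(sorted(kwargs))
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `client.run_instances(**kwargs)`; that step is left out so the sketch stays runnable anywhere.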
```text
┌─────────────────────────────────────────────────────────────┐
│  AWS Region us-east-1                                       │
│                                                             │
│  ┌──────────── VPC 10.0.0.0/16 ──────────────────────────┐  │
│  │                                                       │  │
│  │  Public Subnet 10.0.1.0/24 (AZ 1a)                    │  │
│  │  ┌───────────────────┐   ┌────────────────────────┐   │  │
│  │  │ EC2 (web tier)    │   │ Elastic IP 54.x.x.x    │   │  │
│  │  │ t3.medium         │──▶│ (mapped at IGW)        │   │  │
│  │  └────────┬──────────┘   └────────────────────────┘   │  │
│  │           │  sg-web: allow 80,443 inbound             │  │
│  │           │                                           │  │
│  │  Private Subnet 10.0.11.0/24 (AZ 1a)                  │  │
│  │  ┌───────────────────┐                                │  │
│  │  │ EC2 (app tier)    │   sg-app: allow 8080 from      │  │
│  │  │ m5.large          │   sg-web only                  │  │
│  │  └───────────────────┘                                │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

Security Groups as the Primary Access Control
Security groups are stateful. An allowed inbound connection automatically permits its return traffic — you do not write a matching outbound rule. All inbound traffic is denied by default; all outbound is allowed by default.
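To make "stateful" concrete, here is a toy connection-tracking model. It is a simplified sketch of the idea, not how AWS implements security groups; the class and method names are invented for illustration.

```python
class StatefulFirewall:
    """Toy model: inbound rules plus a table of tracked connections."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)  # ports opened by inbound rules
        self.tracked = set()                     # (remote_ip, remote_port, local_port)

    def inbound(self, remote_ip, remote_port, local_port):
        """A new inbound connection is allowed only if a rule opens the port."""
        if local_port not in self.inbound_ports:
            return False  # default deny
        self.tracked.add((remote_ip, remote_port, local_port))
        return True

    def reply(self, remote_ip, remote_port, local_port, outbound_rule_allows):
        """Return traffic for a tracked connection bypasses the outbound rules."""
        if (remote_ip, remote_port, local_port) in self.tracked:
            return True
        return outbound_rule_allows


fw = StatefulFirewall(inbound_ports={443})
accepted = fw.inbound("203.0.113.10", 51000, 443)
# Even with no outbound rule at all, the reply to the tracked connection passes.
reply_ok = fw.reply("203.0.113.10", 51000, 443, outbound_rule_allows=False)
ssh_blocked = fw.inbound("203.0.113.10", 51001, 22)
print(accepted, reply_ok, ssh_blocked)
```

A stateless filter would need the `reply` path to consult an explicit outbound rule every time, which is the extra rule the paragraph above says you do not have to write.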
The critical pattern for internal tiers: reference other security groups as the source, not IP ranges.
```text
sg-alb:  allow 443  from 0.0.0.0/0
   |
sg-app:  allow 8080 from sg-alb only
   |
sg-db:   allow 5432 from sg-app only
```

Even if someone launches an EC2 instance directly in the database subnet, it cannot connect to the database unless its security group is sg-app. IP-based rules would break the moment an instance changed its private IP.
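The chained rules can be checked with a small model in which a rule's source is either a CIDR string or the name of another security group. This is an illustrative sketch only: `Rule`, `SecurityGroup`, `is_allowed`, and the `sg-rogue` group are hypothetical, and CIDR matching is reduced to the open `0.0.0.0/0` case.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    port: int
    source: str  # "0.0.0.0/0" or the name of another security group


@dataclass
class SecurityGroup:
    name: str
    inbound: list = field(default_factory=list)


def is_allowed(sender_groups, target, port):
    """True if a sender carrying `sender_groups` may reach `target` on `port`."""
    for rule in target.inbound:
        if rule.port != port:
            continue
        if rule.source == "0.0.0.0/0" or rule.source in sender_groups:
            return True
    return False  # no matching rule: inbound traffic is denied by default


sg_alb = SecurityGroup("sg-alb", [Rule(443, "0.0.0.0/0")])
sg_app = SecurityGroup("sg-app", [Rule(8080, "sg-alb")])
sg_db = SecurityGroup("sg-db", [Rule(5432, "sg-app")])

print(is_allowed({"sg-app"}, sg_db, 5432))    # app tier reaching the database
print(is_allowed({"sg-rogue"}, sg_db, 5432))  # stray instance in the same subnet
print(is_allowed({"sg-alb"}, sg_db, 5432))    # load balancer skipping the app tier
```

Note that the check never looks at an IP address for the internal tiers, which is why the pattern survives instances being replaced or re-addressed.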
Choosing an Instance Type
Instance type names follow a pattern: family + generation + attributes (processor, plus extras such as local disk or enhanced networking) + . + size.
m7i.2xlarge = general purpose (m), 7th generation, Intel (i), 2xlarge size.
| Family | When to use | Examples |
|---|---|---|
| t3, t4g | Dev environments, low-traffic sites — burstable CPU | t3.micro, t4g.small |
| m5, m6i, m7g | Application servers, APIs — balanced CPU and RAM | m6i.large, m7g.xlarge |
| c5, c6g, c7i | CPU-bound work — encoding, HPC, ML serving | c6g.large, c7i.xlarge |
| r5, r6i, r7g | Databases, in-memory caches — lots of RAM per vCPU | r6i.4xlarge |
| i3, i4i | High-IOPS local NVMe — Elasticsearch, Cassandra | i4i.2xlarge |
| p3, p4d, g5 | GPU workloads (ML training and inference) | g5.xlarge |
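Families with a g after the generation number (t4g, m7g, c6g) use AWS Graviton ARM processors; do not confuse them with GPU families whose name starts with g, such as g5. Graviton instances are typically priced around 20% lower than comparable x86 instances, and code on portable runtimes (Python, Node.js, Java, Go) usually moves over with little or no change.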
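The naming pattern is regular enough to parse. The sketch below splits a name into family, generation, attribute letters, and size, then uses the attribute letters to guess whether the type is Graviton-based. The regular expression is a heuristic that covers common names, not an official grammar, and the function names are made up for this example.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str      # e.g. "m" for general purpose
    generation: int  # e.g. 7
    attributes: str  # e.g. "i" (Intel), "g" (Graviton), "dn" (local disk + networking)
    size: str        # e.g. "2xlarge"


# Heuristic pattern; a few special-purpose families will not match it.
_NAME = re.compile(r"^([a-z]+)(\d+)([a-z0-9-]*)\.([a-z0-9-]+)$")


def parse_instance_type(name: str) -> InstanceType:
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


def is_graviton(name: str) -> bool:
    """Assumes a 'g' among the attribute letters marks a Graviton processor."""
    return "g" in parse_instance_type(name).attributes


for example in ("m7i.2xlarge", "t4g.small", "g5.xlarge"):
    print(example, parse_instance_type(example), is_graviton(example))
```

The `g5.xlarge` case is the one to watch: the leading g is the family (GPU), so the attribute string is empty and the Graviton check correctly says no.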
Placement Groups for Low-Latency Networking
When instances need to communicate with each other at the lowest possible latency, placement groups control their physical location:
```text
CLUSTER:   All instances → packed close together in one AZ
  Benefit: low latency, up to 10 Gbps per flow between instances (vs 5 Gbps otherwise)
  Risk:    a single hardware failure can take out all instances
  Use:     HPC, distributed machine learning, financial tick data

SPREAD:    Each instance → different rack
  Benefit: no shared single point of failure
  Limit:   max 7 running instances per AZ per group
  Use:     small critical services that need maximum isolation

PARTITION: Instances divided into named groups, no shared hardware between groups
  Benefit: limits blast radius; you know which partition is affected
  Use:     Kafka, Cassandra, Hadoop (distributed systems with replication)
```

AMI: The Machine Blueprint
An AMI (Amazon Machine Image) is a snapshot that includes the OS, installed packages, and EBS volume configuration. You launch from an AMI; the resulting instance boots with exactly that state.
Common sources:
- AWS-managed: Amazon Linux 2023, Ubuntu, Windows Server — updated regularly, maintained by AWS or the OS vendor
- Marketplace: vendor-provided images with commercial software pre-licensed
- Your own: capture a running instance as an AMI to create a golden image for your team
```bash
# Launch an instance from the AWS CLI
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --instance-type t3.medium \
  --key-name my-keypair \
  --security-group-ids sg-0abc123def456 \
  --subnet-id subnet-0a1b2c3d \
  --iam-instance-profile Name=AppServerProfile \
  --user-data file://startup.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-1}]'
```

The --user-data script runs once at first boot, which makes it useful for pulling configuration, registering with monitoring, or installing application code.
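The CLI's `file://` prefix reads and encodes the script for you. When calling the EC2 API directly, user data travels base64-encoded and is limited to 16 KB in raw form. The sketch below shows that preparation step; the helper name and the sample script contents are hypothetical.

```python
import base64

MAX_USER_DATA_BYTES = 16 * 1024  # EC2 limit on raw user data, before encoding


def encode_user_data(script: str) -> str:
    """Check the size limit, then return the base64 form the API expects."""
    raw = script.encode("utf-8")
    if len(raw) > MAX_USER_DATA_BYTES:
        raise ValueError(
            f"user data is {len(raw)} bytes; the limit is {MAX_USER_DATA_BYTES}"
        )
    return base64.b64encode(raw).decode("ascii")


# Hypothetical stand-in for the contents of startup.sh.
startup_script = "#!/bin/bash\necho 'first boot' > /var/tmp/first-boot.log\n"
encoded = encode_user_data(startup_script)
print(len(startup_script), "raw bytes ->", len(encoded), "base64 characters")
```

Scripts that outgrow the limit are usually reduced to a short bootstrap that downloads the real payload, for example from S3.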
Storage: EBS, Instance Store, and EFS
EBS (Elastic Block Store) is the persistent block device attached over the network. Choose between:
- gp3: baseline 3,000 IOPS, up to 16,000; good for most workloads
- io2 Block Express: up to 256,000 IOPS; for high-throughput databases
- st1: throughput-optimised HDD for sequential reads (Kafka log storage)
- sc1: cold HDD, cheapest, for rarely accessed archives
Instance store volumes are NVMe drives physically attached to the host. Because there is no network hop, they typically deliver lower latency than EBS, but the data is gone when the instance stops, hibernates, or terminates. Use them for scratch space or for distributed systems where data is replicated across nodes.
EFS (Elastic File System) mounts simultaneously to multiple instances via NFS. Useful for shared configuration, content directories, or build artifacts that multiple instances need to read.
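The choices in this section can be condensed into a rule of thumb. The function below is only a sketch of that decision order, using the IOPS figures quoted above; the function name, parameter names, and access labels are assumptions made for the example, and real sizing also depends on throughput, cost, and durability needs.

```python
def suggest_storage(shared=False, persistent=True, iops=3_000, access="random"):
    """Map rough requirements to one of the storage options described above.

    access is "random", "sequential", or "cold" (labels invented for this sketch).
    """
    if shared:
        return "EFS"                    # many instances mounting the same files
    if not persistent:
        return "instance store"         # scratch data that can be lost on stop
    if access == "sequential":
        return "EBS st1"
    if access == "cold":
        return "EBS sc1"
    if iops <= 16_000:
        return "EBS gp3"
    if iops <= 256_000:
        return "EBS io2 Block Express"
    raise ValueError("requested IOPS exceed the per-volume figures quoted here")


print(suggest_storage())                      # typical application server
print(suggest_storage(iops=100_000))          # busy database
print(suggest_storage(persistent=False))      # scratch space
print(suggest_storage(shared=True))           # shared content directory
```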
Real-World Pattern: Three-Tier Web Application
```text
Internet users
      │
      ▼
Application Load Balancer (public subnets, AZ 1a + 1b)
      │
      ▼
EC2 Auto Scaling Group, web tier (private subnets)
  m6i.large × 2-8 instances
  sg-app: allow 8080 from sg-alb
      │
      ▼
RDS PostgreSQL Multi-AZ (isolated subnets)
  sg-db: allow 5432 from sg-app
```

During a traffic spike, the ASG adds instances. After the spike, it removes them. You pay per second for what actually ran. The database tier does not scale horizontally: it uses Multi-AZ for failover, not for read scaling (that is what RDS read replicas or Aurora are for).
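The scale-out and scale-in behaviour can be approximated with a proportional rule: size the group so the average metric returns to its target, then clamp to the group's bounds. This is a simplified sketch, not the actual Auto Scaling algorithm, which also involves alarms, cooldowns, and warm-up periods. The 2 and 8 bounds come from the diagram; the 50% CPU target is an assumed value.

```python
import math


def desired_capacity(current, metric, target, minimum=2, maximum=8):
    """Proportional estimate of how many instances bring `metric` back to `target`."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    proportional = math.ceil(current * metric / target)
    return max(minimum, min(maximum, proportional))


# Assumed target: 50% average CPU across the web tier.
spike = desired_capacity(current=2, metric=90, target=50)    # traffic spike
quiet = desired_capacity(current=4, metric=20, target=50)    # after the spike
capped = desired_capacity(current=6, metric=100, target=50)  # hits the maximum
print(spike, quiet, capped)
```

The clamp matters as much as the formula: the minimum keeps two instances running for availability, and the maximum bounds the bill during an unexpected surge.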
EC2 vs Serverless: When EC2 Still Wins
Lambda and Fargate have simplified many workloads, but EC2 remains the right choice when you need:
- Specific instance hardware (local NVMe, GPUs, custom AMI with kernel tweaks)
- Long-running processes that exceed Lambda’s 15-minute limit
- Consistent pricing under constant load where Reserved Instances beat per-invocation billing
- Software with per-socket or per-core licensing that requires Dedicated Hosts
Common Interview Questions
Q: What is the difference between stopping and terminating an EC2 instance? Stopping an EBS-backed instance shuts it down cleanly and preserves the root EBS volume — you can restart it. Termination deletes the root volume by default and removes the instance permanently. Instance store data is lost on both stop and terminate.
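Q: Can a security group span multiple VPCs? No. A security group is created in a single VPC and by default applies only to resources in that VPC. Its rules can reference other groups in the same VPC or, across a VPC peering connection in the same Region, groups in the peer VPC.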
Q: How do you give an EC2 instance permissions to call AWS APIs without hard-coded credentials? Attach an IAM instance profile containing an IAM role. The EC2 metadata service (169.254.169.254) vends temporary credentials that the AWS SDK picks up automatically. The credentials rotate every few hours without any action required.
Q: What is user data and when does it run? User data is a script or cloud-init configuration passed at launch that runs once when the instance first boots. Common uses: installing packages, pulling application code from S3, registering the instance with a configuration management system.
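For the instance-profile question above, this is roughly what an SDK does behind the scenes when it talks to the metadata service using IMDSv2: request a session token, then read the role's temporary credentials. The sketch uses only the standard library, the function names are invented for illustration, and `fetch_role_credentials` only works when run on an EC2 instance with a role attached.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the IMDSv2 session-token request (a PUT with a TTL header)."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build a metadata GET request that carries the session token."""
    return urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(timeout: float = 2.0) -> dict:
    """Fetch temporary credentials; requires running on an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    base = "iam/security-credentials/"
    with urllib.request.urlopen(metadata_request(base, token), timeout=timeout) as resp:
        role_name = resp.read().decode().strip().splitlines()[0]
    with urllib.request.urlopen(metadata_request(base + role_name, token), timeout=timeout) as resp:
        # The JSON document includes AccessKeyId, SecretAccessKey, Token, Expiration.
        return json.loads(resp.read())


req = token_request()
print(req.get_method(), req.full_url)
```

In application code there is rarely a reason to do this by hand; the AWS SDKs perform the same lookup and refresh the credentials before the `Expiration` time.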