Step 3: Deployment Automation
Manual changes are how environments quietly drift apart until nobody trusts what's actually running in production. This step covers the tooling that keeps infrastructure reproducible and patched, and that lets you operate on fleets of instances without SSH keys scattered across a dozen laptops.
CloudFormation: Stacks, Change Sets, and Drift
A CloudFormation stack is a unit of deployed infrastructure defined in a template. The exam cares less about template syntax and more about the operational lifecycle around a stack once it exists.
Change Sets: Preview Before You Commit
Updating a stack directly is how you find out at 2 p.m. on a Friday that a property change requires replacing your production database. A change set shows you exactly what CloudFormation intends to do (modify in place, or replace the resource) before anything actually happens.
    Current Stack            Proposed Template Change
    -------------            ------------------------
    RDS instance class:      RDS instance class:
    db.t3.medium             db.r5.large

    +------------------------+
    | Change Set Preview     |
    +------------------------+
    | Action: Modify         |
    | Replacement: False     |  <- safe, in-place resize
    +------------------------+

vs.

    +------------------------+
    | Change Set Preview     |
    +------------------------+
    | Action: Modify         |
    | Replacement: True      |  <- new resource, old one deleted
    +------------------------+

Some property changes force replacement (a new physical resource is created, the old one deleted) rather than an in-place update. Reviewing the change set's Replacement field before executing is the difference between a routine deploy and an unplanned database rebuild.
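The Replacement check can be automated before a change set is executed. The sketch below is a minimal illustration, assuming the change set has already been fetched into a dictionary shaped like the response of boto3's `describe_change_set`. The function name `replacement_risks` and the sample resource names are hypothetical, and the sample data is hand-written rather than captured from a real stack.

```python
from __future__ import annotations


def replacement_risks(change_set: dict) -> list[dict]:
    """Return the Modify changes that will, or may, replace a resource."""
    risks = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        # Replacement is a string in the response: "True", "False",
        # or "Conditional" (decided when the change set executes).
        if rc.get("Action") == "Modify" and rc.get("Replacement") in ("True", "Conditional"):
            risks.append(
                {
                    "logical_id": rc.get("LogicalResourceId"),
                    "resource_type": rc.get("ResourceType"),
                    "replacement": rc.get("Replacement"),
                }
            )
    return risks


# Hand-written sample data (hypothetical resource names).
SAMPLE_CHANGE_SET = {
    "Changes": [
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "AppDatabase",
                "ResourceType": "AWS::RDS::DBInstance",
                "Replacement": "False",
            },
        },
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "ReportingDatabase",
                "ResourceType": "AWS::RDS::DBInstance",
                "Replacement": "True",
            },
        },
    ]
}

if __name__ == "__main__":
    for risk in replacement_risks(SAMPLE_CHANGE_SET):
        print(f"{risk['logical_id']} ({risk['resource_type']}): Replacement={risk['replacement']}")
```

A pipeline could refuse to execute the change set automatically whenever this list is non-empty and ask for a human sign-off instead.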
Drift Detection: Finding What Console Cowboys Broke
Drift happens the moment someone edits a stack's resource directly in the console instead of through the template: bumping a security group rule, changing an instance type, adjusting a tag. CloudFormation has no visibility into that change until you explicitly run drift detection, which compares the current live configuration against what the template declares.
    Template declares:   SecurityGroupIngress -> port 443 only
    Live resource has:   SecurityGroupIngress -> port 443, port 22 (added manually)

    Drift Detection Result:
      Resource:  WebServerSG
      Status:    MODIFIED
      Property:  SecurityGroupIngress
      Expected:  [{Port: 443}]
      Actual:    [{Port: 443}, {Port: 22}]

Drift status comes back as IN_SYNC, MODIFIED, DELETED, or NOT_CHECKED (for resource types that don't support drift detection; not every resource type does). Run drift detection on a schedule via EventBridge, not just reactively after an incident, so config drift surfaces before it causes one.
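To make the four statuses concrete, here is a small sketch that groups drift results by status and picks out the resources that need follow-up. It assumes the results were already retrieved as a list of dictionaries using the field names from boto3's `describe_stack_resource_drifts` response. The helper names and the sample entries (other than WebServerSG, taken from the example above) are hypothetical.

```python
from __future__ import annotations


def drifted_resources(drifts: list[dict]) -> dict[str, list[str]]:
    """Group logical resource IDs by their drift status."""
    by_status: dict[str, list[str]] = {
        "IN_SYNC": [],
        "MODIFIED": [],
        "DELETED": [],
        "NOT_CHECKED": [],
    }
    for drift in drifts:
        status = drift["StackResourceDriftStatus"]
        by_status.setdefault(status, []).append(drift["LogicalResourceId"])
    return by_status


def needs_attention(drifts: list[dict]) -> list[str]:
    """Resources whose live state no longer matches the template."""
    grouped = drifted_resources(drifts)
    return grouped["MODIFIED"] + grouped["DELETED"]


# Hand-written sample data, not real API output.
SAMPLE_DRIFTS = [
    {"LogicalResourceId": "WebServerSG", "StackResourceDriftStatus": "MODIFIED"},
    {"LogicalResourceId": "LogBucket", "StackResourceDriftStatus": "IN_SYNC"},
    {"LogicalResourceId": "OldQueue", "StackResourceDriftStatus": "DELETED"},
]

if __name__ == "__main__":
    print("Needs attention:", needs_attention(SAMPLE_DRIFTS))
```

Treating MODIFIED and DELETED as the actionable statuses is a design choice for this sketch; a stricter policy might also report NOT_CHECKED resources so that unsupported types get reviewed by hand.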
StackSets: One Template, Many Accounts and Regions
StackSets deploy a single template across multiple accounts and regions from one operation. This is the pattern for org-wide guardrails like a mandatory logging bucket, a baseline IAM role, or a security group standard that every account needs regardless of what team owns it. Updates propagate centrally rather than requiring per-account changes.
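The fan-out is easy to underestimate: each targeted account and region combination gets its own stack instance. The sketch below only enumerates those combinations and makes no AWS calls. The function name, account IDs, and regions are placeholders chosen for illustration.

```python
from __future__ import annotations

from itertools import product


def stack_instance_targets(accounts: list[str], regions: list[str]) -> list[tuple[str, str]]:
    """Enumerate the (account, region) pairs a StackSet operation would touch."""
    return list(product(accounts, regions))


# Placeholder values for illustration only.
ACCOUNTS = ["111111111111", "222222222222"]
REGIONS = ["us-east-1", "eu-west-1"]

if __name__ == "__main__":
    for account, region in stack_instance_targets(ACCOUNTS, REGIONS):
        print(f"stack instance in account {account}, region {region}")
```

Two accounts and two regions already mean four stacks to keep in sync, which is the bookkeeping StackSets takes over.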
Systems Manager: The Operational Control Plane
Systems Manager (SSM) is less a single service and more a collection of operational tools unified under one console and one agent running on your instances.
| Capability | What it solves |
|---|---|
| Session Manager | Shell/RDP access without SSH keys, bastion hosts, or open inbound ports |
| Automation | Runbooks that execute multi-step operational workflows |
| Patch Manager | Scheduled, policy-driven OS and application patching |
| Run Command | Ad hoc command execution across a fleet, no login required |
| Parameter Store | Centralized config and secrets, referenced by other SSM features |
| State Manager | Continuously enforce a desired instance configuration |
Session Manager: Access Without the Key Sprawl
Session Manager gives you a shell into an instance through the SSM agent and IAM, with no inbound port 22 or 3389 required, no bastion host to maintain, and no SSH key to lose track of. Every session is logged, and command history can stream to CloudWatch Logs or S3, which closes an audit gap that traditional SSH access usually leaves wide open.
    Operator --> IAM auth --> SSM Service --> SSM Agent (on instance)
                                                |
                                                No open inbound ports required

    Session log --> CloudWatch Logs / S3

For instances in fully private subnets with no NAT path, Session Manager still works via VPC endpoints for the SSM service. The agent talks to SSM over PrivateLink instead of the public internet, so "no internet route" is no longer a blocker for access.
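A quick pre-flight check for a private subnet is to compare the interface endpoints a VPC already has against the ones Session Manager is expected to need. The sketch below is an assumption-laden illustration: the required list (`ssm` and `ssmmessages`) reflects a common reading of the AWS documentation and should be verified for your agent version, and the function name is hypothetical. It works on a plain set of service names and makes no AWS calls.

```python
from __future__ import annotations

# Assumption: interface endpoint services Session Manager relies on.
# Verify against current AWS documentation before depending on this list.
REQUIRED_SERVICES = ("ssm", "ssmmessages")


def missing_endpoints(region: str, present: set[str]) -> list[str]:
    """Return required endpoint service names that the VPC does not have yet."""
    required = [f"com.amazonaws.{region}.{service}" for service in REQUIRED_SERVICES]
    return [name for name in required if name not in present]


if __name__ == "__main__":
    existing = {"com.amazonaws.us-east-1.ssm"}
    print(missing_endpoints("us-east-1", existing))
```

If session logs are shipped to S3 or CloudWatch Logs from the same subnet, endpoints for those services are likely needed as well.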
Automation: Runbooks Instead of Tribal Knowledge
An Automation document (runbook) codifies a multi-step operational task (patch an AMI, remediate a non-compliant resource, rotate a credential, restart a service and verify health) as a repeatable, auditable workflow instead of a wiki page someone wrote two years ago and never updated.
    Automation Runbook: "AMI-Patch-And-Test"
      Step 1: Launch instance from base AMI
      Step 2: Run Patch Manager baseline
      Step 3: Run smoke test script via Run Command
      Step 4: If pass -> create new AMI
      Step 5: If fail -> terminate, notify via SNS

Runbooks can trigger from a CloudWatch alarm, an EventBridge rule, or a Config compliance failure, turning "someone noticed and manually fixed it" into "the platform fixed it and told someone afterward."
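The branching in the AMI-Patch-And-Test example can be modeled in a few lines. This is a plain Python model of the control flow, not an SSM Automation document and not the SSM API: every step is passed in as a callable so the logic can be exercised without touching AWS, and all names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def ami_patch_and_test(
    launch: Callable[[], str],
    patch: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    create_ami: Callable[[str], str],
    terminate: Callable[[str], None],
    notify: Callable[[str], None],
) -> str | None:
    """Run the five runbook steps; return the new AMI ID, or None on failure."""
    instance_id = launch()              # Step 1: launch from the base AMI
    patch(instance_id)                  # Step 2: apply the patch baseline
    if smoke_test(instance_id):         # Step 3: run the smoke test
        return create_ami(instance_id)  # Step 4: pass, so bake a new AMI
    terminate(instance_id)              # Step 5: fail, so clean up
    notify(f"Smoke test failed on {instance_id}")
    return None


if __name__ == "__main__":
    # Stub steps standing in for real AWS actions.
    ami = ami_patch_and_test(
        launch=lambda: "i-example",
        patch=lambda instance: None,
        smoke_test=lambda instance: True,
        create_ami=lambda instance: "ami-example",
        terminate=lambda instance: None,
        notify=print,
    )
    print("New AMI:", ami)
```

Keeping the steps injectable mirrors what a runbook gives you operationally: the sequence and the failure path are fixed and reviewable, while the individual actions can change.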
Patch Manager: Patching Without the Spreadsheet
Patch Manager applies OS and application patches on a schedule, governed by patch baselines that define what gets auto-approved and after how long. A common pattern: security patches auto-approve after 7 days (giving vendors time to catch showstopper bugs), while other updates wait longer or require manual approval.
Maintenance windows control when patching runs, and patch groups (via tags) let you stagger rollout (patch the dev fleet first, then a canary slice of production, then the rest) instead of patching every instance simultaneously and finding out the hard way that a patch breaks something.
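The auto-approval delay in a patch baseline comes down to simple date arithmetic. The sketch below assumes a patch becomes approved once the configured number of days has elapsed since its release date. The exact boundary behavior in Patch Manager may differ, and the function name is hypothetical.

```python
from datetime import date, timedelta


def is_auto_approved(released: date, approve_after_days: int, today: date) -> bool:
    """True once the approval delay has fully elapsed since release."""
    # Assumption: approval takes effect on the day the delay is reached.
    return today >= released + timedelta(days=approve_after_days)


if __name__ == "__main__":
    released = date(2024, 3, 1)
    for day in (date(2024, 3, 7), date(2024, 3, 8)):
        print(day, is_auto_approved(released, 7, day))
```

With a 7-day delay, a patch released on 1 March is still pending on 7 March and becomes eligible on 8 March under this assumption.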
Blue/Green vs Rolling Deployments
Both patterns solve "how do I ship a new version without an outage," but they trade off differently.
    ROLLING                        BLUE/GREEN
    -------                        ----------
    Old Old Old Old  (start)       Blue  (100% traffic, old version)
    New Old Old Old  (batch 1)     Green (0% traffic, new version, fully deployed)
    New New Old Old  (batch 2)       |
    New New New Old  (batch 3)     Traffic cutover (instant or weighted)
    New New New New  (done)          |
                                   Blue kept warm briefly for instant rollback

Rolling replaces capacity in batches. It is cheaper, since you're not running two full environments, but rollback means rolling backward through the same batches, which is slower under pressure. Blue/Green stands up an entirely new environment and cuts traffic over, often via Route 53 weighted routing or an ALB target group swap. Rollback is close to instant because the old environment is still sitting there warm, but you're paying for double capacity during the transition window.
| | Rolling | Blue/Green |
|---|---|---|
| Infra cost during deploy | Lower | Higher (duplicate stack) |
| Rollback speed | Slower, batch-by-batch | Fast, traffic switch only |
| Risk exposure | Partial fleet on new version at any time | All-or-nothing cutover |
| Good fit for | Cost-sensitive, tolerant of gradual rollout | Risk-averse, needs instant rollback |
The exam scenario usually tells you which property matters more: "must roll back instantly" points to blue/green; "minimize cost during deployment" points to rolling.
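The batch-by-batch nature of a rolling deployment, and why its rollback takes longer, can be seen in a short simulation. This is an illustrative model only: the function name and the "Old"/"New" labels are hypothetical, and no deployment service is involved.

```python
from __future__ import annotations


def rolling_states(fleet_size: int, batch_size: int) -> list[list[str]]:
    """Return the fleet composition after each batch of a rolling deploy."""
    if fleet_size < 1 or batch_size < 1:
        raise ValueError("fleet_size and batch_size must be positive")
    states = [["Old"] * fleet_size]  # starting point: everything on the old version
    updated = 0
    while updated < fleet_size:
        updated = min(updated + batch_size, fleet_size)
        states.append(["New"] * updated + ["Old"] * (fleet_size - updated))
    return states


if __name__ == "__main__":
    for step, state in enumerate(rolling_states(fleet_size=4, batch_size=1)):
        print(step, " ".join(state))
```

Every intermediate state mixes old and new versions, and undoing the deploy means walking those batches in reverse. A blue/green cutover has only two states, so rollback is a single traffic switch.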
Exam Focus: What Questions Test From This Step
- Reading a change set and identifying whether a property change causes replacement or an in-place update
- What drift detection actually compares, and the meaning of MODIFIED, IN_SYNC, DELETED, NOT_CHECKED
- StackSets as the tool for deploying one template across many accounts/regions
- Session Manager's core value proposition: no open inbound ports, no bastion, full session logging
- How Session Manager reaches fully private instances via VPC endpoints
- Automation runbooks triggered by alarms, EventBridge, or Config remediation
- Patch Manager concepts: patch baselines, auto-approval delays, maintenance windows, patch groups for staggered rollout
- Choosing blue/green vs rolling based on rollback speed vs cost constraints in a scenario