DOP-C02 Step 2: CloudFormation StackSets, CDK & Systems Manager at Scale

Amazon Web Services (AWS) · Professional · Step 2 of 5 · 106 guides · updated 2026


Step 2: Configuration Management & Infrastructure as Code

Ask ten engineers what "infrastructure as code" means and you'll get ten answers that all stop short of what this exam wants. A single template that provisions a VPC is not the skill being tested here. The skill being tested is whether you can manage infrastructure as code across dozens of accounts, keep it from drifting, and roll changes out without someone fat-fingering a stack update in production. Let's get into it.


CloudFormation Beyond the Basics

You already know the fundamentals: templates, resources, parameters, outputs. At the professional level, the exam cares about how templates compose and fail at scale.

Nested Stacks

A nested stack is a stack created as a resource inside a parent stack (AWS::CloudFormation::Stack), pointing at a child template stored in S3. The reason to use them isn't aesthetics; it's the 500-resource-per-stack limit and the desire to reuse common building blocks (a standard VPC, a standard logging setup) across many top-level stacks without copy-pasting YAML.

Parent Stack (app-stack.yaml)
├── Resource: NetworkStack  ──► nested-templates/vpc.yaml
├── Resource: DatabaseStack ──► nested-templates/rds.yaml
├── Resource: ComputeStack  ──► nested-templates/asg.yaml
└── Outputs pulled from nested stacks via GetAtt

Updates to a parent stack propagate to nested stacks automatically during a parent update, but a nested stack is not meant to be updated on its own: AWS's guidance is to make every change through the parent (root) stack, because a direct update leaves the child out of sync with the parent's template. This trips people up during incident response: you can't just patch the child stack in isolation.
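The parent/child layout in the diagram can be sketched as a small template generator. This is a minimal illustration, not a production template: the bucket URL and the child output names (VpcId, PrivateSubnetIds) are assumptions, since the text only says that child templates live in S3 and that outputs are read with GetAtt.

```python
import json

# Hypothetical bucket URL; the guide only states that child templates are in S3.
TEMPLATE_BASE = "https://example-bucket.s3.amazonaws.com/nested-templates"


def nested_stack(template_file, parameters=None):
    """Return an AWS::CloudFormation::Stack resource pointing at a child template."""
    props = {"TemplateURL": f"{TEMPLATE_BASE}/{template_file}"}
    if parameters:
        props["Parameters"] = parameters
    return {"Type": "AWS::CloudFormation::Stack", "Properties": props}


def build_parent_template():
    """Assemble the parent stack from the diagram: network, database, compute."""
    # Child outputs are read with Fn::GetAtt "<LogicalId>.Outputs.<OutputName>".
    # VpcId and PrivateSubnetIds are assumed output names of vpc.yaml.
    vpc_id = {"Fn::GetAtt": ["NetworkStack", "Outputs.VpcId"]}
    subnets = {"Fn::GetAtt": ["NetworkStack", "Outputs.PrivateSubnetIds"]}
    shared = {"VpcId": vpc_id, "SubnetIds": subnets}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "NetworkStack": nested_stack("vpc.yaml"),
            "DatabaseStack": nested_stack("rds.yaml", dict(shared)),
            "ComputeStack": nested_stack("asg.yaml", dict(shared)),
        },
        "Outputs": {"VpcId": {"Value": vpc_id}},
    }


if __name__ == "__main__":
    print(json.dumps(build_parent_template(), indent=2))
```

Because DatabaseStack and ComputeStack reference NetworkStack's outputs, CloudFormation creates the network child first without an explicit DependsOn.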

StackSets โ€” Deploying Across Accounts and Regions

StackSets exist for exactly one problem: you have the same template and you need it applied consistently across many accounts and regions, with drift tracked centrally. Think: a baseline CloudTrail configuration, a standard IAM role for break-glass access, or a mandatory security group ruleset that every account in the organization must have.

Management/Delegated Admin Account
        │
        │  StackSet: "org-baseline-security"
        │
   ┌────┴─────────────────────────────────────┐
   │ Deployment targets: entire OU or account │
   │ list, across selected regions            │
   └────┬───────────────────────────────────┬─┘
        ▼                                   ▼
Account A (us-east-1, eu-west-1)    Account B (us-east-1)
Stack instance                      Stack instance

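Two deployment models matter for the exam: self-managed permissions, where you create the StackSets administration and execution IAM roles yourself in the admin and target accounts, and service-managed permissions, where StackSets integrates with AWS Organizations, creates the roles for you, and can deploy automatically to accounts as they join a target OU.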

Concurrent deployment controls: MaxConcurrentCount / MaxConcurrentPercentage and FailureToleranceCount / FailureTolerancePercentage govern how fast a StackSet rolls out and how many failures are tolerated before it stops. This is the same "blast radius control" philosophy as CodeDeploy traffic shifting, just applied to infrastructure changes across accounts instead of application traffic across instances. Expect a question that asks you to prevent a bad template from being applied to all 200 accounts in an organization simultaneously. The answer is tuning these concurrency and tolerance settings, not a manual approval step (StackSets doesn't have one natively; that gate lives in the pipeline that triggers the StackSet update).
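As a rough sketch of what "tuning these settings" looks like in practice, the helper below builds an operation-preferences dictionary for a cautious rollout. The function name and the chosen values are illustrative assumptions; the keys are the ones the StackSets API accepts, and the dictionary is only printed here rather than sent to AWS.

```python
def operation_preferences(max_concurrent_pct, failure_tolerance_count,
                          region_order=None):
    """Build an OperationPreferences dict for a cautious StackSet rollout."""
    if not 1 <= max_concurrent_pct <= 100:
        raise ValueError("max_concurrent_pct must be between 1 and 100")
    if failure_tolerance_count < 0:
        raise ValueError("failure_tolerance_count cannot be negative")
    prefs = {
        # Use either the Count or the Percentage form of each setting, not both.
        "MaxConcurrentPercentage": max_concurrent_pct,
        "FailureToleranceCount": failure_tolerance_count,
        # Finish one region before starting the next.
        "RegionConcurrencyType": "SEQUENTIAL",
    }
    if region_order:
        prefs["RegionOrder"] = list(region_order)
    return prefs


# Touch at most 5% of accounts at a time and stop on the first failure.
prefs = operation_preferences(5, 0, region_order=["us-east-1", "eu-west-1"])
print(prefs)

# With boto3 the dict would be passed along these lines (not executed here):
#   boto3.client("cloudformation").update_stack_set(
#       StackSetName="org-baseline-security",
#       UsePreviousTemplate=True,
#       OperationPreferences=prefs)
```

A failure tolerance of zero is the strictest choice: one failed account halts the operation in that region, and CloudFormation may lower the effective concurrency to honor it.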

Custom Resources

When CloudFormation doesn't natively support something (registering a third-party SaaS webhook, looking up an AMI ID dynamically, or running a one-time data migration), a custom resource backs the resource with a Lambda function (or, less commonly now, an SNS topic). The Lambda receives a Create/Update/Delete event from CloudFormation and must send a response signal back to a pre-signed S3 URL. The single most common real-world bug here, and a favorite exam trap, is a custom resource Lambda that fails silently and never sends a response, leaving the stack stuck in CREATE_IN_PROGRESS for the full one-hour timeout. Always wrap custom resource logic in the cfn-response module or the newer Provider framework in CDK, which handles this for you.
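To make the "always respond" rule concrete, here is a minimal hand-rolled handler, assuming you are not using cfn-response or the CDK Provider framework. The do_work function and the injectable send parameter are hypothetical names added for illustration and testing; the response fields follow the custom resource response format.

```python
import json
import urllib.request


def build_response(event, status, reason="", physical_id=None, data=None):
    """Build the JSON body CloudFormation expects from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch Logs for details",
        "PhysicalResourceId": physical_id
        or event.get("PhysicalResourceId", "custom-resource"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(url, body):
    """PUT the response to the pre-signed S3 URL supplied in the event."""
    payload = json.dumps(body).encode("utf-8")
    # The empty Content-Type mirrors what the cfn-response module sends.
    req = urllib.request.Request(url, data=payload, method="PUT",
                                 headers={"Content-Type": ""})
    urllib.request.urlopen(req, timeout=10)


def do_work(event):
    """Hypothetical business logic; replace with the real Create/Update/Delete work."""
    if event["RequestType"] == "Delete":
        return {}
    return {"Message": "done"}


def handler(event, context, send=send_response):
    """Lambda entry point that reports SUCCESS or FAILED on every code path."""
    try:
        body = build_response(event, "SUCCESS", data=do_work(event))
    except Exception as exc:  # report the failure instead of hanging the stack
        body = build_response(event, "FAILED", reason=str(exc))
    send(event["ResponseURL"], body)
    return body
```

The key design choice is that the send call sits outside the try block's success path: whether do_work returns or raises, CloudFormation hears back. A function that times out before reaching send would still hang the stack, so the Lambda timeout should leave headroom for the response.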

Drift Detection

Drift detection compares the live state of stack resources against the template's expected state and flags anything changed out-of-band (a security group rule added manually in the console, an S3 bucket policy edited directly). It's not continuous: you trigger it on a stack or a StackSet. It's also not automatic remediation, just detection and reporting. Pair it with EventBridge (Config's ConfigurationItemChangeNotification or scheduled drift detection runs) if a scenario wants proactive alerting rather than manual checks.
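Since drift detection is an on-demand operation, a scheduled job has to trigger it, wait, and then read the results. The sketch below shows that three-call sequence. It assumes cfn is a boto3 CloudFormation client (or a test double with the same three methods); the function name is hypothetical, and result pagination is left out for brevity.

```python
import time


def find_drifted_resources(cfn, stack_name, poll_seconds=5, max_polls=60):
    """Trigger drift detection on one stack and return resources that drifted.

    `cfn` is assumed to be a boto3 CloudFormation client or an object
    exposing the same three methods.
    """
    detection_id = cfn.detect_stack_drift(
        StackName=stack_name)["StackDriftDetectionId"]

    # Detection runs asynchronously, so poll until it leaves IN_PROGRESS.
    for _ in range(max_polls):
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id)
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"drift detection for {stack_name} did not finish")

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "detection failed"))

    # Only the first page of results is read in this sketch.
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )["StackResourceDrifts"]
    return [(d["LogicalResourceId"], d["StackResourceDriftStatus"]) for d in drifts]
```

Run from a scheduled Lambda, the returned list could feed an SNS notification; the function reports drift and deliberately does nothing to correct it, matching the detection-only behavior described above.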


AWS CDK: Concepts You're Expected to Know

DOP-C02 won't ask you to write TypeScript, but it will test whether you understand CDK's relationship to CloudFormation. CDK is not a replacement for CloudFormation; it's a code-first authoring layer that synthesizes CloudFormation templates. When you run cdk deploy, CDK synthesizes your app into one or more CloudFormation templates and asset bundles, then hands them to CloudFormation to actually provision. Every CDK deployment is still a CloudFormation stack under the hood, which means all the StackSets, drift detection, and change set behavior you just learned still applies.
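The idea that code "synthesizes" into a template can be shown with a toy model. To be clear, the Bucket and Stack classes below are invented for illustration and are not the CDK API; they only mimic the shape of the process, where objects in a program are turned into a plain CloudFormation document.

```python
import json


class Bucket:
    """Toy construct, NOT the CDK API: renders itself as a template resource."""

    def __init__(self, logical_id, versioned=False):
        self.logical_id = logical_id
        self.versioned = versioned

    def to_cloudformation(self):
        props = {}
        if self.versioned:
            props["VersioningConfiguration"] = {"Status": "Enabled"}
        return {"Type": "AWS::S3::Bucket", "Properties": props}


class Stack:
    """Toy stack: a named collection of constructs that synthesizes to a template."""

    def __init__(self, name):
        self.name = name
        self.constructs = []

    def add(self, construct):
        self.constructs.append(construct)
        return construct

    def synth(self):
        # The output is an ordinary template; nothing CDK-specific remains.
        return {
            "AWSTemplateFormatVersion": "2010-09-09",
            "Resources": {c.logical_id: c.to_cloudformation()
                          for c in self.constructs},
        }


stack = Stack("LoggingStack")
stack.add(Bucket("AuditLogs", versioned=True))
print(json.dumps(stack.synth(), indent=2))
```

The printed document is what CloudFormation would receive, which is why stack-level features keep working regardless of how the template was authored.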

Key vocabulary:

CDK Pipelines (a construct library) automates the self-mutating pipeline pattern: a CDK app can define its own CodePipeline, and when the pipeline definition itself changes, the pipeline updates itself before deploying the rest of the application stacks. This is a subtle but testable point: the pipeline is infrastructure too, and it deploys itself first.


Systems Manager for Fleet Configuration Management

Once you have more than a few dozen instances, "SSH in and fix it" stops being a strategy. Systems Manager (SSM) is the fleet-wide control plane, and DOP-C02 tests several of its capabilities specifically:

| SSM Capability | What it solves |
|---|---|
| Run Command | Execute ad-hoc or scheduled commands across a fleet without SSH/RDP access |
| State Manager | Continuously enforce a desired configuration (e.g., ensure an antivirus agent is always running) on a schedule |
| Patch Manager | Automated OS/application patching with maintenance windows and patch baselines |
| Automation | Runbook-style multi-step operational workflows (documents), often triggered by EventBridge |
| Parameter Store | Hierarchical config and secret storage, referenced directly from CloudFormation and CodeBuild |
| Session Manager | Shell access to instances without opening SSH ports or managing bastion hosts |
| Fleet Manager | Console-based fleet inventory and management view |
| Inventory | Collects metadata (installed packages, running services) across the fleet for compliance querying |

Maintenance Windows let you schedule Patch Manager or Run Command tasks during defined low-traffic periods, with concurrency and error-threshold controls that mirror the same blast-radius philosophy you saw in StackSets and CodeDeploy: patch 10% of instances, stop if too many fail health checks.
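In SSM the two knobs are called MaxConcurrency and MaxErrors, and each accepts either an absolute count ("10") or a percentage ("10%"). The sketch below builds the keyword arguments for a Run Command patch rollout; the helper name and the tag:PatchGroup targeting are assumptions, and maintenance window tasks take the same two settings.

```python
import re

# "10" is an absolute count, "10%" is a share of the targeted instances.
_LIMIT = re.compile(r"^\d+%?$")


def patch_command_request(patch_group, max_concurrency="10%", max_errors="5%"):
    """Build keyword arguments for an SSM Run Command patch rollout."""
    for name, value in (("max_concurrency", max_concurrency),
                        ("max_errors", max_errors)):
        if not _LIMIT.match(value):
            raise ValueError(f"{name} must look like '10' or '10%', got {value!r}")
    return {
        "DocumentName": "AWS-RunPatchBaseline",
        # The tag key and value are assumptions about how the fleet is tagged.
        "Targets": [{"Key": "tag:PatchGroup", "Values": [patch_group]}],
        "Parameters": {"Operation": ["Install"]},
        "MaxConcurrency": max_concurrency,  # how many instances run at once
        "MaxErrors": max_errors,            # stop sending to new targets past this
    }


request = patch_command_request("web-prod")
print(request)

# With boto3 this would be sent as (not executed here):
#   boto3.client("ssm").send_command(**request)
```

Keeping both values as strings is deliberate: the SSM API takes them as strings so that the same field can carry a count or a percentage.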

The SSM Agent must be installed and running, and the instance needs an instance profile with the right managed policy (AmazonSSMManagedInstanceCore) attached, for any of this to work. That detail shows up in troubleshooting-style questions ("instances aren't appearing in Fleet Manager").
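The instance profile requirement translates into two IAM resources in a template. The following is a minimal sketch that generates them; the logical IDs SsmInstanceRole and SsmInstanceProfile are hypothetical names chosen for the example.

```python
import json


def ssm_instance_profile_resources():
    """Return CloudFormation resources for an SSM-ready EC2 instance profile."""
    return {
        "SsmInstanceRole": {  # hypothetical logical ID
            "Type": "AWS::IAM::Role",
            "Properties": {
                # Lets the EC2 service assume the role on the instance's behalf.
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": "ec2.amazonaws.com"},
                        "Action": "sts:AssumeRole",
                    }],
                },
                "ManagedPolicyArns": [
                    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
                ],
            },
        },
        "SsmInstanceProfile": {  # hypothetical logical ID
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {"Roles": [{"Ref": "SsmInstanceRole"}]},
        },
    }


if __name__ == "__main__":
    print(json.dumps({"Resources": ssm_instance_profile_resources()}, indent=2))
```

An instance or launch template would then reference SsmInstanceProfile; when instances are missing from Fleet Manager, this role attachment is one of the first things to verify alongside the agent itself.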


Immutable vs. Mutable Infrastructure

MUTABLE PATTERN                     IMMUTABLE PATTERN
────────────────────                ────────────────────
Instance launches once              New AMI/image built per release
        │                                   │
        ▼                                   ▼
Config management agent             New ASG/task definition launched
(Ansible/Chef/Puppet/SSM)                   │
        │ applies changes in place          ▼
        ▼                           Old instances terminated after
Instance drifts over time           cutover (blue/green)
if changes aren't tracked

The professional exam has a clear bias: immutable infrastructure is the preferred pattern for production workloads because it eliminates configuration drift on the instances themselves. You never patch a running instance; you replace it with one built from a known-good image. Mutable patterns (SSM State Manager enforcing config on long-lived instances) are still valid, particularly for stateful or legacy fleets where rebuilding isn't practical, but expect the "best practice" answer to lean immutable whenever the scenario allows it.


Golden AMI Pipelines with EC2 Image Builder

EC2 Image Builder formalizes the "golden AMI" pattern that used to require hand-rolled Packer scripts and cron jobs. The pipeline:

Source Image (base AMI) ──► Build Component(s) ──► Test Component(s)    ──► Distribution
(Amazon Linux 2023)         - install agent        - vulnerability scan     - copy AMI to
                            - apply hardening      - smoke test               multiple regions
                            - bake app runtime                              - share to other
                                                                              accounts/OUs

Image Builder runs on a schedule (or triggered by a new base AMI release via EventBridge) and produces a versioned, tested AMI automatically, with no more "who built this AMI and what's on it" archaeology. Distribution settings push the finished AMI to every region and account that needs it in one pipeline run, which is exactly the multi-account concern this whole step keeps circling back to.

Pair this with Auto Scaling Group instance refresh: once a new golden AMI is published, an instance refresh gradually replaces running instances with new ones launched from the updated AMI, respecting a minimum healthy percentage. Again, it is the same controlled-blast-radius rollout pattern, just applied at the AMI layer.
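A rolling instance refresh is started with a small request that names the group and the health floor. The helper below is a sketch under assumed defaults (90% minimum healthy, 300 seconds of warmup); the function name and the group name in the usage line are hypothetical.

```python
def instance_refresh_request(asg_name, min_healthy_pct=90, warmup_seconds=300):
    """Build keyword arguments for a rolling Auto Scaling instance refresh."""
    if not 0 <= min_healthy_pct <= 100:
        raise ValueError("min_healthy_pct must be between 0 and 100")
    if warmup_seconds < 0:
        raise ValueError("warmup_seconds cannot be negative")
    return {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        "Preferences": {
            # Share of the group that must stay in service during replacement.
            "MinHealthyPercentage": min_healthy_pct,
            # Time a new instance gets before it counts toward healthy capacity.
            "InstanceWarmup": warmup_seconds,
        },
    }


request = instance_refresh_request("web-asg")  # hypothetical group name
print(request)

# With boto3 this would be sent as (not executed here):
#   boto3.client("autoscaling").start_instance_refresh(**request)
```

A higher minimum healthy percentage makes the refresh slower and safer; a lower one finishes sooner at the cost of reduced capacity while instances are being replaced.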


Exam Focus: What Questions Test From This Step