Step 2 — Configuration Management & Infrastructure as Code

Ask ten engineers what “infrastructure as code” means and you’ll get ten answers that all stop short of what this exam wants. A single template that provisions a VPC is not the skill being tested here. The skill being tested is: can you manage infrastructure as code across dozens of accounts, keep it from drifting, and roll changes out without someone fat-fingering a stack update in production. Let’s get into it.

CloudFormation Beyond the Basics

You already know the fundamentals — templates, resources, parameters, outputs. At the professional level, the exam cares about how templates compose and fail at scale.

Nested Stacks

A nested stack is a stack created as a resource inside a parent stack (AWS::CloudFormation::Stack), pointing at a child template stored in S3. The reason to use them isn’t aesthetics — it’s the 500-resource-per-stack limit and the desire to reuse common building blocks (a standard VPC, a standard logging setup) across many top-level stacks without copy-pasting YAML.

Parent Stack (app-stack.yaml)
  ├── Resource: NetworkStack  ──► nested-templates/vpc.yaml
  ├── Resource: DatabaseStack ──► nested-templates/rds.yaml
  ├── Resource: ComputeStack  ──► nested-templates/asg.yaml
  └── Outputs pulled from nested stacks via GetAtt

Updates to a parent stack propagate to nested stacks automatically during a parent update, and that is the supported path: changes to a nested stack should be initiated from the root stack, not applied to the child directly. This trips people up during incident response: patching the child stack in isolation leaves it out of sync with the parent template, and the next parent update can overwrite the fix.
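The parent/child layout in the diagram can be sketched as a template built in Python. This is a minimal illustration, not a production template: the bucket URL, the VpcId output, and the VpcId parameter are hypothetical names chosen for the example, while AWS::CloudFormation::Stack, TemplateURL, and the Outputs.<name> form of GetAtt are the pieces the text describes.

```python
import json


def nested_stack(template_url, parameters=None):
    """Return an AWS::CloudFormation::Stack resource pointing at a child template in S3."""
    properties = {"TemplateURL": template_url}
    if parameters:
        properties["Parameters"] = parameters
    return {"Type": "AWS::CloudFormation::Stack", "Properties": properties}


# Hypothetical bucket; the child template file names follow the diagram above.
BASE_URL = "https://example-templates.s3.amazonaws.com/nested-templates"

# A child stack's outputs are read from the parent as Outputs.<OutputName>.
# "VpcId" is an assumed output of vpc.yaml, used here only for illustration.
vpc_id = {"Fn::GetAtt": ["NetworkStack", "Outputs.VpcId"]}

parent_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "NetworkStack": nested_stack(f"{BASE_URL}/vpc.yaml"),
        "DatabaseStack": nested_stack(f"{BASE_URL}/rds.yaml", {"VpcId": vpc_id}),
        "ComputeStack": nested_stack(f"{BASE_URL}/asg.yaml", {"VpcId": vpc_id}),
    },
    "Outputs": {"VpcId": {"Value": vpc_id}},
}

print(json.dumps(parent_template, indent=2))
```

Because the database and compute stacks reference an output of the network stack, CloudFormation orders their creation after it without an explicit DependsOn.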

StackSets — Deploying Across Accounts and Regions

StackSets exist for exactly one problem: you have the same template and you need it applied consistently across many accounts and regions, with drift tracked centrally. Think: a baseline CloudTrail configuration, a standard IAM role for break-glass access, or a mandatory security group ruleset that every account in the organization must have.

Management/Delegated Admin Account
        │
        │  StackSet: "org-baseline-security"
        │
   ┌────┴─────────────────────────────────────┐
   │  Deployment targets: entire OU or account │
   │  list, across selected regions             │
   └────┬─────────────────────────────────────┬─┘
        ▼                                     ▼
  Account A (us-east-1, eu-west-1)     Account B (us-east-1)
   Stack instance                       Stack instance

Two deployment models matter for the exam:

Self-managed permissions: you manually create AWSCloudFormationStackSetAdministrationRole in the administrator account and AWSCloudFormationStackSetExecutionRole in each target account, with a trust relationship between them. Older pattern, more control, more setup.
Service-managed permissions: StackSets integrates directly with AWS Organizations. Deploy to an entire OU and, with automatic deployment enabled, new accounts that join the OU automatically get the stack instance. This is the pattern to reach for whenever a scenario mentions “new accounts should automatically receive baseline resources.”

Concurrent deployment controls apply under either model. MaxConcurrentCount / MaxConcurrentPercentage and FailureToleranceCount / FailureTolerancePercentage govern how many accounts a StackSet operation deploys to at once within a region and how many failures are tolerated before it stops. This is the same “blast radius control” philosophy as CodeDeploy traffic shifting, just applied to infrastructure changes across accounts instead of application traffic across instances. Expect a question that asks you to prevent a bad template from being applied to all 200 accounts in an organization simultaneously. The answer is tuning these concurrency and tolerance settings, not a manual approval step (StackSets doesn’t have one natively; that gate lives in the pipeline that triggers the StackSet update).
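To make the 200-account scenario concrete, here is a small sketch that turns percentage settings into account counts. It assumes the rounding behavior described in the CloudFormation API reference (percentages round down, and a concurrency that would round to zero becomes one); the 5% and 2% values are illustrative choices, not recommendations.

```python
def effective_limits(target_accounts, max_concurrent_pct, failure_tolerance_pct):
    """Translate percentage settings into per-region account counts.

    Assumption: percentages round down, except that a concurrency
    of zero is raised to one so the operation can make progress.
    """
    max_concurrent = max(1, target_accounts * max_concurrent_pct // 100)
    failure_tolerance = target_accounts * failure_tolerance_pct // 100
    return max_concurrent, failure_tolerance


# Shape accepted as OperationPreferences by StackSet operations,
# e.g. the boto3 call cloudformation.update_stack_set(..., OperationPreferences=...).
operation_preferences = {
    "RegionConcurrencyType": "SEQUENTIAL",  # finish one region before starting the next
    "MaxConcurrentPercentage": 5,
    "FailureTolerancePercentage": 2,
}

concurrent, tolerated = effective_limits(
    200,
    operation_preferences["MaxConcurrentPercentage"],
    operation_preferences["FailureTolerancePercentage"],
)
print(f"{concurrent} accounts at a time per region, up to {tolerated} failures tolerated")
# prints "10 accounts at a time per region, up to 4 failures tolerated"
```

With these settings a broken template reaches a handful of accounts before the operation stops, not all 200.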

Custom Resources

When CloudFormation doesn’t natively support something (registering a third-party SaaS webhook, looking up an AMI ID dynamically, or running a one-time data migration), a custom resource backs the resource with a Lambda function (or, less commonly now, an SNS topic). The Lambda receives a Create, Update, or Delete event from CloudFormation and must send a response to the pre-signed S3 URL included in that event. The single most common real-world bug here, and a favorite exam trap, is a custom resource Lambda that fails silently and never sends a response, leaving the stack stuck in CREATE_IN_PROGRESS until the custom resource times out (one hour by default). Always send the response from a code path that also runs on failure: use the cfn-response module inside error handling that reports FAILED, or the newer Provider framework in CDK, which handles the response for you.
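The "always respond" rule can be sketched with the standard library alone. This is an illustration of the pattern, not the cfn-response module itself: do_work is a hypothetical placeholder for the real logic, the fallback physical ID is a simplification, and the send parameter exists only so the handler can be exercised without a network call.

```python
import json
import urllib.request


def build_response(event, status, reason="", data=None):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch Logs for details",
        # Simplification: reuse the logical ID when no physical ID exists yet.
        "PhysicalResourceId": event.get("PhysicalResourceId", event["LogicalResourceId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def put_response(url, body):
    """PUT the response to the pre-signed S3 URL supplied in the event."""
    payload = json.dumps(body).encode("utf-8")
    # Empty Content-Type mirrors what the cfn-response module sends.
    request = urllib.request.Request(
        url, data=payload, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as reply:
        return reply.status


def do_work(event):
    """Hypothetical placeholder for the real create/update/delete logic."""
    if event["RequestType"] == "Delete":
        return {}
    return {"Message": "configured"}


def handler(event, context, send=put_response):
    """Lambda entry point: every code path ends with a response being sent."""
    try:
        body = build_response(event, "SUCCESS", data=do_work(event))
    except Exception as error:  # report the failure instead of going silent
        body = build_response(event, "FAILED", reason=str(error))
    send(event["ResponseURL"], body)
    return body
```

The important design choice is that the send call sits outside the try block, so an exception in the resource logic still produces a FAILED response and the stack rolls back promptly.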

Drift Detection

Drift detection compares the live state of stack resources against the template’s expected state and flags anything changed out-of-band (a security group rule added manually in the console, an S3 bucket policy edited directly). It’s not continuous: you trigger it on a stack or a StackSet. It’s also not automatic remediation, just detection and reporting. If a scenario wants proactive alerting rather than manual checks, run drift detection on a schedule (an EventBridge scheduled rule is the usual trigger) or use AWS Config, whose cloudformation-stack-drift-detection-check managed rule evaluates stacks for drift and whose compliance changes can be routed through EventBridge.
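A scheduled drift check could look roughly like the sketch below. It assumes cfn is a boto3 CloudFormation client (for example boto3.client("cloudformation")), passed in as a parameter so the function can be exercised with a stub; pagination of the results is left out for brevity.

```python
import time


def report_drift(cfn, stack_name, poll_seconds=5):
    """Run drift detection on one stack and return its drifted resources.

    `cfn` is assumed to be a boto3 CloudFormation client.
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Detection is asynchronous, so poll until it leaves the in-progress state.
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))

    # Only the first page of results is read in this sketch.
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )["StackResourceDrifts"]
    return [(d["LogicalResourceId"], d["StackResourceDriftStatus"]) for d in drifts]
```

The function only reports. Fixing the drift still means updating the stack or reverting the manual change, which matches the detection-only behavior described above.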

AWS CDK — Concepts You’re Expected to Know

DOP-C02 won’t ask you to write TypeScript, but it will test whether you understand CDK’s relationship to CloudFormation. CDK is not a replacement for CloudFormation — it’s a code-first authoring layer that synthesizes CloudFormation templates. When you run cdk deploy, CDK synthesizes your app into one or more CloudFormation templates and asset bundles, then hands them to CloudFormation to actually provision. Every CDK deployment is still a CloudFormation stack under the hood, which means all the StackSets, drift detection, and change set behavior you just learned still applies.

Key vocabulary:

Construct — the basic building block; L1 constructs map 1:1 to CloudFormation resources, L2 constructs add sane defaults and convenience methods, L3 constructs (“patterns”) compose multiple resources into a common architecture (e.g., an ALB-fronted Fargate service in a handful of lines).
Stack — a unit of deployment, same concept as a CloudFormation stack, just defined in code.
App — the root construct that can contain multiple stacks, potentially targeting different accounts/regions.
cdk synth / cdk diff / cdk deploy — synth previews the generated template, diff compares it against what’s deployed, deploy applies it. Expect exam scenarios where “review infrastructure changes before applying them in a pipeline” maps to running cdk diff (or an equivalent CloudFormation change set) as a pipeline stage before the deploy action.
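The idea behind a pre-deploy review stage can be shown with a toy comparison of two templates. This is not how cdk diff is implemented (the real tool reports far more, including IAM and security group changes); it is a simplified sketch, and the resource names are made up for the example.

```python
def diff_templates(deployed, synthesized):
    """Classify resources as added, removed, or changed between two templates."""
    old = deployed.get("Resources", {})
    new = synthesized.get("Resources", {})
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical "currently deployed" and "freshly synthesized" templates.
deployed = {"Resources": {
    "Bucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    "Queue": {"Type": "AWS::SQS::Queue", "Properties": {}},
}}
synthesized = {"Resources": {
    "Bucket": {"Type": "AWS::S3::Bucket",
               "Properties": {"VersioningConfiguration": {"Status": "Enabled"}}},
    "Topic": {"Type": "AWS::SNS::Topic", "Properties": {}},
}}

changes = diff_templates(deployed, synthesized)
print(changes)
# prints {'added': ['Topic'], 'removed': ['Queue'], 'changed': ['Bucket']}
```

A pipeline stage could fail, or require approval, whenever the removed list is non-empty, which is the kind of gate the exam scenarios describe.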

CDK Pipelines (a construct library) automates the self-mutating pipeline pattern: a CDK app can define its own CodePipeline, and when the pipeline definition itself changes, the pipeline updates itself before deploying the rest of the application stacks. This is a subtle but testable point — the pipeline is infrastructure too, and it deploys itself first.

Systems Manager for Fleet Configuration Management

Once you have more than a few dozen instances, “SSH in and fix it” stops being a strategy. Systems Manager (SSM) is the fleet-wide control plane, and DOP-C02 tests several of its capabilities specifically:

SSM Capability	What it solves
Run Command	Execute ad-hoc or scheduled commands across a fleet without SSH/RDP access
State Manager	Continuously enforce a desired configuration (e.g., ensure an antivirus agent is always running) on a schedule
Patch Manager	Automated OS/application patching with maintenance windows and patch baselines
Automation	Runbook-style multi-step operational workflows (documents), often triggered by EventBridge
Parameter Store	Hierarchical config and secret storage, referenced directly from CloudFormation and CodeBuild
Session Manager	Shell access to instances without opening SSH ports or managing bastion hosts
Fleet Manager	Console-based fleet inventory and management view
Inventory	Collects metadata (installed packages, running services) across the fleet for compliance querying

Maintenance Windows let you schedule Patch Manager or Run Command tasks during defined low-traffic periods, with concurrency and error-threshold controls that mirror the same blast-radius philosophy you saw in StackSets and CodeDeploy: patch 10% of instances at a time, and stop sending the task to new targets once too many have reported errors.
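Run Command exposes the same two controls directly, which makes for a compact illustration. The sketch below only builds the request; the tag key, tag value, and shell command are hypothetical, and the 10% and 5% thresholds are example values.

```python
def run_command_request(tag_key, tag_value, commands,
                        max_concurrency="10%", max_errors="5%"):
    """Build keyword arguments for the SSM SendCommand API (boto3: ssm.send_command)."""
    return {
        "DocumentName": "AWS-RunShellScript",
        # Target instances by tag instead of listing instance IDs.
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": list(commands)},
        "MaxConcurrency": max_concurrency,  # how many targets run at once
        "MaxErrors": max_errors,            # error threshold before new invocations stop
    }


request = run_command_request("Role", "web", ["systemctl is-active amazon-ssm-agent"])
print(request)
# With credentials configured: boto3.client("ssm").send_command(**request)
```

Both thresholds accept either an absolute count or a percentage, passed as strings.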

For any of this to work, the SSM Agent must be installed and running on the instance, and the instance needs an instance profile with the right managed policy (AmazonSSMManagedInstanceCore) attached. This detail shows up in troubleshooting-style questions (“instances aren’t appearing in Fleet Manager”).

Immutable vs. Mutable Infrastructure

MUTABLE PATTERN                          IMMUTABLE PATTERN
────────────────────                     ────────────────────
Instance launches once                   New AMI/image built per release
   │                                        │
   ▼                                        ▼
Config management agent                  New ASG/task definition launched
(Ansible/Chef/Puppet/SSM)                   │
   │  applies changes in place              ▼
   ▼                                     Old instances terminated after
Instance drifts over time                cutover (blue/green)
if changes aren't tracked

The professional exam has a clear bias: immutable infrastructure is the preferred pattern for production workloads because it removes the main source of configuration drift. You never patch a running instance; you replace it with one built from a known-good image. Mutable patterns (SSM State Manager enforcing config on long-lived instances) are still valid, particularly for stateful or legacy fleets where rebuilding isn’t practical, but expect the “best practice” answer to lean immutable whenever the scenario allows it.

Golden AMI Pipelines with EC2 Image Builder

EC2 Image Builder formalizes the “golden AMI” pattern that used to require hand-rolled Packer scripts and cron jobs. The pipeline:

Source Image (base AMI) ──► Build Component(s) ──► Test Component(s) ──► Distribution
  (Amazon Linux 2023)          - install agent          - vulnerability scan     - copy AMI to
                                - apply hardening        - smoke test              multiple regions
                                - bake app runtime                                - share to other
                                                                                    accounts/OUs

Image Builder runs on a schedule (or triggered by a new base AMI release via EventBridge) and produces a versioned, tested AMI automatically — no more “who built this AMI and what’s on it” archaeology. Distribution settings push the finished AMI to every region and account that needs it in one pipeline run, which is exactly the multi-account concern this whole step keeps circling back to.

Pair this with Auto Scaling Group instance refresh: once a new golden AMI is published, an instance refresh gradually replaces running instances with new ones launched from the updated AMI, respecting a minimum healthy percentage — again, the same controlled-blast-radius rollout pattern, just applied at the AMI layer.
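Kicking off the refresh is a single API call. The sketch below assumes autoscaling is a boto3 Auto Scaling client (boto3.client("autoscaling")), passed in so the function can be tested with a stub, and that the group’s launch template already points at the new AMI; the 90% and 300-second values are illustrative.

```python
def start_ami_rollout(autoscaling, group_name, min_healthy_pct=90, warmup_seconds=300):
    """Start a rolling instance refresh on an Auto Scaling group and return its ID.

    `autoscaling` is assumed to be a boto3 Auto Scaling client.
    """
    response = autoscaling.start_instance_refresh(
        AutoScalingGroupName=group_name,
        Strategy="Rolling",
        Preferences={
            # Share of the group that must stay healthy while instances are replaced.
            "MinHealthyPercentage": min_healthy_pct,
            # Seconds to wait before a new instance counts toward healthy capacity.
            "InstanceWarmup": warmup_seconds,
        },
    )
    return response["InstanceRefreshId"]
```

A higher minimum healthy percentage slows the rollout but shrinks the number of instances out of service at any moment, the same trade-off as the StackSet and Maintenance Window settings.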

Exam Focus: What Questions Test From This Step

When to use nested stacks (resource limit, reuse) versus StackSets (cross-account/region consistency)
Service-managed vs. self-managed StackSet permissions, and which one auto-applies to new OU member accounts
StackSet failure tolerance and concurrency settings as the mechanism for limiting blast radius of a bad template
Custom resource Lambda behavior — the response-signal requirement and what happens if it’s omitted
Drift detection as a detection-only tool, not automated remediation
CDK’s relationship to CloudFormation — constructs synthesize templates, cdk diff for pre-deploy review, self-mutating CDK Pipelines
Matching an SSM capability (Run Command, State Manager, Patch Manager, Automation, Session Manager) to a fleet management scenario
Immutable infrastructure as the preferred professional-level pattern, and when mutable configuration management is still the right call
Golden AMI pipelines via EC2 Image Builder combined with ASG instance refresh for fleet-wide rollout

Written by NPBlue Cloud Team: Cloud & Platform Engineers who run production workloads on AWS daily and write from real deployment experience, not the docs alone.

Reviewed for technical accuracy.