Terraform State Drift Detection

Drift happens when real infrastructure diverges from the state Terraform tracks — a security group rule added manually in the AWS console, an instance type changed via CLI, an auto-scaling event that modified a resource. Drift detection finds these gaps before they cause outages or security incidents.

What Drift Looks Like

# Someone manually added an inbound rule to a security group in the console.
# terraform plan shows that the next apply will remove it:

  ~ resource "aws_security_group" "app" {
      ~ ingress = [
          - {
              cidr_blocks = ["203.0.113.100/32"]   # Manually added; Terraform will remove it
              from_port   = 443
              protocol    = "tcp"
              to_port     = 443
            },
          # ...
        ]
    }

Terraform can't tell whether the rule was added intentionally. It only sees that the rule is not in the configuration, so the next apply removes it.

Detecting Drift with terraform plan

The simplest drift check — compare current configuration against real infrastructure:

# Run plan with refresh (default) — queries providers for current state
terraform plan -refresh=true

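# If the plan shows changes you didn't make in configuration, those are drift:
# ~ attribute changed outside Terraform (apply would revert it to match the config)
# + resource still in config but deleted in real infra (apply would recreate it)
# Resources created outside Terraform never appear here: refresh only covers what is already in state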

For a plan that only detects drift without checking configuration changes, use refresh-only:

# Show what the state would look like after syncing with real infra
# No configuration changes — only state reconciliation
terraform plan -refresh-only

Output:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform
since the last "terraform apply":

  # aws_security_group.app has changed
  ~ resource "aws_security_group" "app" {
      ~ ingress = [
          + {
              cidr_blocks      = ["203.0.113.100/32"]
              from_port        = 22
              to_port          = 22
              protocol         = "tcp"
              description      = ""
            },
          # ...
        ]
    }

This is a drift report. To update state to match the above, run:
  terraform apply -refresh-only
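
For scripting, the affected resource addresses can be pulled out of a refresh-only report. The sketch below assumes the report was produced with -no-color and that each drifted resource is introduced by a header comment such as "# aws_security_group.app has changed" or "# aws_instance.web has been deleted". The function name drifted_addresses is hypothetical, and the human-readable plan text is not a stable interface, so treat this as a convenience rather than a contract.

```shell
# Print the unique addresses of resources reported as drifted.
# Reads plan text on stdin. Assumes "# <address> has changed" or
# "# <address> has been deleted" header lines (an assumption about the report format)
# and addresses that contain no spaces.
drifted_addresses() {
  awk '/^ *# .* (has changed|has been deleted)$/ { print $2 }' | sort -u
}

# Usage (not run here):
#   terraform plan -refresh-only -no-color | drifted_addresses
```

The sorted, de-duplicated list is convenient for diffing one scheduled run against the previous one.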

Accepting vs Rejecting Drift

Accept the drift (manual change was intentional):

# Update Terraform state to match the real-world change
terraform apply -refresh-only

# Then update the .tf config to codify the change. Otherwise the next regular apply
# still sees a difference between config and state, and reverts it.

Reject the drift (manually changed resource should go back to config):

# Plan and apply normally — Terraform will restore the intended state
terraform plan
terraform apply

Automated Drift Detection Pipeline

Run drift detection on a schedule — don’t wait until someone notices:

name: Drift Detection

on:
  schedule:
    - cron: '0 6 * * *'   # Every day at 6 AM UTC
  workflow_dispatch:

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      issues: write

    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

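      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9.5"
          # Disable the wrapper script so the shell sees Terraform's real exit code (0, 1, or 2)
          terraform_wrapper: false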

      - name: Terraform Init
        run: terraform init -input=false

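      - name: Detect Drift
        id: drift
        run: |
          terraform plan -refresh-only -detailed-exitcode -no-color -input=false 2>&1 | tee drift-report.txt
          # PIPESTATUS[0] is the exit code of terraform, not of tee
          exit_code=${PIPESTATUS[0]}
          echo "exit_code=${exit_code}" >> "$GITHUB_OUTPUT"
          # Exit code 1 means the plan itself failed; fail the job instead of silently reporting no drift
          if [ "$exit_code" -eq 1 ]; then exit 1; fi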

      - name: Create Issue if Drift Found
        if: steps.drift.outputs.exit_code == '2'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs')
            const report = fs.readFileSync('drift-report.txt', 'utf8')
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Infrastructure drift detected - ${new Date().toISOString().split('T')[0]}`,
              body: `## Drift Detection Report\n\n\`\`\`\n${report.slice(0, 60000)}\n\`\`\``,
              labels: ['infrastructure', 'drift']
            })

terraform plan -detailed-exitcode returns:

0 — no changes
1 — error
2 — changes detected (drift present)
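
The same exit-code handling can be reused outside GitHub Actions. This is a minimal sketch: the function names classify_plan_exit and check_drift are hypothetical, and any status other than 0 or 2 is treated as an error.

```shell
# Map a "terraform plan -detailed-exitcode" status to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run the command given as arguments, capture its exit status without
# aborting under "set -e", discard its output, and print the label.
check_drift() {
  drift_status=0
  "$@" >/dev/null 2>&1 || drift_status=$?
  classify_plan_exit "$drift_status"
}

# Usage (not run here):
#   check_drift terraform plan -refresh-only -detailed-exitcode -no-color -input=false
```

Passing the command as arguments keeps the helper testable with any program that returns 0, 1, or 2.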

Drift from Resource Replacement

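A refresh-only plan reports what changed in the real infrastructure, but not what Terraform would do about it. When a resource was deleted outside Terraform, or drift touches an attribute that can't be updated in place, run a normal plan to see whether apply would recreate or replace the resource:

# Find resources that were deleted externally or that Terraform would replace
terraform plan -no-color 2>&1 | grep -E "must be replaced|has been deleted"

# Example output (instance was terminated externally; apply would recreate it):
#   # aws_instance.app has been deleted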
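
To review only the risky actions in a normal plan, the header comments can be filtered into a short list. This sketch assumes -no-color output with the usual "# <address> must be replaced" and "# <address> will be destroyed" header lines, and addresses without spaces. The function name destructive_changes is hypothetical.

```shell
# Print "<address><TAB><action>" for every resource the plan would replace or destroy.
# Reads plan text on stdin. Field 2 is the address because field 1 is the "#" marker.
destructive_changes() {
  awk '
    /^ *# .* must be replaced$/  { print $2 "\treplace" }
    /^ *# .* will be destroyed$/ { print $2 "\tdestroy" }
  '
}

# Usage (not run here):
#   terraform plan -no-color | destructive_changes
```

An empty result means the plan contains no replacements or destroys, although in-place updates may still be pending.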

Preventing Drift

The best drift is drift that never happens:

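# 1. Restrict manual console access via IAM and require all changes to go through Terraform
# 2. Use lifecycle ignore_changes for attributes that are legitimately managed outside Terraform,
#    so expected changes stop showing up as drift (this hides changes; it does not detect them)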
resource "aws_security_group" "app" {
  name = "app-sg"

  lifecycle {
    ignore_changes = [
      ingress,   # If your app auto-manages ingress rules, tell Terraform to ignore them
    ]
  }
}

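# 3. Use AWS Config rules to flag out-of-band changes, e.g. resources missing the tags
#    that Terraform always sets (requires an AWS Config configuration recorder in the account)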
resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"
  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS"
  }
  input_parameters = jsonencode({
    tag1Key = "Environment"
    tag2Key = "Team"
  })
}

Organizational controls that prevent drift:

Deny IAM permissions for humans to modify resources directly (require CI/CD pipelines)
Enforce SCPs (Service Control Policies) in AWS Organizations so that only the pipeline's role can modify resources
Require GitOps workflows where all changes go through pull requests
