DynamoDB Streams: Real-Time Change Data Capture for Event-Driven Architectures
Applications often need to react when data changes — send a notification when an order is placed, update a search index when a product price changes, populate a reporting database when a new user registers. The naive approach is to poll the table repeatedly looking for changes. DynamoDB Streams offers something better: a reliable, ordered, time-limited log of every change made to a table.
What Streams Captures
When you enable DynamoDB Streams on a table, every write operation — PutItem, UpdateItem, DeleteItem, and transactional writes — generates a stream record. Stream records are:
- Ordered within a partition: changes to items with the same partition key appear in the order they happened
- Available for 24 hours: records stay in the stream for one day before they expire
- Organized into shards: the stream is divided into shards, similar to Kinesis Data Streams
Each stream record describes one item-level change. You configure how much data each record contains through the stream view type:
Stream View Types ==================
KEYS_ONLY Only the key attributes (PK and SK) of the changed item NEW_IMAGE The entire item as it looks after the change OLD_IMAGE The entire item as it looked before the change NEW_AND_OLD_IMAGES Both the before and after state of the item
Most useful view types: ├── NEW_IMAGE for building downstream indexes or search ├── OLD_IMAGE for audit logs tracking what was deleted └── NEW_AND_OLD_IMAGES for detecting what specifically changedArchitecture: Streams + Lambda
The most common pattern is connecting DynamoDB Streams to AWS Lambda. AWS manages the polling — Lambda polls the stream shards on your behalf, batches records together, and invokes your function when records are available. You do not write polling code; you write the handler.
DynamoDB Streams → Lambda Pattern ==================================
DynamoDB Table │ │ (every write creates a stream record) ▼ DynamoDB Stream (shards, 24-hour retention) │ │ (Lambda polls; AWS manages this connection) ▼ Lambda Function (event source mapping) │ ├──► Send SNS notification (order confirmation email) ├──► Update Elasticsearch/OpenSearch index (search sync) ├──► Write to Redshift (analytics pipeline) ├──► Update a DynamoDB aggregation table (counter update) └──► Trigger Step Functions workflow
Lambda invocation batches up to 10,000 records at once Batch size and parallelization factor are configurableThe event source mapping handles retry behavior: if your Lambda function fails, AWS retries the same batch. Items are processed in order within each shard. This ordering guarantee matters — if you have two updates to the same item, you want to process the first before the second.
Event-Driven Use Cases
Order processing pipeline: an e-commerce table has new orders written when customers check out. A Lambda function reads the stream, sends a confirmation email via SES, decrements inventory counts in another table, and puts a message on SQS for the warehouse fulfillment system. No code in the order write path is responsible for these downstream effects — the stream fans out the event.
Search index synchronization: a product catalog table is the system of record. A Lambda function processes stream records and writes updates to an OpenSearch index. The catalog team writes only to DynamoDB; search stays in sync automatically with under a second of lag.
Audit logging: a sensitive financial table has a Lambda function reading its stream with NEW_AND_OLD_IMAGES view type. Every change — what was there before, what it changed to, and the approximate timestamp — is written to an immutable audit log in S3 or CloudWatch Logs. No developer can delete this trail because it is written by the stream consumer, not the application making the changes.
Cross-table aggregations: a votes table records individual votes in a real-time poll. Rather than counting votes with an expensive Scan every time someone views results, a Lambda function processes the stream and maintains running totals in a separate aggregations table. Reads of the aggregations table are fast key lookups.
Streams vs Polling: Why It Matters
Without streams, reacting to changes requires polling:
Polling Approach (without streams) ===================================
Every 30 seconds: scan = dynamodb.scan(table, filter="updated_at > last_poll_time") for item in scan: process(item)
Problems: - Expensive (Scan reads entire table) - Requires a timestamp attribute on every item - Race conditions if updates happen between polls - Deleted items are invisible — you can't scan for deletions - Latency: up to 30 seconds between event and reaction
Streams Approach: - Lambda invoked within seconds of change - Captures inserts, updates, and deletes - No table schema modification required - Ordered delivery within partition - Cost: per shard-hour + Lambda invocationsEnabling Streams
Streams is an optional feature, disabled by default. Enable it on an existing table without any downtime:
aws dynamodb update-table \ --table-name Orders \ --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGESThen create the Lambda event source mapping:
aws lambda create-event-source-mapping \ --function-name OrderProcessor \ --event-source-arn arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2025-06-15T00:00:00.000 \ --starting-position LATEST \ --batch-size 100Error Handling and Ordering
A shard processes records for a subset of partition keys. If your Lambda function throws an exception, the batch retries until it succeeds or until the records expire from the stream (24 hours). This is important to design around: an unhandled error in a Lambda function can block all subsequent stream records for that shard until the failing batch is processed or expires.
Best practices:
- Wrap processing logic in try-catch and write failed records to a dead-letter queue (SQS) rather than letting exceptions propagate
- Design Lambda functions to be idempotent — the same record processed twice should produce the same result
- Monitor the
IteratorAgeCloudWatch metric — high iterator age means your Lambda function is falling behind the stream
Key Interview Points
- Streams are ordered within a partition, not across the whole table — if you need global ordering, you need a different design
- 24-hour retention is the only option — there is no way to extend it. If you need longer retention, read records into Kinesis Data Streams as they arrive
- DynamoDB Global Tables use Streams internally — enabling Global Tables automatically enables Streams; you cannot disable Streams on a Global Table
- Streams are not guaranteed exactly-once — under certain circumstances (shard splits, Lambda retries) you may see a record more than once. Idempotent processing is required.
- Lambda parallel factor: you can configure multiple simultaneous invocations per shard (up to 10) to increase throughput, but this can violate ordering within a shard
- The KEYS_ONLY view type is cheapest; NEW_AND_OLD_IMAGES is most expensive in terms of data written to the stream