Lambda Concurrency: Reserved, Provisioned, and How to Avoid Throttling
Lambda concurrency is the number of function instances handling requests at the same time. When your function receives 1 request, 1 execution environment is active. When it receives 500 simultaneous requests, Lambda runs 500 environments in parallel — provided the limits allow it.
Understanding concurrency limits is critical. Exceed them and your function starts returning 429 TooManyRequestsException errors. Misconfigure reserved concurrency and you can accidentally throttle your own critical functions.
Account-Level Concurrency Limit
Each AWS account has a regional concurrency limit — defaulting to 1,000 concurrent executions. This is a shared pool across all Lambda functions in that region.
Account concurrency pool (default 1,000):

    ┌────────────────────────────────────────────────────────────────┐
    │  1,000 concurrent executions                                   │
    │                                                                │
    │  ┌───────────────┐  ┌───────────────┐  ┌───────────────────┐   │
    │  │ api-handler   │  │ image-resizer │  │ data-pipeline     │   │
    │  │ (300 used)    │  │ (200 used)    │  │ (400 used)        │   │
    │  └───────────────┘  └───────────────┘  └───────────────────┘   │
    │                                                                │
    │  Unreserved pool remaining: 100                                │
    └────────────────────────────────────────────────────────────────┘

If data-pipeline spikes to 800 concurrent executions while api-handler still needs 300, those two functions alone need 1,100. That exceeds the 1,000 limit, so requests beyond the pool, including requests to api-handler, start getting throttled.
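The arithmetic behind that scenario can be sketched in a few lines. This is a simplified model rather than a description of how Lambda schedules work internally: it only compares total demand with the pool, and the helper name `pool_overflow` is an illustrative assumption.

```python
def pool_overflow(demand_by_function, account_limit=1000):
    """Return how many concurrent executions exceed the shared account pool."""
    total_demand = sum(demand_by_function.values())
    return max(0, total_demand - account_limit)


# Demand figures taken from the scenario above
spike = {"api-handler": 300, "data-pipeline": 800}
print(pool_overflow(spike))  # 100 executions cannot be served and are throttled
```

Which function absorbs the throttling depends on which requests arrive once the pool is full, so the model reports only the size of the shortfall.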
You can request a limit increase through AWS Service Quotas. There is no hard cap on how high you can go, but approval takes time.
The Three Concurrency Types
Unreserved Concurrency
Every function starts with unreserved concurrency — it pulls from the account pool without any dedicated allocation. Functions with unreserved concurrency compete for whatever the pool has left after reserved functions take their share.
Unreserved concurrency is the default. There is nothing to configure; it just means “use whatever is available.”
Problem with unreserved concurrency: A noisy function (batch job, runaway recursion, traffic spike) can consume most of the pool and starve other functions. In a multi-function account, this is a real failure mode.
    # Monitor concurrent executions across functions
    aws cloudwatch get-metric-statistics \
      --namespace AWS/Lambda \
      --metric-name ConcurrentExecutions \
      --statistics Maximum \
      --period 60 \
      --start-time 2024-01-15T00:00:00Z \
      --end-time 2024-01-15T01:00:00Z

Reserved Concurrency
Reserved concurrency dedicates a fixed number of concurrent executions to a specific function, carved out of the account pool.
Account pool: 1,000
Reserved allocations:
    payment-processor: 100 reserved → always has 100 slots available, and never more
    auth-service: 50 reserved → always has 50 slots available, and never more
Unreserved pool: 850 (shared by everything else)

Reserved concurrency serves two purposes:
Guarantee: The function always has capacity up to its reserved value, even if the rest of the account pool is saturated.
Cap: The function can never exceed its reserved value. This matters for functions that open database connections: if your database allows 100 connections and Lambda could spawn 1,000 concurrent environments, the function can exhaust the connection limit and the database starts rejecting new connections. Setting reserved concurrency to 80 prevents that.
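Both roles can be illustrated with a small model. This is a sketch under simplifying assumptions (a static snapshot of demand, with no scaling delays), and the helper names `unreserved_pool` and `admitted` are hypothetical; the function names and numbers come from the examples above.

```python
def unreserved_pool(account_limit, reserved):
    """Concurrency left for functions that have no reservation."""
    return account_limit - sum(reserved.values())


def admitted(function_name, requested, reserved, unreserved_available):
    """Concurrent executions a function can run in this simplified model."""
    if function_name in reserved:
        # The reservation is both the guarantee and the cap
        return min(requested, reserved[function_name])
    # Functions without a reservation compete for the shared remainder
    return min(requested, unreserved_available)


reserved = {"payment-processor": 100, "auth-service": 50}
pool = unreserved_pool(1000, reserved)
print(pool)                                                # 850
print(admitted("payment-processor", 250, reserved, 0))     # 100, even with the shared pool exhausted
print(admitted("image-resizer", 900, reserved, pool))      # 850
```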
    # Set reserved concurrency
    aws lambda put-function-concurrency \
      --function-name payment-processor \
      --reserved-concurrent-executions 100

    # Check current setting
    aws lambda get-function-concurrency \
      --function-name payment-processor

    # Remove reserved concurrency (return to unreserved pool)
    aws lambda delete-function-concurrency \
      --function-name payment-processor

Setting reserved concurrency to 0 effectively disables a function. Event sources can still send it events, but every invocation is throttled immediately. This is useful as an emergency shutoff.
Provisioned Concurrency
Provisioned concurrency pre-initialises execution environments so they are ready before any requests arrive. The environments complete the INIT phase (runtime startup, init code) and sit in a warm state.
Without provisioned concurrency:
    Request 1 → cold start (300ms init + 50ms handler) = 350ms
    Request 2 → warm (50ms handler) = 50ms
    Request 3 → warm (50ms handler) = 50ms

With provisioned concurrency (5 pre-warmed):
    Request 1 → warm (50ms handler) = 50ms
    Request 2 → warm (50ms handler) = 50ms
    Request 6 → exceeds provisioned, cold start = 350ms

The first 5 concurrent requests hit pre-warmed environments. The 6th creates a new environment on demand (cold start), and that environment then stays warm for subsequent requests.
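The timings above can be reproduced with a toy model. It assumes every request arrives at the same moment, so each one needs its own environment, and it reuses the 300 ms init and 50 ms handler figures from the example; `request_latencies` is an illustrative name, not a Lambda API.

```python
def request_latencies(concurrent_requests, provisioned, init_ms=300, handler_ms=50):
    """Latency in ms for each of N simultaneous requests."""
    latencies = []
    for index in range(concurrent_requests):
        # Requests beyond the pre-warmed environments pay the INIT cost
        is_cold = index >= provisioned
        latencies.append(handler_ms + (init_ms if is_cold else 0))
    return latencies


print(request_latencies(6, provisioned=5))
# [50, 50, 50, 50, 50, 350]
```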
    # Create a function version (required for provisioned concurrency)
    VERSION=$(aws lambda publish-version \
      --function-name api-handler \
      --query 'Version' --output text)

    # Enable provisioned concurrency on that version
    aws lambda put-provisioned-concurrency-config \
      --function-name api-handler \
      --qualifier "$VERSION" \
      --provisioned-concurrent-executions 10

You can also configure provisioned concurrency on an alias instead of a version, which is the recommended approach for production:

    # Create an alias pointing to the version
    aws lambda create-alias \
      --function-name api-handler \
      --name prod \
      --function-version "$VERSION"

    # Set provisioned concurrency on the alias
    aws lambda put-provisioned-concurrency-config \
      --function-name api-handler \
      --qualifier prod \
      --provisioned-concurrent-executions 10

Cost: Provisioned concurrency has its own billing dimension. You pay per GB-second of provisioned concurrency time, even if those environments are idle. At 10 environments × 256 MB × 24 hours, that is 216,000 GB-seconds of provisioned time per day, which adds a meaningful cost. Only use provisioned concurrency where cold start latency causes user-facing problems.
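To put a number on the cost example, the billed quantity can be computed directly. This is a rough sketch: the per-GB-second rate is deliberately left as a parameter because it varies by region and architecture, and both function names are illustrative.

```python
def provisioned_gb_seconds(environments, memory_mb, hours):
    """GB-seconds of provisioned concurrency accrued over a period."""
    return environments * (memory_mb / 1024) * hours * 3600


def provisioned_cost(gb_seconds, price_per_gb_second):
    """Cost for the period; look up the rate for your own region."""
    return gb_seconds * price_per_gb_second


daily = provisioned_gb_seconds(environments=10, memory_mb=256, hours=24)
print(daily)  # 216000.0 GB-seconds per day, billed whether or not requests arrive
```

This covers only the idle provisioned charge; requests and execution duration on those environments are billed on top of it.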
Scaling Behaviour
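Lambda limits how fast concurrency can increase. Under the burst model that applied until late 2023:
- Initial burst: 500–3,000 immediately available (varies by region)
- After initial burst: +500 per minute until the account limit is reached

Concurrency growth after a sudden spike (us-east-1, with an account limit raised well above the default):
    Minute 0: 0 → 3,000 (initial burst)
    Minute 1: 3,000 → 3,500
    Minute 2: 3,500 → 4,000
    ...until the account limit

For traffic spikes that need more than the initial burst, Lambda throttles requests until the per-minute allocation catches up.

AWS has since changed the scaling model: each function now scales independently by up to 1,000 additional concurrent executions every 10 seconds, until the account limit is reached. The principle is unchanged. A spike that outpaces the scaling rate is throttled until capacity catches up.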
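The per-minute burst model described in this section can be sketched as a simple function of elapsed time. The 5,000 account limit is a hypothetical raised quota chosen so the ramp is visible, and the function names are illustrative.

```python
def available_concurrency(minute, initial_burst=3000, per_minute=500, account_limit=5000):
    """Concurrency Lambda can reach N minutes into a spike (burst model)."""
    return min(account_limit, initial_burst + per_minute * minute)


def throttled(demand, minute, **limits):
    """Concurrent requests that cannot be served at the given minute."""
    return max(0, demand - available_concurrency(minute, **limits))


for minute in range(3):
    print(minute, available_concurrency(minute), throttled(4000, minute))
# 0 3000 1000
# 1 3500 500
# 2 4000 0
```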
Throttling: What Happens and How to Handle It
When Lambda throttles a request, the behaviour depends on the invocation type:
Synchronous (API Gateway, direct SDK):
- Returns 429 TooManyRequestsException
- Caller receives the error immediately
- No automatic retry by Lambda
Asynchronous (S3, SNS, EventBridge):
- Lambda retries for up to 6 hours with exponential backoff
- If the function is still throttled after 6 hours, the event is dropped
- Configure a dead-letter queue to capture dropped events
Polling (SQS, Kinesis, DynamoDB Streams):
- Lambda pauses polling until concurrency is available
- For Kinesis and DynamoDB Streams, processing falls behind
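- For SQS, throttled messages return to the queue once the visibility timeout expires and are retried; they stay in the queue until they are processed, exceed the redrive policy's maxReceiveCount, or reach the end of the retention period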
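For the asynchronous case, the 6-hour retry window can be pictured as a schedule of delays. The sketch below assumes delays double from 1 second up to a 5-minute ceiling inside the window; treat those values, and the name `retry_delays`, as illustrative assumptions rather than a specification of Lambda's internal queue.

```python
def retry_delays(window_s=6 * 3600, base_s=1, cap_s=300):
    """Capped exponential backoff delays that fit inside the retry window."""
    delays = []
    elapsed = 0
    delay = base_s
    while elapsed + delay <= window_s:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * 2, cap_s)  # double each time, up to the ceiling
    return delays


schedule = retry_delays()
print(schedule[:10])  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 300]
print(len(schedule))  # 79
```

Under these assumptions a throttled event gets dozens of attempts before it is dropped, which is why a dead-letter queue only sees events that stayed throttled for the whole window.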
    # Handling throttling in client code
    import time

    import boto3
    from botocore.exceptions import ClientError

    lambda_client = boto3.client('lambda')

    def invoke_with_retry(function_name, payload, max_retries=3):
        """Invoke synchronously, backing off exponentially when throttled."""
        for attempt in range(max_retries):
            try:
                return lambda_client.invoke(
                    FunctionName=function_name,
                    Payload=payload
                )
            except ClientError as e:
                # Retry only on throttling; re-raise every other error
                if e.response['Error']['Code'] == 'TooManyRequestsException':
                    wait = 2 ** attempt  # 1s, 2s, 4s, ...
                    print(f"Throttled, waiting {wait}s (attempt {attempt+1})")
                    time.sleep(wait)
                else:
                    raise
        raise Exception(f"Failed after {max_retries} retries")

Auto Scaling Provisioned Concurrency
Application Auto Scaling can automatically adjust provisioned concurrency based on metrics:
    # Register scalable target
    aws application-autoscaling register-scalable-target \
      --service-namespace lambda \
      --resource-id function:api-handler:prod \
      --scalable-dimension lambda:function:ProvisionedConcurrency \
      --min-capacity 2 \
      --max-capacity 20

    # Create target tracking policy (scale when utilisation > 70%)
    aws application-autoscaling put-scaling-policy \
      --service-namespace lambda \
      --resource-id function:api-handler:prod \
      --scalable-dimension lambda:function:ProvisionedConcurrency \
      --policy-name pc-utilisation-policy \
      --policy-type TargetTrackingScaling \
      --target-tracking-scaling-policy-configuration '{
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        }
      }'

This scales provisioned concurrency up when more than 70% of provisioned environments are actively handling requests.
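The steady-state arithmetic behind a 70% target can be sketched as follows. This is only an approximation of where target tracking tends to settle: the real service reacts through CloudWatch alarms and cooldown periods, and `desired_provisioned` is a hypothetical helper, not part of any AWS SDK.

```python
def desired_provisioned(in_use, target_percent=70, min_capacity=2, max_capacity=20):
    """Provisioned concurrency that keeps utilisation at or below the target."""
    # Integer ceiling division avoids floating-point rounding surprises
    desired = -(-in_use * 100 // target_percent)
    return max(min_capacity, min(max_capacity, desired))


print(desired_provisioned(7))   # 10 (7 of 10 busy is exactly 70%)
print(desired_provisioned(8))   # 12
print(desired_provisioned(0))   # 2  (clamped to min-capacity)
print(desired_provisioned(20))  # 20 (clamped to max-capacity)
```

The min and max defaults mirror the `--min-capacity 2` and `--max-capacity 20` values registered above.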
Concurrency Limits and Database Connections
The most practical reason to set reserved concurrency is protecting downstream systems that have connection limits:
RDS database max connections: 100
Without reserved concurrency:
    Lambda scales to 500 concurrent → 500 DB connections → database rejects connections

With reserved concurrency = 80:
    Lambda capped at 80 concurrent → 80 DB connections → database stays healthy

Better solution: RDS Proxy
    Lambda scales freely → RDS Proxy pools connections → database sees ~20 connections

RDS Proxy is the recommended solution for Lambda-to-RDS connectivity because it handles connection pooling, eliminating the need to limit Lambda concurrency purely for connection management.
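Where RDS Proxy is not an option and reserved concurrency has to act as the cap, the value can be derived from the connection budget. This is a minimal sketch assuming each execution environment holds a fixed number of connections; the helper name and the 20-connection allowance for other clients (admin tools, migrations) are illustrative assumptions.

```python
def max_safe_concurrency(db_max_connections, reserved_for_other_clients=0,
                         connections_per_environment=1):
    """Largest reserved concurrency that stays within the database's limit."""
    usable = db_max_connections - reserved_for_other_clients
    return max(0, usable // connections_per_environment)


print(max_safe_concurrency(100, reserved_for_other_clients=20))  # 80
```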
Common Interview Questions
Q: What is the default concurrency limit per region?
1,000 concurrent executions per region per account by default. This can be increased via a Service Quotas request.

Q: What is the difference between reserved and provisioned concurrency?
Reserved concurrency dedicates a number of slots to a function and caps its maximum concurrent executions. Provisioned concurrency pre-warms a number of execution environments so they are ready without a cold start. They serve different purposes: reserved is for capacity isolation and protection; provisioned is for latency.

Q: If you set reserved concurrency to 50, what happens when the 51st request arrives?
If 50 executions are already in flight, the 51st concurrent request is throttled with a 429 error. Reserved concurrency acts as both a minimum guarantee and a maximum cap.

Q: When should you use provisioned concurrency?
When your function is synchronously invoked (a user is waiting), cold starts add noticeable latency, and warm environments from prior invocations are not reliably available. It is typically used for customer-facing API handlers with strict p99 latency requirements.