Apache Airflow vs AWS Step Functions: Choosing the Right Orchestrator
Both Apache Airflow and AWS Step Functions orchestrate workflows. Both have scheduling, dependency management, and monitoring. Neither is a clear winner across all use cases — they have different mental models, different operational costs, and different strengths. Picking the wrong one means fighting the tool instead of building the system.
This comparison focuses on the practical differences that matter in real projects.
Mental Model Comparison
The fundamental difference is this: Airflow is a scheduler that runs tasks on a timetable. Step Functions is an event-driven state machine that coordinates services.
Airflow DAGs (Directed Acyclic Graphs) are Python code. You define tasks, their dependencies, and when they run. Airflow executes them on workers and tracks the state. Everything is centralised in the Airflow scheduler.
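To make the dependency model concrete, here is a stdlib-only sketch of how a scheduler can decide which tasks are runnable: a task becomes ready once every upstream task has finished. This is an illustration of the idea, not Airflow's scheduler. It uses Python's `graphlib.TopologicalSorter`, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_redshift": {"transform"},
    "data_quality_check": {"load_redshift"},
}


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all upstream tasks done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # upstream tasks all finished
        waves.append(ready)
        # Pretend a worker ran each ready task successfully, unblocking downstream tasks.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {wave}")
```

Tasks in the same wave have no dependency on each other, which is what lets Airflow hand them to different workers in parallel.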
Step Functions workflows are state machines defined in JSON using the Amazon States Language (ASL). Task states call AWS services (Lambda, DynamoDB, ECS, Glue, SageMaker), while other state types such as Choice, Wait, Parallel, and Map handle branching, delays, and fan-out. State transitions happen in response to success or failure. There is no scheduler running your tasks: AWS services execute the work, and Step Functions just coordinates them.
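The following toy walk-through shows how transitions work in an Amazon States Language definition: a Task state moves to `Next` on success, or to the target of a matching `Catch` clause on failure. The state names, Lambda function names, and the `run_workflow` simulator are hypothetical. Real Step Functions evaluates ASL itself; this sketch only follows `Next`, `End`, and `Catch` on Task states plus the terminal Succeed and Fail states.

```python
import json

# A small ASL definition written as a Python dict; json.dumps turns it into the
# JSON document Step Functions expects. Function names are hypothetical.
DEFINITION = {
    "Comment": "Validate and store an order",
    "StartAt": "Validate",
    "States": {
        "Validate": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "validate-order", "Payload.$": "$"},
            "Next": "Store",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Rejected"}],
        },
        "Store": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "store-order", "Payload.$": "$"},
            "End": True,
        },
        "Rejected": {"Type": "Fail", "Error": "OrderRejected"},
    },
}


def run_workflow(definition: dict, outcomes: dict[str, bool]) -> list[str]:
    """Toy simulator: return the states visited, given which Task states succeed.

    Only Task (Next/End/Catch), Succeed and Fail states are modelled; retries,
    input/output processing and other state types are ignored.
    """
    path = []
    name = definition["StartAt"]
    while True:
        path.append(name)
        state = definition["States"][name]
        if state["Type"] in ("Succeed", "Fail"):
            return path  # terminal states end the execution
        if outcomes.get(name, True):  # Tasks succeed unless told otherwise
            if state.get("End"):
                return path
            name = state["Next"]
            continue
        error = "States.TaskFailed"
        handler = next(
            (c for c in state.get("Catch", [])
             if "States.ALL" in c["ErrorEquals"] or error in c["ErrorEquals"]),
            None,
        )
        if handler is None:
            path.append(f"<execution failed: {error}>")
            return path
        name = handler["Next"]  # the Catch clause decides where to go on failure


if __name__ == "__main__":
    print(json.dumps(DEFINITION, indent=2))
    print(run_workflow(DEFINITION, {"Validate": True}))   # happy path
    print(run_workflow(DEFINITION, {"Validate": False}))  # failure routed by Catch
```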
Apache Airflow
  |
  +-- Python DAG defines tasks and dependencies
  +-- Scheduler decides when to run each task
  +-- Workers execute the tasks (Celery / Kubernetes)
  +-- Central state tracking in metadata DB
  +-- You manage workers, scheduler, metadata DB
AWS Step Functions
  |
  +-- JSON ASL defines states and transitions
  +-- Event (API call, EventBridge, SDK) starts execution
  +-- AWS services do the actual work (Lambda, Glue, ECS...)
  +-- Step Functions tracks execution state (fully managed)
  +-- No infrastructure to manage
When Airflow Wins
Python-native data pipelines: If your pipeline is mostly Python (running dbt models, calling pandas transforms, executing SQL scripts), Airflow is the natural fit. The DAG is Python, operators are Python classes, and the entire ecosystem is Python. Writing equivalent logic in Step Functions typically means wrapping each piece of Python in a Lambda function (or a container task on ECS).
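As a rough illustration of that wrapping, here is the shape of a Lambda handler around a plain Python transform. The `summarise_orders` function and the `{"rows": [...]}` event shape are hypothetical; `handler(event, context)` is the standard Python Lambda entry-point signature, and when Step Functions calls it through the `lambda:invoke` integration the return value appears under `Payload` in the Task's result.

```python
import json


def summarise_orders(rows: list[dict]) -> dict:
    """Hypothetical transform: ignore cancelled orders and total the rest (in cents)."""
    kept = [row for row in rows if row.get("status") != "cancelled"]
    return {"order_count": len(kept), "total_cents": sum(row["amount_cents"] for row in kept)}


def handler(event, context):
    """Lambda entry point: Step Functions passes the state input as `event`.

    The returned dict must be JSON-serialisable; it becomes the Task's result.
    """
    rows = event["rows"]  # assumed input shape for this sketch
    return summarise_orders(rows)


if __name__ == "__main__":
    sample_event = {"rows": [
        {"order_id": "a1", "amount_cents": 1250, "status": "paid"},
        {"order_id": "a2", "amount_cents": 999, "status": "cancelled"},
        {"order_id": "a3", "amount_cents": 300, "status": "paid"},
    ]}
    print(json.dumps(handler(sample_event, None)))
```

In Airflow the same `summarise_orders` function could be called directly from a task; in Step Functions it additionally needs this entry point plus its own packaging, deployment, and IAM permissions.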
Multi-cloud or on-premises integration: Airflow’s provider packages supply hundreds of operators and hooks for external systems such as Snowflake, BigQuery, Databricks, dbt, PostgreSQL, SFTP, and Slack. Step Functions is AWS-native. Apart from plain HTTPS calls, which the HTTP Task can make directly, connecting to non-AWS systems from Step Functions usually requires a Lambda function for each integration.
Complex scheduling logic: Airflow’s scheduling model supports cron expressions, data intervals, backfill, catchup, and timezone-aware scheduling. Step Functions relies on EventBridge for scheduling and has no native backfill concept. If your pipeline needs to reprocess 90 days of historical data on the same schedule, Airflow handles it cleanly: set start_date 90 days back with catchup=True, or run a backfill, and Airflow creates one run per missed interval. In Step Functions, you write the reprocessing logic yourself.
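In Step Functions, writing the reprocessing logic yourself usually means enumerating the missed intervals and starting one execution per interval. A minimal sketch of the enumeration step, assuming a daily schedule and half-open [start, end) intervals similar to Airflow's data intervals; the `data_interval_start` and `data_interval_end` input keys are hypothetical names chosen to mirror Airflow's terminology.

```python
from datetime import datetime, timedelta, timezone


def daily_intervals(start: datetime, end: datetime) -> list[dict]:
    """Return one execution input per complete day in [start, end)."""
    inputs = []
    current = start
    while current + timedelta(days=1) <= end:
        nxt = current + timedelta(days=1)
        inputs.append({
            "data_interval_start": current.isoformat(),
            "data_interval_end": nxt.isoformat(),
        })
        current = nxt
    return inputs


if __name__ == "__main__":
    end = datetime(2024, 4, 1, tzinfo=timezone.utc)
    start = end - timedelta(days=90)  # reprocess the last 90 days
    inputs = daily_intervals(start, end)
    print(len(inputs), inputs[0], inputs[-1])
```

Each dict would then be serialised with `json.dumps` and passed as the `input` of a separate `StartExecution` call, ideally rate-limited, since nothing in Step Functions throttles or tracks these runs the way Airflow's catchup does.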
DAG-level dependency visibility: Airflow’s graph view shows the full dependency graph of a DAG at a glance, colour-coded by task status, and its grid view summarises task status across many recent runs. This is invaluable for debugging complex multi-task pipelines. Step Functions has a similar graph visual, but it works execution by execution rather than summarising all recent runs at once.
Airflow grid view (summary of recent runs per task)
| Task | Run 1 | Run 2 | Run 3 | Run 4 |
|---|---|---|---|---|
| extract_data | green | green | green | green |
| transform | green | green | red | green |
| load_redshift | green | green | orange | green |
(green = success, red = failed, orange = upstream_failed: the task did not run because transform failed)
Large teams with data engineering expertise: Airflow’s Python-first model is familiar to data engineers. Code review, testing DAGs, and version-controlling workflows all fit the normal software development workflow.
When Step Functions Wins
AWS-native service orchestration: If your workflow consists of AWS services — trigger a Glue job, wait for it, read from DynamoDB, invoke a Lambda, push to SQS — Step Functions does this without any wrapper code. Direct SDK integrations call AWS services natively. Airflow can do this too (using AWS operators), but Step Functions is architecturally tighter and requires less glue code.
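For comparison, here is a sketch of that Glue, DynamoDB, Lambda, SQS chain as an ASL definition built in Python. The resource ARNs use Step Functions' optimised integration syntax (`arn:aws:states:::service:action`, with `.sync` so the workflow waits for the Glue job to finish). The job name, table name, key, function name, and queue URL are hypothetical, and `chain` is an illustrative helper, not part of any AWS SDK.

```python
import json

# Hypothetical resource names; replace with your own.
STEPS = [
    ("RunGlueJob", {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",  # .sync waits for the job
        "Parameters": {"JobName": "nightly-orders-etl"},
        "ResultPath": None,  # serialises to null: discard the Glue result, keep the input
    }),
    ("ReadConfig", {
        "Type": "Task",
        "Resource": "arn:aws:states:::dynamodb:getItem",
        "Parameters": {"TableName": "pipeline-config", "Key": {"pk": {"S": "orders"}}},
        "ResultPath": "$.config",
    }),
    ("Enrich", {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": "enrich-orders", "Payload.$": "$"},
        "ResultSelector": {"enriched.$": "$.Payload"},
        "ResultPath": "$.result",
    }),
    ("Notify", {
        "Type": "Task",
        "Resource": "arn:aws:states:::sqs:sendMessage",
        "Parameters": {
            "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders-ready",
            "MessageBody.$": "$.result",
        },
    }),
]


def chain(steps: list[tuple[str, dict]]) -> dict:
    """Link (name, state) pairs into a linear ASL definition via Next/End."""
    states = {}
    for index, (name, state) in enumerate(steps):
        state = dict(state)  # copy so the input list is not mutated
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


if __name__ == "__main__":
    print(json.dumps(chain(STEPS), indent=2))
```

No wrapper Lambda sits between the states and Glue, DynamoDB, or SQS; the only custom code is the enrichment function itself.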
Event-driven workflows: Step Functions executes when triggered by an event, such as an API Gateway request, an EventBridge rule, an S3 upload (via EventBridge), or an SQS message (via EventBridge Pipes or a small Lambda function). Each execution is a separate workflow instance that terminates when done. Airflow is schedule-driven at its core; event-driven patterns rely on workarounds such as sensor tasks that poll for conditions, triggering DAG runs through the REST API, or dataset-based scheduling.
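As an illustration of the event-driven trigger, here is an EventBridge event pattern that matches S3 "Object Created" events for one bucket, together with a deliberately simplified matcher. The bucket name is hypothetical, and the bucket must have EventBridge notifications enabled. Real EventBridge patterns support more operators (prefix, numeric, exists, anything-but) than this sketch handles; in practice the state machine is simply the rule's target and EventBridge does the matching.

```python
# Event pattern as it would appear in an EventBridge rule (hypothetical bucket name).
PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["incoming-documents"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: lists hold allowed exact values and
    nested dicts recurse into the event. Other pattern operators are not supported."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    sample_event = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {
            "bucket": {"name": "incoming-documents"},
            "object": {"key": "invoices/2024/03/inv-001.pdf", "size": 48213},
        },
    }
    print(matches(PATTERN, sample_event))
```

Every matching event starts its own independent execution, so there is no polling loop and nothing runs between uploads.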
Zero operational overhead: Airflow needs a scheduler, workers, and a metadata database. On AWS, you can use MWAA (Managed Workflows for Apache Airflow), which manages the infrastructure, but you still pay for always-on compute. Step Functions is fully serverless: there is no infrastructure to manage, and you pay per use (per state transition for Standard workflows).
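Long-running workflows with human steps: A Standard Step Functions workflow can pause for up to 1 year waiting for a human to approve something, using the callback (task token) pattern. Airflow has no equally direct equivalent: long pauses typically require external state storage and a sensor task (ideally in reschedule or deferrable mode, so it does not hold a worker slot) that polls for completion.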
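The approval pause is usually built with the callback pattern: a Task state with the `.waitForTaskToken` suffix hands a token to an external system and waits until something calls `SendTaskSuccess` or `SendTaskFailure` with that token. A sketch, assuming an SQS queue (URL hypothetical) feeds an approval UI and that a boto3 Step Functions client is passed in. The state name `Fulfil`, the `record_decision` helper, and `RecordingClient` are hypothetical; the stand-in client lets the demo run without AWS credentials.

```python
import json

APPROVAL_STATE = {
    "Type": "Task",
    # .waitForTaskToken pauses the execution until the token is returned
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
        "MessageBody": {
            "request.$": "$",
            "taskToken.$": "$$.Task.Token",  # context object field holding the token
        },
    },
    "TimeoutSeconds": 7 * 24 * 3600,  # give up after a week (Standard allows up to 1 year)
    "Next": "Fulfil",
}


def record_decision(sfn_client, task_token: str, approved: bool, approver: str) -> None:
    """Resume the paused execution; called from the approval UI's backend."""
    if approved:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True, "approver": approver}),
        )
    else:
        sfn_client.send_task_failure(
            taskToken=task_token, error="Rejected", cause=f"Rejected by {approver}"
        )


class RecordingClient:
    """Stand-in for boto3.client("stepfunctions") that just records calls."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(("success", kwargs))

    def send_task_failure(self, **kwargs):
        self.calls.append(("failure", kwargs))


if __name__ == "__main__":
    client = RecordingClient()
    record_decision(client, "example-token", approved=True, approver="alice")
    print(client.calls)
```

While the execution waits, nothing polls and no compute is held; the token is the only link between the paused workflow and the human decision.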
Microservices coordination: When an API request triggers a multi-step process (validate → enrich → store → notify), Step Functions handles this pattern naturally; Express workflows can even run synchronously and return the result to the caller. Airflow is not designed for synchronous request handling.
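For the request/response case, Express workflows can be invoked with `StartSyncExecution`, so an API handler can wait for the validate → enrich → store → notify chain and return its result. A sketch, assuming a boto3 Step Functions client is injected and a hypothetical Express state machine ARN; `handle_order_request` and `FakeSfnClient` are illustrative names. boto3's `start_sync_execution` returns a dict whose `status` is `SUCCEEDED`, `FAILED`, or `TIMED_OUT`, with `output` holding the JSON result on success.

```python
import json

# Hypothetical Express state machine ARN.
ORDER_WORKFLOW_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:order-workflow"


def handle_order_request(sfn_client, order: dict) -> tuple[int, dict]:
    """Run the Express workflow synchronously and map the result to an HTTP response."""
    response = sfn_client.start_sync_execution(
        stateMachineArn=ORDER_WORKFLOW_ARN,
        input=json.dumps(order),
    )
    if response["status"] == "SUCCEEDED":
        return 200, json.loads(response["output"])
    # FAILED or TIMED_OUT: report the error name without leaking internals
    return 502, {"error": response.get("error", response["status"])}


class FakeSfnClient:
    """Stand-in for boto3.client("stepfunctions") returning a canned response."""

    def __init__(self, response: dict):
        self.response = response
        self.last_call = None

    def start_sync_execution(self, **kwargs):
        self.last_call = kwargs
        return self.response


if __name__ == "__main__":
    ok = FakeSfnClient({"status": "SUCCEEDED", "output": json.dumps({"order_id": "o-1", "stored": True})})
    print(handle_order_request(ok, {"order_id": "o-1"}))
```

Express workflows are capped at five minutes, so this pattern suits short request paths; longer processes belong in a Standard workflow started asynchronously.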
Operational Overhead Comparison
Apache Airflow (self-managed)
- Scheduler process: must be highly available
- Workers: scale up/down based on task volume
- Metadata DB: PostgreSQL, needs backups and maintenance
- Webserver: for UI access
- Log storage: centralised logging setup required
- Upgrades: significant testing effort between major versions
Apache Airflow (MWAA - managed)
- AWS manages scheduler, workers, webserver
- You configure environment size (vCPUs, GB)
- Always-on: you pay even when no DAGs are running
- Startup/teardown: environment startup takes ~20 minutes
AWS Step Functions
- Fully serverless
- No infrastructure to manage
- Pay per use (Standard: per state transition; Express: per request and duration)
- Scales to thousands of concurrent executions without capacity planning (within service quotas)
- No maintenance or upgrade cycles
The operational cost difference is substantial for small teams. A minimum viable MWAA environment costs roughly $350 to $360/month even when lightly loaded. Step Functions costs nothing when idle and pennies for typical workflows.
Pricing Comparison
| | Apache Airflow (MWAA) | AWS Step Functions (Standard) |
|---|---|---|
| Idle cost | $350+/month (min environment) | $0 |
| Cost model | Per environment-hour + worker-hours | Per state transition |
| 1M state transitions | N/A | $25 |
| 1,000 small workflows/day | MWAA environment cost dominates | ~$0.75/day (assuming ~30 state transitions each) |
| 10 long workflows/day | Fixed environment cost still applies | Very cheap |
Step Functions is almost always cheaper unless you run very large numbers of state transitions (at $25 per million, about 14 million transitions a month costs as much as a $350 MWAA environment) or need the features that justify MWAA’s cost.
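The arithmetic behind the table can be sketched as follows. The prices are assumptions taken from the figures above ($25 per million Standard state transitions, about $350/month for a minimum MWAA environment), plus a 4,000-transition monthly free tier for Standard workflows; list prices vary by region and change over time, so treat the constants as inputs rather than facts.

```python
PRICE_PER_TRANSITION = 25 / 1_000_000  # USD, Standard workflows (assumed list price)
FREE_TRANSITIONS_PER_MONTH = 4_000      # assumed Standard free tier
MWAA_MIN_MONTHLY = 350.0                # USD, assumed minimum MWAA environment


def step_functions_monthly_cost(transitions_per_month: int) -> float:
    """Orchestration cost only; the Lambda/Glue/ECS work is billed separately."""
    billable = max(0, transitions_per_month - FREE_TRANSITIONS_PER_MONTH)
    return billable * PRICE_PER_TRANSITION


def breakeven_transitions(mwaa_monthly: float = MWAA_MIN_MONTHLY) -> int:
    """Monthly transitions at which Step Functions costs as much as MWAA."""
    return round(mwaa_monthly / PRICE_PER_TRANSITION) + FREE_TRANSITIONS_PER_MONTH


if __name__ == "__main__":
    # 1,000 workflows/day x ~30 transitions x 30 days, as in the table above
    monthly = 1_000 * 30 * 30
    print(f"Step Functions: ${step_functions_monthly_cost(monthly):,.2f}/month")
    print(f"Break-even vs MWAA: {breakeven_transitions():,} transitions/month")
```

Both orchestrators leave the cost of the underlying compute unchanged, which is why the comparison is only about the orchestration layer.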
Hybrid Architecture
The two tools are not mutually exclusive. A common pattern:
- Airflow (MWAA) handles scheduled data pipeline orchestration — dbt runs, Redshift loads, data quality checks, Spark jobs submitted via EMR operators
- Step Functions handles event-driven service workflows — order processing triggered by API calls, document processing triggered by S3 uploads, ML inference workflows triggered by SageMaker events
User API request
      |
      v
[Step Functions: order workflow]  (event-driven, per-request)
      |
      v
Result in DynamoDB
---
02:00 UTC cron
      |
      v
[Airflow: nightly_pipeline DAG]  (schedule-driven, batch)
      |
      v
Reports in Redshift
The deciding question is usually: is this workflow triggered on a schedule or by an event? Schedule → Airflow. Event → Step Functions.
Interview Notes
Q: What is MWAA? MWAA (Managed Workflows for Apache Airflow) is AWS’s managed Airflow service. AWS handles the scheduler, workers, webserver, and metadata database. You provide the DAG files (stored in S3) and configure the environment size. It removes the operational burden of managing Airflow infrastructure but still has an always-on cost.
Q: Can Step Functions replace Airflow entirely? For pure AWS-native workflows and event-driven use cases, yes. For Python-native data pipelines, multi-cloud integrations, complex scheduling with backfill, and data engineering teams comfortable with Python, Airflow has genuine advantages Step Functions cannot replicate cleanly.
Q: How does Step Functions handle scheduling? Step Functions does not have a built-in scheduler. Use Amazon EventBridge Scheduler to trigger Step Functions executions on a cron schedule. EventBridge Scheduler is serverless and has no cost when idle.
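A minimal sketch of the arguments for boto3's EventBridge Scheduler `create_schedule` call. The schedule name, state machine ARN, IAM role ARN, and input payload are hypothetical, and the role must allow `states:StartExecution`. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of day-of-month or day-of-week must be `?`.

```python
import json


def nightly_schedule_request(state_machine_arn: str, role_arn: str) -> dict:
    """Build create_schedule arguments for a daily 02:00 UTC execution."""
    return {
        "Name": "nightly-pipeline",
        "ScheduleExpression": "cron(0 2 * * ? *)",  # 02:00 every day
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the exact time
        "Target": {
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({"triggered_by": "eventbridge-scheduler"}),
        },
    }


if __name__ == "__main__":
    request = nightly_schedule_request(
        "arn:aws:states:us-east-1:123456789012:stateMachine:nightly-pipeline",
        "arn:aws:iam::123456789012:role/scheduler-start-execution",
    )
    print(json.dumps(request, indent=2))
    # With AWS credentials configured:
    # import boto3
    # boto3.client("scheduler").create_schedule(**request)
```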
Q: What is a DAG in Airflow? A Directed Acyclic Graph — a Python object that defines the tasks in a pipeline and the dependencies between them. “Acyclic” means there are no circular dependencies — every path through the graph terminates. Airflow parses DAG files, identifies which tasks need to run based on their schedule and dependencies, and submits them to workers for execution.
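The acyclic requirement can be checked mechanically. This stdlib sketch uses `graphlib.TopologicalSorter`, which raises `CycleError` when dependencies loop back on themselves. Airflow runs its own cycle check when it parses a DAG file, so this illustrates the idea rather than reproducing Airflow's code; `find_cycle` and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def find_cycle(dependencies: dict[str, set[str]]) -> Optional[list[str]]:
    """Return the nodes of one cycle, or None if the graph is a valid DAG."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError as error:
        # args[1] is the cycle as a list whose first and last items are the same node
        return error.args[1]
    return None


if __name__ == "__main__":
    valid = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
    cyclic = {"extract": {"load"}, "transform": {"extract"}, "load": {"transform"}}
    print(find_cycle(valid))
    print(find_cycle(cyclic))
```

A cyclic graph has no valid execution order: each task would wait for another that waits on it, which is why Airflow rejects such DAGs at parse time.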