Catchup and backfill behavior in Apache Airflow - Step-by-Step Execution

Process Flow - Catchup and backfill behavior
1. DAG start date is set.
2. Scheduler checks DAG runs.
3. Are there missing runs before today?
   No: run only the current schedule.
   Yes: continue to step 4.
4. Catchup enabled?
   No: run only the current schedule.
   Yes: the scheduler creates DAG runs for the missing intervals, and the runs execute in order.
5. Backfill command?
   No: continue normal scheduling.
   Yes: backfill creates DAG runs for the specified intervals, the runs execute immediately, and the backfill completes.
This flow shows how Airflow decides to run past DAG runs (catchup) and how manual backfill triggers immediate runs for past intervals.
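The decision flow above can be sketched as a small pure-Python function. This is only one reading of the flow, not Airflow code: the function name `plan_actions` and its parameters are hypothetical, and backfill is treated as an independent manual step.

```python
from typing import List, Optional, Tuple


def plan_actions(
    has_missing_runs: bool,
    catchup: bool,
    backfill_range: Optional[Tuple[str, str]] = None,
) -> List[str]:
    """Return the actions the flow would take, in order (illustrative only)."""
    actions = []

    # Catchup branch: past runs are created only if some are missing
    # AND catchup is enabled.
    if has_missing_runs and catchup:
        actions.append("create DAG runs for missing intervals")
        actions.append("execute runs in order")
    else:
        actions.append("run only current schedule")

    # Backfill branch: a manual request, independent of the catchup setting.
    if backfill_range is not None:
        start, end = backfill_range
        actions.append(f"backfill {start} to {end} immediately")
    else:
        actions.append("continue normal scheduling")

    return actions


print(plan_actions(True, True))
print(plan_actions(True, False, ("2023-12-29", "2023-12-31")))
```

The first call follows the catchup path; the second shows that a backfill request is still honored when catchup is disabled.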
Execution Sample
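from datetime import datetime

from airflow import DAG
# Recent Airflow 2 releases deprecate DummyOperator in favor of
# airflow.operators.empty.EmptyOperator, and schedule_interval in favor of schedule.
from airflow.operators.dummy import DummyOperator

# catchup=True: the scheduler creates runs for every missed interval since start_date.
dag = DAG(
    'example_dag',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=True,
)
run_task = DummyOperator(task_id='run', dag=dag)  # placeholder task that does nothing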
Defines a DAG starting Jan 1, 2024, scheduled daily with catchup enabled, so missed runs will be created and executed.
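To make the effect of the `catchup` flag concrete, here is a minimal simulation that does not import Airflow. It assumes a simplified model in which a daily run becomes eligible once its one-day interval has fully elapsed; the helper `daily_runs_to_create` is hypothetical and is not part of the Airflow API.

```python
from datetime import datetime, timedelta
from typing import List


def daily_runs_to_create(start_date: datetime, now: datetime, catchup: bool) -> List[datetime]:
    """Return the dates of daily runs a scheduler would create (simplified model)."""
    runs = []
    current = start_date
    # Assumption: an interval is runnable only after it has fully elapsed.
    while current + timedelta(days=1) <= now:
        runs.append(current)
        current += timedelta(days=1)
    if not catchup:
        # Without catchup, keep only the most recent completed interval.
        return runs[-1:]
    return runs


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 12, 0)

print([d.date().isoformat() for d in daily_runs_to_create(start, now, catchup=True)])
# ['2024-01-01', '2024-01-02', '2024-01-03']
print([d.date().isoformat() for d in daily_runs_to_create(start, now, catchup=False)])
# ['2024-01-03']
```

With `catchup=True` every missed interval since the start date is returned; with `catchup=False` only the latest one is.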
Process Table
| Step | Date Checked | Catchup Enabled? | Action Taken | DAG Runs Created | Runs Executed |
|------|--------------|------------------|--------------|------------------|---------------|
| 1 | 2024-01-01 | True | Create DAG run for 2024-01-01 | 2024-01-01 | Run for 2024-01-01 executed |
| 2 | 2024-01-02 | True | Create DAG run for 2024-01-02 | 2024-01-02 | Run for 2024-01-02 executed |
| 3 | 2024-01-03 | True | Create DAG run for 2024-01-03 | 2024-01-03 | Run for 2024-01-03 executed |
| 4 | 2024-01-04 | True | No missing runs, run current schedule | None | Run for 2024-01-04 executed |
| 5 | Manual Backfill | N/A | Backfill command for 2023-12-29 to 2023-12-31 | 2023-12-29, 2023-12-30, 2023-12-31 | Runs for 2023-12-29 to 2023-12-31 executed immediately |
| 6 | 2024-01-05 | True | No missing runs, run current schedule | None | Run for 2024-01-05 executed |
💡 No more missing runs and backfill completed, scheduler continues normal operation.
Status Tracker
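| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 6 |
|----------|-------|--------------|--------------|--------------|--------------|--------------|--------------|
| DAG Runs Created | None | 2024-01-01 | 2024-01-02 | 2024-01-03 | None | 2023-12-29, 2023-12-30, 2023-12-31 | None |
| Runs Executed | None | 2024-01-01 | 2024-01-02 | 2024-01-03 | 2024-01-04 | 2023-12-29, 2023-12-30, 2023-12-31 | 2024-01-05 |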
Key Moments - 3 Insights
Why does Airflow create DAG runs for past dates when catchup is True?
Because catchup=True tells Airflow to create and run a DAG run for every scheduled interval between the start_date and the current date that has not run yet. See steps 1-3 in the Process Table.
What happens if catchup is set to False?
Airflow skips creating DAG runs for past intervals and runs the DAG only for the current scheduled interval. The Process Table has no catchup=False row, but step 4 shows the same outcome: no past runs are created and only the current schedule runs.
How is backfill different from catchup?
Backfill is a manual command that creates DAG runs for specific past intervals and runs them immediately, regardless of the catchup setting. See step 5 in the Process Table, where the backfill runs execute right away.
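As an illustration of the manual backfill in step 5, the sketch below builds the command line for the Airflow 2.x `airflow dags backfill` CLI. The helper `build_backfill_command` is hypothetical, and the command form is an assumption about the installed version: other major versions may expose backfill differently, so check the CLI help for your installation.

```python
from typing import List


def build_backfill_command(dag_id: str, start_date: str, end_date: str) -> List[str]:
    """Build the argument list for an Airflow 2.x backfill (assumed CLI form)."""
    return [
        "airflow", "dags", "backfill",
        "--start-date", start_date,
        "--end-date", end_date,
        dag_id,
    ]


command = build_backfill_command("example_dag", "2023-12-29", "2023-12-31")
print(" ".join(command))
# airflow dags backfill --start-date 2023-12-29 --end-date 2023-12-31 example_dag
```

On a machine where Airflow is installed, the list could be passed to `subprocess.run(command, check=True)`. Building it as a list rather than a single string avoids shell quoting issues.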
Visual Quiz - 3 Questions
Test your understanding
Look at step 3 of the Process Table. Which DAG run date is created and executed?
A. 2024-01-04
B. 2024-01-03
C. 2023-12-31
D. 2024-01-05
💡 Hint
Check the 'DAG Runs Created' and 'Runs Executed' columns at step 3.
At which step does the backfill command run DAG runs for past dates?
A. Step 5
B. Step 4
C. Step 2
D. Step 6
💡 Hint
Look for the row mentioning 'Manual Backfill' in the 'Date Checked' column.
If catchup were set to False, which step would NOT create DAG runs for past dates?
A. Step 5
B. Step 1
C. Step 3
D. Step 4
💡 Hint
Refer to the 'Catchup Enabled?' and 'Action Taken' columns in the Process Table.
Concept Snapshot
Airflow catchup runs missed DAG intervals from start_date to now.
Set catchup=False to skip past runs and run only the current interval.
Backfill is manual, runs past intervals immediately.
Catchup runs are scheduled; backfill runs are manual.
Both create DAG runs for past dates but differ in timing.
Full Transcript
This visual execution shows how Airflow handles catchup and backfill. When catchup is enabled, Airflow creates DAG runs for all missed scheduled intervals from the start date up to the current date, running them in order. If catchup is disabled, Airflow skips these past runs and only runs the current schedule. Backfill is a manual command that creates DAG runs for specified past intervals and runs them immediately, regardless of catchup settings. The execution table traces each step, showing when DAG runs are created and executed, helping beginners understand the difference between automatic catchup and manual backfill.