What is Catchup in Airflow: Explanation and Usage
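catchup is a DAG-level setting that controls whether Airflow creates runs for schedule intervals that were missed. Intervals can be missed while the scheduler was down or the DAG was paused, or because a DAG was first deployed with a start_date in the past. If catchup=True, the scheduler creates a DAG run for every missed interval since the last run (or since start_date). If catchup=False, it creates a run only for the most recent interval and skips the older ones.
How It Works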
Imagine a DAG scheduled to run every day at 8 AM. If your Airflow scheduler is down for a few days, those daily runs are missed. The catchup setting decides whether Airflow goes back and runs all of those missed days once the scheduler is back up.
When catchup=True, Airflow acts like a diligent friend who wants to finish all the missed work. It runs every missed scheduled interval until it catches up to today. When catchup=False, Airflow behaves like a friend who only cares about the current day. It skips the missed intervals and runs only the most recent one.
Turning catchup off helps control workload and resource use, especially when running every missed interval would be too heavy or unnecessary.
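The following small, self-contained Python model makes the difference concrete. It is not Airflow code. `due_logical_dates` is a hypothetical helper. It assumes Airflow's convention that a run's logical date marks the start of the interval the run covers, so each run is triggered only after its interval has ended.

```python
from datetime import datetime, timedelta


def due_logical_dates(last_logical, now, interval, catchup):
    """Hypothetical model of the catchup decision (not an Airflow API).

    A run with logical date d covers [d, d + interval) and becomes due
    once that interval has ended, i.e. when d + interval <= now.
    """
    due = []
    d = last_logical + interval  # first interval after the last completed run
    while d + interval <= now:
        due.append(d)
        d += interval
    if catchup:
        return due      # every missed interval gets its own run
    return due[-1:]     # only the most recent complete interval (or nothing)


# Daily 8 AM schedule; the last run covered 2024-04-01 08:00 to 2024-04-02 08:00.
last = datetime(2024, 4, 1, 8, 0)
# The scheduler comes back up on 2024-04-05 at 09:00.
now = datetime(2024, 4, 5, 9, 0)
day = timedelta(days=1)

print([d.date().isoformat() for d in due_logical_dates(last, now, day, catchup=True)])
# → ['2024-04-02', '2024-04-03', '2024-04-04']
print([d.date().isoformat() for d in due_logical_dates(last, now, day, catchup=False)])
# → ['2024-04-04']
```

With catchup=True, all three missed days get their own runs. With catchup=False, only the most recent complete interval is scheduled.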
Example
This example shows how to set catchup to False in a DAG, so Airflow skips past intervals and runs only the most recent scheduled one.
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Airflow 2.4+ uses `schedule`; older releases call this `schedule_interval`.
with DAG(
    dag_id='example_catchup',
    default_args=default_args,
    start_date=datetime(2024, 4, 1),
    schedule='@daily',
    catchup=False,  # Disables catchup: only the most recent interval is scheduled
) as dag:
    # Single task that prints the current date on the worker
    t1 = BashOperator(
        task_id='print_date',
        bash_command='date',
    )
```
When to Use
Use catchup=True when you want to ensure every scheduled run happens, such as for critical data processing or reports that must not miss any day.
Use catchup=False when missed runs are not important or would cause unnecessary load. Examples include cases where only the latest data matters, or where running every missed interval would overwhelm your system.
For example, a daily report that can safely skip missed days can use catchup=False. A billing process that must run for every single day should keep catchup=True.
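The following back-of-the-envelope sketch puts rough numbers on the workload trade-off. `backlog_size` is a hypothetical helper, not part of Airflow. It assumes that each schedule interval that elapsed during an outage becomes one queued run. It also ignores where the outage starts relative to the schedule, and any limits on how many runs execute concurrently.

```python
from datetime import timedelta


def backlog_size(outage: timedelta, interval: timedelta, catchup: bool) -> int:
    """Hypothetical estimate (not an Airflow API) of how many runs are
    queued when the scheduler returns after an outage."""
    missed = outage // interval  # whole schedule intervals that elapsed during the outage
    if catchup:
        return missed            # one run per missed interval
    return min(missed, 1)        # at most the single most recent interval


outage = timedelta(days=3)
print(backlog_size(outage, timedelta(days=1), catchup=True))   # daily billing job → 3
print(backlog_size(outage, timedelta(days=1), catchup=False))  # daily report → 1
print(backlog_size(outage, timedelta(hours=1), catchup=True))  # hourly job → 72
```

The hourly case shows why catchup=True can overwhelm a system. The same three-day outage queues 72 runs instead of 3.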
Key Points
- catchup=True runs all missed DAG runs to catch up.
- catchup=False runs only the latest scheduled DAG run.
- Setting catchup controls workload and resource use after downtime.
- Choose catchup based on whether missed runs are important for your workflow.