
How to Disable Catchup in Airflow: Simple Guide

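To disable catchup in Airflow, set catchup=False in your DAG definition. This stops Airflow from creating runs for past schedule intervals that never ran. Examples include intervals missed while the scheduler was down or the DAG was paused, and intervals between a past start_date and the day you first deploy the DAG.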

Syntax

The catchup parameter is a boolean argument to the DAG constructor. It controls whether Airflow creates a run for every schedule interval that has not run yet between the DAG's start_date (or its most recent run) and the current time.

  • catchup=True: Airflow runs all missed intervals.
  • catchup=False: Airflow skips the missed intervals and creates a run only for the most recent one.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # supersedes the deprecated DummyOperator

dag = DAG(
    dag_id='example_dag',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow 2.4+; older releases use schedule_interval='@daily'
    catchup=False,      # Disable catchup
)

task = EmptyOperator(task_id='dummy_task', dag=dag)
```
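The two catchup settings described in this section can be modeled with a small standard-library sketch. `intervals_to_schedule` is a hypothetical helper, not Airflow's scheduler code; it assumes a fixed daily interval, a DAG that has never run, and that a run is created only once its interval has fully elapsed.

```python
from datetime import datetime, timedelta


def intervals_to_schedule(start_date, now, catchup, interval=timedelta(days=1)):
    """Hypothetical helper: a simplified model of which data intervals a
    scheduler would create runs for, for a DAG that has never run.

    Assumes a fixed-length interval and that a run is created only after
    its interval has fully elapsed. This is not Airflow's scheduler code.
    """
    complete = []
    start = start_date
    # Collect every interval [start, start + interval) that has ended by `now`.
    while start + interval <= now:
        complete.append((start, start + interval))
        start += interval
    if catchup:
        return complete       # catchup=True: one run per missed interval
    return complete[-1:]      # catchup=False: only the most recent complete interval


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 5, 9, 30)

print(len(intervals_to_schedule(start, now, catchup=True)))  # 4
for begin, end in intervals_to_schedule(start, now, catchup=False):
    print(begin.date(), "->", end.date())  # 2024-01-04 -> 2024-01-05
```

In this model, widening the gap between start and now grows the catchup=True list by one entry per day, while the catchup=False list never holds more than one interval.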

Example

This example shows a DAG with catchup=False. If the DAG is unpaused, or the scheduler comes back up, after several scheduled intervals have passed, Airflow creates a run only for the most recent interval and skips the older ones.

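```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # supersedes the deprecated DummyOperator

dag = DAG(
    dag_id='no_catchup_dag',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow 2.4+; older releases use schedule_interval='@daily'
    catchup=False,      # only the most recent interval gets a run
)

task = EmptyOperator(task_id='start', dag=dag)
```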
Output
When the DAG is scheduled again after several missed intervals, only the run for the latest interval executes; no runs are created for the earlier intervals.

Common Pitfalls

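One common mistake is leaving catchup at its default (True in Airflow 2.x) on a DAG whose start_date lies in the past. When the DAG is first deployed or unpaused, Airflow creates a run for every missed interval, which can flood the scheduler and workers.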
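To put a number on that, here is back-of-the-envelope arithmetic under assumed values: a @daily DAG with start_date 2024-01-01, never run before, that is first scheduled on a hypothetical date of 2024-06-01.

```python
from datetime import date

# Assumed values for illustration: a @daily DAG that has never run.
start_date = date(2024, 1, 1)
first_scheduled_on = date(2024, 6, 1)  # hypothetical deployment/unpause date

# With catchup enabled, one run is created per complete daily interval.
runs_with_catchup = (first_scheduled_on - start_date).days
# With catchup disabled, only the most recent interval gets a run.
runs_without_catchup = 1 if runs_with_catchup > 0 else 0

print(runs_with_catchup)     # 152 (2024 is a leap year)
print(runs_without_catchup)  # 1
```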

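Another pitfall is choosing start_date poorly. A date far in the past multiplies the number of catch-up runs, and a dynamic value such as datetime.now() changes every time the DAG file is parsed, so the first interval may never be treated as complete and the DAG may never be scheduled.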
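The dynamic start_date problem can be illustrated with a simplified model, not Airflow code. It assumes a @daily schedule and that the first run is created only once start_date plus one day has passed; `first_interval_complete` is a hypothetical helper.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=1)  # assumed @daily schedule


def first_interval_complete(start_date, now):
    """Hypothetical helper: True once the first interval
    [start_date, start_date + INTERVAL) has fully elapsed at `now`."""
    return start_date + INTERVAL <= now


# A static start_date is fixed once and never moves.
static_start = datetime(2024, 1, 1)

# Simulate the DAG file being parsed 6, 30 and 72 hours after static_start.
parse_times = [static_start + timedelta(hours=h) for h in (6, 30, 72)]

static_results = [first_interval_complete(static_start, now) for now in parse_times]
# start_date=datetime.now() evaluates to the parse time itself on every parse.
dynamic_results = [first_interval_complete(now, now) for now in parse_times]

print(static_results)   # [False, True, True]
print(dynamic_results)  # [False, False, False]
```

With the static date the first interval eventually completes; with the moving date its end is always one day after the latest parse, so in this model it never completes.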

```python
from datetime import datetime

from airflow import DAG

# Wrong: catchup is left at its default (True in Airflow 2.x), so every daily
# interval since 2024-01-01 gets a run when the DAG is first scheduled
wrong_dag = DAG(
    dag_id='wrong_dag',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow 2.4+; older releases use schedule_interval='@daily'
)

# Right: catchup disabled, so only the most recent interval gets a run
right_dag = DAG(
    dag_id='right_dag',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',
    catchup=False,
)
```

Quick Reference

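| Parameter | Description | Default |
|---|---|---|
| catchup | Create runs for all missed intervals between start_date (or the last run) and now | True in Airflow 2.x (taken from the [scheduler] catchup_by_default setting) |
| start_date | Date from which DAG scheduling starts | Required |
| schedule_interval (schedule in Airflow 2.4+) | How often the DAG runs, as a preset such as '@daily' or a cron expression | timedelta(days=1) in Airflow 2.x |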

Key Takeaways

  • Set catchup=False in your DAG to skip missed intervals.
  • With catchup enabled, Airflow creates a run for every missed interval, which can overload your system when start_date is far in the past or the DAG was paused for a long time.
  • Use a fixed, static start_date (not datetime.now()) so scheduling stays predictable.
  • Use catchup=False for DAGs where only the latest run matters, such as daily reports.
  • In Airflow 2.x, catchup defaults to True, so disable it explicitly when needed.