Overview - Catchup and backfill behavior
What is it?
Catchup and backfill are Apache Airflow features that control how missed or past schedule intervals are handled. Catchup is a DAG-level setting: when it is enabled, the scheduler creates a DAG run for every schedule interval between the DAG's start date (or its most recent run) and the present that has not yet run. Backfill is a manual operation that runs a DAG for a specific past date range that you choose. Together they keep data pipelines complete when the scheduler was down, a DAG was paused, or past runs need to be redone.
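The difference can be sketched in plain Python. This is a simplified model for illustration, not Airflow's scheduler code: it assumes a daily schedule, and the function names pending_daily_intervals and backfill_intervals are hypothetical. The catchup flag stands in for the catchup argument of an Airflow DAG.

```python
from datetime import date, timedelta


def pending_daily_intervals(start_date, last_run, today, catchup):
    """Return the start dates of daily intervals still waiting to run.

    Simplified model (an assumption, not Airflow internals): the interval
    that starts on day D becomes runnable once day D + 1 has begun.
    """
    # Resume after the last run, or begin at start_date if nothing has run yet.
    first = start_date if last_run is None else last_run + timedelta(days=1)
    last_complete = today - timedelta(days=1)
    if first > last_complete:
        return []  # nothing was missed
    if not catchup:
        # Catchup disabled: skip the gap and run only the latest interval.
        return [last_complete]
    # Catchup enabled: one run per missed interval, oldest first.
    count = (last_complete - first).days + 1
    return [first + timedelta(days=i) for i in range(count)]


def backfill_intervals(start, end):
    """Daily intervals for a manually chosen date range (inclusive)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Hypothetical scenario: the last run covered Jan 3, and the scheduler
# comes back up on Jan 8.
with_catchup = pending_daily_intervals(
    date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 8), catchup=True
)
without_catchup = pending_daily_intervals(
    date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 8), catchup=False
)
print(with_catchup)     # Jan 4 through Jan 7: four runs
print(without_catchup)  # Jan 7 only: one run
print(backfill_intervals(date(2024, 1, 1), date(2024, 1, 3)))  # Jan 1 through Jan 3
```

In this model catchup fills the gap automatically, while backfill runs only over the range the operator names, whether or not the scheduler ever missed those dates.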
Why it matters
Without catchup and backfill, schedule intervals missed during an outage would never be processed, leaving gaps in the data and making downstream reports unreliable. Those gaps can lead to wrong business decisions or broken downstream systems. Catchup and backfill let you fill them, so pipelines stay complete and accurate after interruptions.
Where it fits
Learners should first understand Airflow basics such as DAGs, scheduling, and task execution. After catchup and backfill, they can move on to more advanced topics such as SLA monitoring, retries, and dynamic DAG generation.