Catchup and backfill behavior in Apache Airflow - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What happens when catchup is set to True in Airflow?

In Airflow, if you set catchup=True for a DAG, what behavior should you expect when the DAG is first deployed?

A. Airflow will only run the DAG for the current date and ignore past schedules.
B. Airflow will run all past scheduled DAG runs from the start date until now to catch up.
C. Airflow disables scheduling and requires manual trigger for all runs.
D. Airflow skips all runs and marks them as success without execution.
💡 Hint

Think about what 'catching up' means in terms of scheduled runs.
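
To make "catching up" concrete, here is a minimal sketch in plain Python. It is not Airflow's scheduler code: it assumes a daily schedule and Airflow 2.x-style data intervals, where a run is created only after its interval has ended. The function name `planned_logical_dates` and the sample dates are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta


def planned_logical_dates(start_date, now, interval, catchup):
    """Return the logical dates a simplified scheduler would create runs for.

    Assumption: a run for the interval [d, d + interval) is created only
    once that interval has fully elapsed.
    """
    completed = []
    current = start_date
    while current + interval <= now:
        completed.append(current)
        current += interval
    if catchup:
        # Every completed interval since start_date gets its own run.
        return completed
    # Without catchup, only the most recent completed interval is scheduled.
    return completed[-1:]


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 5, 12, 0)
daily = timedelta(days=1)

with_catchup = planned_logical_dates(start, now, daily, catchup=True)
without_catchup = planned_logical_dates(start, now, daily, catchup=False)

print([d.date().isoformat() for d in with_catchup])
# ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
print([d.date().isoformat() for d in without_catchup])
# ['2024-01-04']
```

Under these assumptions, the only difference between the two settings is whether the older completed intervals are turned into runs.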

💻 Command Output
intermediate
Output of backfill command with catchup disabled

Given a DAG with catchup=False, what happens when you run airflow dags backfill -s 2024-01-01 -e 2024-01-03 my_dag?

airflow dags backfill -s 2024-01-01 -e 2024-01-03 my_dag
A. Backfill command fails with an error about catchup being disabled.
B. No DAG runs are executed because catchup is disabled.
C. Only the DAG run for 2024-01-03 is executed.
D. Runs DAG for 2024-01-01, 2024-01-02, and 2024-01-03 regardless of catchup setting.
💡 Hint

Remember that backfill manually triggers runs regardless of catchup.
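
As a rough illustration of the date range a backfill request covers, the sketch below lists the daily logical dates between a start and an end date. It assumes a daily schedule and an inclusive end date, and it deliberately takes no catchup argument, because the range is given explicitly. The helper name `backfill_logical_dates` is hypothetical and is not part of the Airflow CLI or API.

```python
from datetime import date, timedelta


def backfill_logical_dates(start, end):
    """List daily logical dates from start to end, both inclusive."""
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


dates = backfill_logical_dates(date(2024, 1, 1), date(2024, 1, 3))
print([d.isoformat() for d in dates])
# ['2024-01-01', '2024-01-02', '2024-01-03']
```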

Troubleshoot
advanced
Why are past DAG runs not executing despite catchup=True?

You have a DAG with catchup=True and a start_date set to 10 days ago. However, no past DAG runs are executing. What is the most likely cause?

A. The Airflow scheduler is not running.
B. The DAG file has syntax errors preventing execution.
C. The DAG's schedule_interval is set to None, so no scheduled runs exist.
D. The DAG's max_active_runs is set to 0.
💡 Hint

Check if the DAG has a schedule defined.

🔀 Workflow
advanced
Order of operations when enabling catchup on an existing DAG

Arrange the steps in the correct order to enable catchup on an existing DAG and ensure past runs are executed.

A. 3,1,2,4
B. 1,3,2,4
C. 3,2,1,4
D. 1,2,3,4
💡 Hint

Think about what must be set before enabling catchup and restarting scheduler.

Best Practice
expert
Best practice to avoid unintended backfill when enabling catchup

You want to enable catchup=True on a DAG but avoid running all past missed DAG runs immediately. What is the best practice to achieve this?

A. Set the DAG's start_date to the current date before enabling catchup.
B. Manually trigger a backfill for all past dates before enabling catchup.
C. Set max_active_runs to 1 to limit catchup runs.
D. Disable the scheduler temporarily while enabling catchup.
💡 Hint

Consider how Airflow decides which past runs to execute.
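
One way to reason about this is to count how many completed intervals lie between start_date and the present. The sketch below does that in plain Python, assuming a daily schedule; the function name `count_catchup_runs` and the sample dates are hypothetical, and the model ignores scheduler details such as existing DAG runs or timetables.

```python
from datetime import datetime, timedelta


def count_catchup_runs(start_date, now, interval=timedelta(days=1)):
    """Count the completed intervals between start_date and now."""
    if start_date >= now:
        return 0
    # Floor division of two timedeltas yields a whole number of intervals.
    return (now - start_date) // interval


now = datetime(2024, 1, 11)
old_start = now - timedelta(days=10)
recent_start = now

print(count_catchup_runs(old_start, now))     # 10
print(count_catchup_runs(recent_start, now))  # 0
```

In this simplified model, the size of the catchup workload depends only on how far start_date lies in the past. In a real DAG file, start_date would normally be written as a fixed date rather than computed at parse time.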