In Apache Airflow, what does the execution date represent?
Think about what Airflow uses to identify the data period a DAG run is processing.
The execution date in Airflow (called the logical date since Airflow 2.2) is a logical timestamp marking the start of the data interval a DAG run is processing. It is not the wall-clock time at which the tasks actually run.
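The relationship can be sketched in plain Python, with no Airflow required; the function name here is illustrative, not part of Airflow's API:

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch: the execution (logical) date is the START of the
# data interval; the interval's end is one schedule interval later.
def data_interval(logical_date, schedule):
    return logical_date, logical_date + schedule

start, end = data_interval(
    datetime(2024, 6, 1, tzinfo=timezone.utc),  # execution / logical date
    timedelta(days=1),                          # daily schedule interval
)
print(start, "->", end)  # interval covers 2024-06-01 00:00 to 2024-06-02 00:00
```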
Given a DAG scheduled to run daily starting at 2024-06-01, if a DAG run starts at 2024-06-02 01:00 AM, what is the execution date shown in the Airflow UI for this run?
Remember that execution date points to the start of the data interval, not the actual run time.
The execution date for a daily DAG run is the start of the interval it processes. Even though this run starts at 2024-06-02 01:00, it processes the data for 2024-06-01, so the UI shows an execution date of 2024-06-01 00:00:00.
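The arithmetic behind this answer can be shown with a small plain-Python sketch (the helper is hypothetical, not an Airflow function): the run that fires shortly after midnight belongs to the interval that just closed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: for a daily schedule, the interval that just closed
# ends at the most recent midnight, and the execution date is its start.
def execution_date_for(run_start, schedule=timedelta(days=1)):
    interval_end = run_start.replace(hour=0, minute=0, second=0, microsecond=0)
    return interval_end - schedule

ed = execution_date_for(datetime(2024, 6, 2, 1, 0, tzinfo=timezone.utc))
print(ed)  # 2024-06-01 00:00:00+00:00
```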
You run an Airflow backfill command for a DAG with daily schedule from 2024-05-28 to 2024-05-30. Which dates will be used as execution dates for the backfill runs?
Execution dates correspond to the logical schedule dates you specify for backfill.
Backfill creates one run per schedule interval in the range, using the logical dates as execution dates. For a daily schedule with `airflow dags backfill -s 2024-05-28 -e 2024-05-30` (both endpoints inclusive), the execution dates are 2024-05-28, 2024-05-29, and 2024-05-30.
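A plain-Python sketch of which logical dates that range yields, assuming both endpoints are included for a daily schedule:

```python
from datetime import date, timedelta

# Enumerate the logical dates a daily backfill over an inclusive
# [start, end] range would produce (illustrative helper, not Airflow code).
def backfill_dates(start, end, step=timedelta(days=1)):
    d = start
    while d <= end:
        yield d
        d += step

dates = list(backfill_dates(date(2024, 5, 28), date(2024, 5, 30)))
print(dates)  # three dates: 2024-05-28, 2024-05-29, 2024-05-30
```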
You notice that a DAG run triggered manually shows an execution date of 2024-06-01 00:00:00, even though you triggered it on 2024-06-05. Why does this happen?
Think about how Airflow assigns execution dates for manual runs.
When manually triggering a DAG, Airflow uses the logical (execution) date you supply; if you supply none, it defaults to the current time, not the previous scheduled interval. A run triggered on 2024-06-05 that shows 2024-06-01 00:00:00 was therefore given that logical date explicitly, for example via `airflow dags trigger --logical-date`.
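The defaulting rule can be mimicked in plain Python (a hypothetical sketch of the behaviour, not Airflow's internal code):

```python
from datetime import datetime, timezone

# Use the caller-supplied logical date if given; otherwise fall back to the
# moment the trigger happens ("now").
def resolve_logical_date(supplied=None, now=None):
    now = now or datetime.now(timezone.utc)
    return supplied if supplied is not None else now

trigger_time = datetime(2024, 6, 5, 9, 0, tzinfo=timezone.utc)
explicit = datetime(2024, 6, 1, tzinfo=timezone.utc)

print(resolve_logical_date(explicit, trigger_time))  # 2024-06-01 00:00:00+00:00
print(resolve_logical_date(None, trigger_time))      # 2024-06-05 09:00:00+00:00
```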
What is the best practice for using the execution date inside an Airflow task to process data?
Think about how to define the data interval for processing in scheduled DAGs.
The execution date marks the inclusive start of the data interval; adding one schedule interval gives the exclusive end. Selecting the run's data by these two bounds, rather than by the wall-clock time, keeps runs idempotent and makes backfills safe. In Airflow 2.2+, the task context exposes these bounds directly as `data_interval_start` and `data_interval_end`.
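A plain-Python sketch of the pattern, e.g. for a date-partitioned query; the variable names are illustrative, corresponding to what Airflow 2.2+ exposes as `data_interval_start` and `data_interval_end`:

```python
from datetime import datetime, timedelta, timezone

# Inclusive start = execution (logical) date; exclusive end = start + schedule.
logical_date = datetime(2024, 6, 1, tzinfo=timezone.utc)
schedule = timedelta(days=1)

interval_start = logical_date
interval_end = logical_date + schedule  # exclusive upper bound

# Half-open [start, end) bounds avoid double-counting rows between runs.
query = (f"SELECT * FROM events "
         f"WHERE ts >= '{interval_start:%Y-%m-%d}' "
         f"AND ts < '{interval_end:%Y-%m-%d}'")
print(query)
```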