Apache Airflow · devops · ~20 mins

Timetables for complex schedules in Apache Airflow - Practice Problems & Coding Challenges

Challenge - 5 Problems
💻 Command Output
intermediate
Output of a complex Airflow timetable with multiple intervals
Given this Airflow timetable code, what is the next scheduled run date after 2024-06-01 00:00 UTC?
```python
from __future__ import annotations

from pendulum import DateTime, datetime

from airflow.timetables.base import DagRunInfo, DataInterval, Timetable, TimeRestriction


class CustomTimetable(Timetable):
    """Schedule runs every Monday and Friday at 10:00 UTC (first run hard-coded)."""

    def infer_manual_data_interval(self, *, run_after: DateTime) -> DataInterval:
        # Manually triggered runs get a zero-length interval at the trigger time.
        return DataInterval.exact(run_after)

    def next_dagrun_info(
        self,
        *,
        last_automated_data_interval: DataInterval | None,
        restriction: TimeRestriction,
    ) -> DagRunInfo | None:
        if last_automated_data_interval is None:
            next_start = datetime(2024, 6, 3, 10, 0, 0, tz="UTC")  # Monday
        else:
            last_start = last_automated_data_interval.start
            if last_start.weekday() == 0:  # Monday
                next_start = last_start.add(days=4)  # Friday
            else:
                next_start = last_start.add(days=3)  # next Monday
        # Each run covers a one-hour data interval.
        return DagRunInfo.interval(start=next_start, end=next_start.add(hours=1))


custom_tt = CustomTimetable()
info = custom_tt.next_dagrun_info(
    last_automated_data_interval=None,
    restriction=TimeRestriction(earliest=None, latest=None, catchup=True),
)
print(info.data_interval.start.isoformat())
```
A. 2024-06-01T10:00:00+00:00
B. 2024-06-03T10:00:00+00:00
C. 2024-06-05T10:00:00+00:00
D. 2024-06-07T10:00:00+00:00
💡 Hint
Think about the first Monday after June 1st, 2024.
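To see the alternation play out over several runs, here is a plain-Python sketch of the same Monday/Friday logic using only the standard library (no Airflow or pendulum). The names `FIRST_RUN`, `next_start`, and `first_runs` are hypothetical, introduced only for this illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

UTC = timezone.utc
FIRST_RUN = datetime(2024, 6, 3, 10, 0, tzinfo=UTC)  # Monday, as hard-coded in the problem


def next_start(last_start: Optional[datetime]) -> datetime:
    """Mirror the problem's rule: Monday -> Friday (+4 days), otherwise -> next Monday (+3 days)."""
    if last_start is None:
        return FIRST_RUN
    if last_start.weekday() == 0:  # Monday
        return last_start + timedelta(days=4)
    return last_start + timedelta(days=3)


def first_runs(count: int) -> List[datetime]:
    """Feed each result back in, the way the scheduler passes the last interval."""
    runs: List[datetime] = []
    last: Optional[datetime] = None
    for _ in range(count):
        last = next_start(last)
        runs.append(last)
    return runs


for run in first_runs(4):
    print(run.isoformat(), run.strftime("%A"))
```

Note that the first run is fixed in code, so it does not depend on the June 1st reference date at all.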
🧠 Conceptual
intermediate
Understanding Airflow Timetable restrictions
Which statement best describes the role of the 'restriction' parameter in Airflow's Timetable.next_dagrun_info method?
A. It defines the timezone for the scheduled runs.
B. It specifies the DAG owner responsible for the run.
C. It sets the maximum number of concurrent runs allowed.
D. It limits the earliest and latest possible run dates to avoid scheduling outside a window.
💡 Hint
Think about how Airflow avoids scheduling runs too early or too late.
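In Airflow, `restriction` is a `TimeRestriction` with `earliest`, `latest`, and `catchup` fields. The plain-Python sketch below shows how a `next_dagrun_info` implementation might honor such a window: clamp the candidate up to `earliest`, and return `None` (no further runs) once it passes `latest`. The `Restriction` class, the daily 10:00 UTC slot, and the helper names are hypothetical, and the real `catchup` flag is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone
from typing import Optional

UTC = timezone.utc
RUN_AT = time(10, 0, tzinfo=UTC)  # hypothetical: one run per day at 10:00 UTC


@dataclass(frozen=True)
class Restriction:
    """Hypothetical stand-in for the earliest/latest part of Airflow's TimeRestriction."""
    earliest: Optional[datetime]
    latest: Optional[datetime]


def align_to_run_time(moment: datetime) -> datetime:
    """Return the first 10:00 UTC slot at or after `moment`."""
    slot = datetime.combine(moment.date(), RUN_AT)
    if slot < moment:
        slot += timedelta(days=1)
    return slot


def next_run(last_start: Optional[datetime], restriction: Restriction) -> Optional[datetime]:
    if last_start is None:
        if restriction.earliest is None:
            return None  # nothing to anchor the first run to
        candidate = align_to_run_time(restriction.earliest)
    else:
        candidate = last_start + timedelta(days=1)
        # Never schedule before the earliest allowed date.
        if restriction.earliest is not None and candidate < restriction.earliest:
            candidate = align_to_run_time(restriction.earliest)
    # Past the end of the window: stop scheduling.
    if restriction.latest is not None and candidate > restriction.latest:
        return None
    return candidate
```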
🔀 Workflow
advanced
Configuring a DAG with a custom timetable for multiple schedules
You want to create an Airflow DAG that runs every weekday at 9:00 AM and also on the 1st of every month at 12:00 PM. Which approach correctly implements this using a custom timetable?
A. Create a custom Timetable class that returns next run dates for weekdays at 9:00 AM and 1st of month at 12:00 PM, alternating between them.
B. Use two separate DAGs with standard cron schedules for each timing.
C. Set schedule_interval to '0 9 * * 1-5,0 12 1 * *' in the DAG definition.
D. Use a single cron expression '0 9,12 1-31 * 1-5' to cover both schedules.
💡 Hint
Think about how to combine two different schedules in one DAG timetable.
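One way to combine two schedules is to let each rule propose its own next slot and take the earlier one. The sketch below shows only that core date logic in plain Python, assuming all times are UTC; the function names are hypothetical, and the Airflow wiring (subclassing `Timetable` and wrapping the result in a `DagRunInfo`) is left out.

```python
from datetime import datetime, time, timedelta, timezone

UTC = timezone.utc


def next_weekday_9am(after: datetime) -> datetime:
    """First Monday-Friday 09:00 UTC strictly after `after`."""
    day = after.date()
    while True:
        candidate = datetime.combine(day, time(9, 0), tzinfo=UTC)
        if candidate > after and candidate.weekday() < 5:
            return candidate
        day += timedelta(days=1)


def next_first_of_month_noon(after: datetime) -> datetime:
    """First 1st-of-month 12:00 UTC strictly after `after`."""
    year, month = after.year, after.month
    while True:
        candidate = datetime(year, month, 1, 12, 0, tzinfo=UTC)
        if candidate > after:
            return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def next_combined_run(after: datetime) -> datetime:
    # Each rule proposes its own next slot; the earlier one wins.
    return min(next_weekday_9am(after), next_first_of_month_noon(after))
```

For example, after a Friday 09:00 run on 2024-05-31, the next run is the monthly slot on Saturday 2024-06-01 at 12:00, and the one after that is Monday 2024-06-03 at 09:00.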
Troubleshoot
advanced
Diagnosing unexpected DAG run times with a custom timetable
A DAG with a custom timetable runs at unexpected times, including weekends, despite the timetable code specifying only weekdays. What is the most likely cause?
A. The Airflow scheduler is ignoring the timetable due to a configuration error.
B. The DAG's schedule_interval is overriding the custom timetable.
C. The timetable's next_dagrun_info method does not correctly check the last run date and returns incorrect next run dates.
D. The timezone setting in Airflow is set to a non-UTC zone causing date shifts.
💡 Hint
Check how the timetable calculates the next run date based on the last run.
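A minimal plain-Python illustration, assuming a weekday-only schedule at 10:00 UTC: a step that blindly adds one day lands on Saturday after a Friday run, while a step that skips `weekday()` values 5 and 6 does not. Generating a batch of runs and asserting that none fall on a weekend is a cheap way to test such logic before wiring it into `next_dagrun_info`. All function names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List

UTC = timezone.utc


def next_weekday_run_buggy(last_start: datetime) -> datetime:
    # Ignores the calendar: a Friday run + 1 day lands on Saturday.
    return last_start + timedelta(days=1)


def next_weekday_run(last_start: datetime) -> datetime:
    # Step forward a day at a time until the date is Monday-Friday (weekday() 0-4).
    candidate = last_start + timedelta(days=1)
    while candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate


def generate_runs(first: datetime, step: Callable[[datetime], datetime], count: int) -> List[datetime]:
    """Chain `step` from `first`, the way the scheduler feeds back the last run."""
    runs = [first]
    while len(runs) < count:
        runs.append(step(runs[-1]))
    return runs


start = datetime(2024, 6, 3, 10, 0, tzinfo=UTC)  # a Monday
weekend_runs = [r for r in generate_runs(start, next_weekday_run_buggy, 10) if r.weekday() >= 5]
print("buggy step weekend runs:", [r.date().isoformat() for r in weekend_runs])
```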
Best Practice
expert
Best practice for implementing complex schedules in Airflow
What is the best practice when implementing a complex schedule that requires multiple non-overlapping intervals in Airflow?
A. Implement a custom Timetable class that explicitly calculates and returns the next run date considering all intervals.
B. Use multiple DAGs each with a simple cron schedule and combine their outputs downstream.
C. Use a single cron expression with multiple comma-separated values to cover all intervals.
D. Manually trigger DAG runs at required times instead of scheduling.
💡 Hint
Consider maintainability and correctness for complex schedules.
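A sketch of one maintainable structure, assuming all times are UTC: keep each interval as a small, independently testable rule, and compute the next run in one place by taking the earliest candidate across all rules. The rule factory `at_hour_on`, `next_from_rules`, `runs_between`, and the example `SCHEDULE` are hypothetical, and the Airflow `Timetable` wiring is omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, FrozenSet, List

UTC = timezone.utc

# A rule maps "after" to that rule's first slot strictly later than "after".
Rule = Callable[[datetime], datetime]


def at_hour_on(hour: int, weekdays: FrozenSet[int]) -> Rule:
    """Hypothetical rule factory: `hour`:00 on the given weekdays (0 = Monday)."""
    if not weekdays:
        raise ValueError("a rule needs at least one weekday")

    def rule(after: datetime) -> datetime:
        candidate = after.replace(hour=hour, minute=0, second=0, microsecond=0)
        # Walk forward day by day until the slot is both later and on an allowed weekday.
        while candidate <= after or candidate.weekday() not in weekdays:
            candidate += timedelta(days=1)
        return candidate

    return rule


def next_from_rules(after: datetime, rules: List[Rule]) -> datetime:
    # The schedule lives in one place: each rule proposes a slot, the earliest wins.
    return min(rule(after) for rule in rules)


def runs_between(start: datetime, end: datetime, rules: List[Rule]) -> List[datetime]:
    """All scheduled slots in the half-open window (start, end]."""
    runs: List[datetime] = []
    current = next_from_rules(start, rules)
    while current <= end:
        runs.append(current)
        current = next_from_rules(current, rules)
    return runs


SCHEDULE = [
    at_hour_on(6, frozenset({0})),   # Mondays at 06:00 UTC
    at_hour_on(18, frozenset({3})),  # Thursdays at 18:00 UTC
]
```

Listing the runs for a fixed window, as `runs_between` does, doubles as a regression test: if a rule is edited, the expected list changes visibly.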