Imagine you have several tasks to run in a specific order to process data. Why is orchestration important in this scenario?
Think about how tasks depend on each other and need to run in a certain sequence.
Orchestration tools like Airflow manage the order and dependencies of tasks in data pipelines, ensuring tasks run only when their prerequisites are complete.
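The idea above can be sketched in plain Python: an orchestrator repeatedly runs whichever tasks have all their prerequisites complete. This is a minimal illustrative model, not Airflow's actual scheduler; the `run_pipeline` helper and task names are assumptions for the example.

```python
def run_pipeline(dependencies):
    """Execute tasks in dependency order (a simple topological run).

    dependencies maps each task name to the list of tasks that must
    finish before it may start.
    """
    done, log = set(), []
    pending = set(dependencies)
    while pending:
        # A task is ready only when all its prerequisites are done.
        ready = [t for t in pending if set(dependencies[t]) <= done]
        if not ready:
            raise RuntimeError("cycle or unmet dependency")
        for task in sorted(ready):
            log.append(f"{task} started")
            log.append(f"{task} completed")
            done.add(task)
            pending.remove(task)
    return log

# A -> B -> C, mirroring the DAG discussed below.
deps = {"A": [], "B": ["A"], "C": ["B"]}
print(run_pipeline(deps))
```

Running this prints the tasks starting and completing in dependency order, which is exactly the guarantee an orchestrator provides.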
Given a DAG with tasks A → B → C, what is the order of task execution shown in the Airflow logs?
Task A started
Task A completed
Task B started
Task B completed
Task C started
Task C completed
Look at the order tasks start and complete in the logs.
The logs show tasks starting and completing in the order A, then B, then C, matching the DAG dependencies.
Which Airflow DAG code snippet correctly sets task B to run after task A?
In Airflow, the '>>' operator sets downstream dependencies.
The '>>' operator means the task on the left runs before the task on the right, so 'task_a >> task_b' means B runs after A.
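How `>>` can record a dependency is easy to see with a toy model. The `Task` class below is a simplified stand-in, not Airflow's real operator class; it only shows the mechanism (Python's `__rshift__` overload) that makes `task_a >> task_b` mean "B runs after A".

```python
class Task:
    """Toy stand-in for an Airflow task, tracking dependencies."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that run after this one
        self.upstream = []    # tasks that must finish first

    def __rshift__(self, other):
        # 'self >> other' means other runs after self.
        self.downstream.append(other)
        other.upstream.append(self)
        return other          # returning other allows chaining: a >> b >> c

task_a = Task("task_a")
task_b = Task("task_b")

task_a >> task_b  # B depends on A

print([t.task_id for t in task_a.downstream])  # ['task_b']
print([t.task_id for t in task_b.upstream])    # ['task_a']
```

In real Airflow code the same expression wires up the DAG, and chaining (`task_a >> task_b >> task_c`) expresses a linear pipeline in one line.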
You scheduled a task in Airflow, but it never runs. What is the most likely reason?
Think about task dependencies and what controls task execution.
In Airflow, a task will not run if its upstream tasks have not finished successfully, as dependencies must be met first.
What is the best practice to handle a task failure in an Airflow data pipeline to avoid blocking the entire workflow?
Consider how to automatically recover and inform the team about issues.
Setting retries with a delay lets tasks recover automatically from transient issues, and alerting ensures the team learns about persistent failures and can intervene.
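The retry-with-delay-plus-alert pattern can be sketched in plain Python. This is an illustrative analogue of Airflow's per-task retry settings and failure notifications, not Airflow's API; `run_with_retries`, `send_alert`, and the flaky task are all assumptions for the example.

```python
import time

def send_alert(message):
    # Stand-in for a real notification channel (email, Slack, pager).
    print(f"ALERT: {message}")

def run_with_retries(task, retries=2, delay_seconds=0.01):
    """Run task, retrying with a delay; alert only if all attempts fail."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                send_alert(f"task failed after {retries + 1} attempts: {exc}")
                raise
            time.sleep(delay_seconds)  # back off before the next attempt

# A flaky task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary outage")
    return "ok"

print(run_with_retries(flaky))  # succeeds on the third attempt
```

Because the task succeeds within the retry budget, no alert fires and the rest of the pipeline is never blocked; only an exhausted retry budget escalates to the team.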