In Airflow, tasks are connected in a Directed Acyclic Graph (DAG). How does defining clear dependencies between tasks improve the reliability of a pipeline?
Think about what happens if a task runs before its input data is ready.
Defining dependencies ensures tasks execute only after their required inputs are ready, avoiding errors and improving pipeline reliability.
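The ordering guarantee can be illustrated outside Airflow with a minimal sketch. It uses Python's standard `graphlib` module rather than Airflow's scheduler, and the task names `extract`, `transform`, and `load` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields each task only after all of its predecessors,
# which is the same guarantee explicit DAG dependencies provide.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because `transform` declares `extract` as a predecessor, it can never appear before it in the resulting order.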
Given a DAG named data_pipeline, what will be the output of the following command if the DAG has one failed task?
airflow dags state data_pipeline 2024-06-01
The command shows the overall state of the DAG run on the given date.
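If a task in the DAG run fails, the overall DAG run state is typically marked as 'failed'. Strictly, Airflow derives the run state from the leaf tasks: with default trigger rules, tasks downstream of a failure become 'upstream_failed', so a leaf ends up failed and the run is marked 'failed'.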
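As a rough illustration of how a run state can follow from task states, here is a simplified model, not Airflow's own code. It assumes the run state is decided by leaf tasks (tasks with nothing downstream); the function name `dag_run_state` and the task names are hypothetical.

```python
def dag_run_state(task_states, downstream):
    """Simplified model: derive a run state from the states of leaf tasks."""
    # A leaf task is one with no downstream tasks.
    leaves = [task for task in task_states if not downstream.get(task)]
    if any(task_states[t] in ("failed", "upstream_failed") for t in leaves):
        return "failed"
    if all(task_states[t] in ("success", "skipped") for t in leaves):
        return "success"
    return "running"


# Hypothetical run: the middle task failed, so the leaf never ran.
task_states = {
    "extract": "success",
    "transform": "failed",
    "load": "upstream_failed",
}
downstream = {"extract": ["transform"], "transform": ["load"], "load": []}

state = dag_run_state(task_states, downstream)
print(state)
```

In this model the failure in `transform` propagates to the leaf `load`, which is why the whole run reports as failed.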
Which DAG design below avoids cyclic dependencies and ensures pipeline reliability?
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from datetime import datetime

dag = DAG('example_dag', start_date=datetime(2024, 6, 1))

t1 = DummyOperator(task_id='start', dag=dag)
t2 = DummyOperator(task_id='middle', dag=dag)
t3 = DummyOperator(task_id='end', dag=dag)

# Options show different ways to set dependencies
Remember, DAGs cannot have cycles. Which option forms a straight line without loops?
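The correct option chains the tasks in a straight line from start to middle to end (for example, t1 >> t2 >> t3), so it contains no cycles. Options that route a later task back to an earlier one create cycles, which Airflow does not allow. An option that runs end before start is acyclic but reverses the logical order.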
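A cycle check of this kind can be sketched with the standard library. This is an illustration using `graphlib`, not the check Airflow itself performs, and the helper `has_cycle` is a hypothetical name; the task ids mirror `start`, `middle`, and `end` from the question.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(dependencies):
    """Return True if the dependency mapping contains a cycle."""
    try:
        # prepare() raises CycleError when the graph is not acyclic.
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return True
    return False


# Each key maps to the set of tasks that must finish before it.
linear = {"start": set(), "middle": {"start"}, "end": {"middle"}}
looped = {"start": {"end"}, "middle": {"start"}, "end": {"middle"}}

print(has_cycle(linear))
print(has_cycle(looped))
```

The `looped` graph makes `start` wait on `end`, which in turn waits on `start` through `middle`, so no task could ever run first.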
In this DAG, task process_data does not trigger send_report even though the dependency is set. What is the likely cause?
process_data = PythonOperator(task_id='process_data', python_callable=process_func, dag=dag)
send_report = PythonOperator(task_id='send_report', python_callable=report_func, dag=dag)

process_data >> send_report

# process_func returns None
Think about what happens when a PythonOperator's callable returns None.
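Returning None from a PythonOperator callable is normal: the task still succeeds and downstream tasks are scheduled as usual. The dependency process_data >> send_report is set correctly, and start_date is assumed to be set. If send_report still does not run, the cause lies elsewhere, for example process_data failing rather than returning.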
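The point about return values can be shown with a toy runner. This is a simplified model assuming default trigger rules, not Airflow's executor; `run_chain` and `broken_func` are hypothetical names, while `process_func` and `report_func` stand in for the callables from the question.

```python
def run_chain(callables):
    """Run (name, callable) pairs in order; stop only when one raises."""
    completed = []
    for name, func in callables:
        try:
            func()  # The return value is ignored when deciding what runs next.
        except Exception:
            break  # A raised exception is what blocks downstream tasks.
        completed.append(name)
    return completed


def process_func():
    return None  # Returning None still counts as success.


def report_func():
    return None


def broken_func():
    raise ValueError("processing failed")


ok_run = run_chain([("process_data", process_func), ("send_report", report_func)])
bad_run = run_chain([("process_data", broken_func), ("send_report", report_func)])
print(ok_run)
print(bad_run)
```

Only the run where the upstream callable raises leaves `send_report` unexecuted, which matches the idea that a None return is not the culprit.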
Choose the best practice that directly improves the reliability of an Airflow pipeline by DAG design.
Think about what makes a pipeline predictable and easy to maintain.
Explicit dependencies and an acyclic design prevent tasks from running out of order, which keeps the pipeline predictable and reliable. Combining all steps into a single task or running every task in parallel can cause errors or inconsistent data. Disabling retries reduces fault tolerance.
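To see why retries add fault tolerance, consider this minimal sketch of a retry loop. It is plain Python, not Airflow's retry mechanism; the parameter names `retries` and `retry_delay` are borrowed from the Airflow task arguments of the same names, but here `retry_delay` is a number of seconds, and `flaky_task` is a hypothetical task that fails twice before succeeding.

```python
import time


def run_with_retries(func, retries=2, retry_delay=0.0):
    """Call func, retrying up to `retries` times after the first failure."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return func(), attempts
        except Exception:
            if attempts > retries:
                raise  # Retries exhausted: surface the failure.
            time.sleep(retry_delay)


calls = {"count": 0}


def flaky_task():
    """Hypothetical task that hits a transient error on its first two calls."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result, attempts = run_with_retries(flaky_task, retries=2)
print(result, attempts)
```

With `retries=0` the same task would fail on its first transient error, which is the loss of fault tolerance the explanation refers to.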