
Why DAG design determines pipeline reliability in Apache Airflow - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
How does task dependency affect pipeline reliability?

In Airflow, tasks are connected in a Directed Acyclic Graph (DAG). How does defining clear dependencies between tasks improve the reliability of a pipeline?

A. It ensures tasks run in the correct order, preventing data errors caused by premature execution.
B. It allows tasks to run in parallel without any order, speeding up the pipeline.
C. It automatically retries failed tasks without manual configuration.
D. It reduces the number of tasks needed by combining them into one.
💡 Hint

Think about what happens if a task runs before its input data is ready.
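
To see why ordering matters, here is a plain-Python sketch (not Airflow's scheduler) using the standard library's graphlib.TopologicalSorter (Python 3.9+). The three tasks extract, transform and load, and the shared store dict, are hypothetical: each task reads its upstream task's output, so running load before its input exists raises a KeyError, while running the tasks in dependency order always finds the data in place.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline sharing data through an in-memory dict.
store: dict[str, list[int]] = {}


def extract() -> None:
    store["raw"] = [3, 1, 2]


def transform() -> None:
    store["clean"] = sorted(store["raw"])  # KeyError if extract has not run yet


def load() -> None:
    store["loaded"] = list(store["clean"])  # KeyError if transform has not run yet


tasks = {"extract": extract, "transform": transform, "load": load}

# Each task mapped to its upstream tasks: the same shape as extract >> transform >> load.
upstream = {"transform": {"extract"}, "load": {"transform"}}


def run_in_dependency_order() -> list[str]:
    """Run every task only after all of its upstream tasks have run."""
    store.clear()
    order = list(TopologicalSorter(upstream).static_order())
    for name in order:
        tasks[name]()
    return order


def run_load_prematurely() -> str:
    """Run load before its input exists, as a pipeline without dependencies might."""
    store.clear()
    try:
        load()
    except KeyError as missing:
        return f"load failed: missing {missing}"
    return "load succeeded"


print(run_load_prematurely())  # load failed: missing 'clean'
print(run_in_dependency_order(), store["loaded"])  # ['extract', 'transform', 'load'] [1, 2, 3]
```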

💻 Command Output
intermediate
What is the output of this DAG run status command?

Given a DAG named data_pipeline, what will the following command print if one of the run's tasks has failed and the run has finished?

Apache Airflow
airflow dags state data_pipeline 2024-06-01
A. queued
B. success
C. running
D. failed
💡 Hint

The command shows the overall state of the DAG run on the given date.
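
The printed state is derived from the states of the run's task instances. The sketch below is a deliberately simplified, plain-Python model of that rule, written as an assumption for illustration rather than Airflow's actual code: a run stays running while any task is unfinished, and once every task has finished it is failed if any leaf task (a task with no downstream tasks) ended failed or upstream_failed, otherwise success. It ignores details such as the queued run state and deadlock detection. The task names are hypothetical.

```python
from typing import Optional

# Simplified model (an assumption for illustration, not Airflow's source code).
FAILED_STATES = {"failed", "upstream_failed"}
SUCCEEDED_STATES = {"success", "skipped"}
FINISHED_STATES = FAILED_STATES | SUCCEEDED_STATES


def dag_run_state(task_states: dict[str, Optional[str]], leaves: set[str]) -> str:
    """Derive a DAG run state from task instance states in this toy model."""
    if any(state not in FINISHED_STATES for state in task_states.values()):
        return "running"  # at least one task has not reached a finished state
    if any(task_states[task] in FAILED_STATES for task in leaves):
        return "failed"
    return "success"


# Hypothetical finished run of data_pipeline whose middle task failed;
# with the default all_success trigger rule, its downstream task is upstream_failed.
finished_run = {"extract": "success", "transform": "failed", "load": "upstream_failed"}
print(dag_run_state(finished_run, leaves={"load"}))  # failed

# The same run while transform is still executing (None: no state assigned yet).
in_progress = {"extract": "success", "transform": "running", "load": None}
print(dag_run_state(in_progress, leaves={"load"}))  # running
```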

🔀 Workflow
advanced
Identify the best DAG design to avoid cyclic dependencies

Which dependency setup below avoids cyclic dependencies and runs start, middle, and end in their intended order?

Apache Airflow
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # Airflow 2.3+; replaces the deprecated DummyOperator

dag = DAG('example_dag', start_date=datetime(2024, 6, 1))

# Placeholder tasks that do no work; only their dependencies matter here
t1 = EmptyOperator(task_id='start', dag=dag)
t2 = EmptyOperator(task_id='middle', dag=dag)
t3 = EmptyOperator(task_id='end', dag=dag)

# Options show different ways to set dependencies
A. t1 >> t2; t2 >> t3; t3 >> t1
B. t1 >> t2 >> t3
C. t1.set_downstream(t2); t2.set_downstream(t1)
D. t3 >> t2 >> t1
💡 Hint

Remember, DAGs cannot have cycles. Which option forms a single chain without loops, running from start to end?
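
Airflow refuses to load a DAG file whose dependencies form a cycle. The check can be illustrated outside Airflow: this sketch encodes each option's edges as (upstream, downstream) pairs and asks the standard library's graphlib.TopologicalSorter (Python 3.9+) for a run order, which raises graphlib.CycleError when none exists. It is a stand-in for Airflow's own cycle check, not the same code.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional

# The four options as (upstream, downstream) edges; ("start", "middle") means t1 >> t2.
OPTIONS = {
    "A": [("start", "middle"), ("middle", "end"), ("end", "start")],
    "B": [("start", "middle"), ("middle", "end")],
    "C": [("start", "middle"), ("middle", "start")],
    "D": [("end", "middle"), ("middle", "start")],
}


def execution_order(edges: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return a valid run order for the edges, or None if they contain a cycle."""
    predecessors: dict[str, set[str]] = {}
    for upstream, downstream in edges:
        predecessors.setdefault(downstream, set()).add(upstream)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


for label, edges in OPTIONS.items():
    print(label, execution_order(edges))
```

Options A and C print None. B and D are both acyclic, but D runs the chain backwards, from end to start.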

Troubleshoot
advanced
Why does this DAG fail to trigger downstream tasks?

In this DAG, send_report does not run after process_data, even though the dependency is set. Which statement about the cause is correct?

Apache Airflow
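from airflow.operators.python import PythonOperator

# dag, process_func and report_func are defined elsewhere in the DAG file
process_data = PythonOperator(task_id='process_data', python_callable=process_func, dag=dag)
send_report = PythonOperator(task_id='send_report', python_callable=report_func, dag=dag)

# process_data is upstream of send_report
process_data >> send_report

# process_func returns None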
A. process_func does not raise an exception, so send_report should trigger normally.
B. process_func returns None, which breaks the DAG and prevents send_report from running.
C. The dependency is set incorrectly; it should be send_report >> process_data.
D. The DAG is missing a start_date, so tasks do not trigger.
💡 Hint

Think about what happens when a PythonOperator's callable returns None.
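
A PythonOperator task fails when its callable raises an exception; a callable that returns normally, including one that returns None, leaves the task in the success state, and under the default all_success trigger rule its downstream task can then run. The toy runner below models only that rule, as an assumption for illustration; process_func and broken_process_func are hypothetical stand-ins, and real Airflow adds retries, XCom handling, and other trigger rules.

```python
from typing import Callable


def run_task(func: Callable[[], object]) -> str:
    """Toy model: a task fails only when its callable raises; the return value is ignored."""
    try:
        func()
    except Exception:
        return "failed"
    return "success"


def all_success(upstream_states: list[str]) -> bool:
    """Toy model of the default all_success trigger rule."""
    return all(state == "success" for state in upstream_states)


# Hypothetical callables standing in for the question's process_func.
def process_func() -> None:
    """Does its work and returns None implicitly, like the question's callable."""


def broken_process_func() -> None:
    raise ValueError("bad input")


for func in (process_func, broken_process_func):
    state = run_task(func)
    print(func.__name__, state, "| send_report runs:", all_success([state]))
```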

Best Practice
expert
Which DAG design practice improves pipeline reliability the most?

Choose the DAG design practice that most directly improves the reliability of an Airflow pipeline.

A. Avoid setting retries on tasks to prevent repeated failures.
B. Combine all tasks into a single task to reduce complexity and speed up execution.
C. Use clear, explicit task dependencies and avoid cycles to ensure predictable execution order.
D. Set all tasks to run in parallel to minimize total pipeline runtime.
💡 Hint

Think about what makes a pipeline predictable and easy to maintain.
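
Explicit, acyclic dependencies make the execution order predictable without forcing everything to run serially: tasks with no dependency path between them may still run in parallel. The plain-Python sketch below groups a hypothetical fan-out/fan-in pipeline (extract >> [clean, enrich] >> load, with made-up task names) into stages using the get_ready()/done() protocol of the standard library's graphlib.TopologicalSorter (Python 3.9+). It illustrates the idea only and is not how Airflow's scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: extract fans out to two independent tasks that fan back in to load.
upstream = {
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; every task's upstream tasks sit in earlier stages."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable printout
        result.append(ready)
        sorter.done(*ready)
    return result


for number, stage in enumerate(stages(upstream), start=1):
    print(f"stage {number}: {stage}")
```

Tasks inside one stage (here clean and enrich) are free to run in parallel, while the stage order stays fixed by the declared dependencies.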