DAG concept (Directed Acyclic Graph) in Apache Airflow - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding DAG Structure in Airflow

Which statement best describes the structure of a DAG in Apache Airflow?

A. A DAG is a collection of tasks with dependencies that form a cycle, allowing repeated execution paths.
B. A DAG is a linear sequence of tasks executed one after another without any branching.
C. A DAG is a set of tasks connected in a way that no cycles exist, ensuring a clear start and end.
D. A DAG is a random graph where tasks can depend on each other in any order.
💡 Hint

Think about what 'acyclic' means in the context of graph structures.
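
To make the hint concrete, here is a minimal sketch using Python's standard-library graphlib module (Python 3.9+), not Airflow itself. The task names and the execution_order helper are hypothetical; the point is only that a valid run order can be computed when the dependency graph has no cycle, and cannot be computed otherwise.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
acyclic = {"load": {"transform"}, "transform": {"extract"}, "extract": set()}
cyclic = {"a": {"c"}, "b": {"a"}, "c": {"b"}}


def execution_order(graph):
    """Return one valid run order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(execution_order(acyclic))  # ['extract', 'transform', 'load']
print(execution_order(cyclic))   # None
```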

💻 Command Output
intermediate
Output of DAG Task Execution Order

Given the following Airflow DAG snippet, what is the order of task execution?

Apache Airflow
from datetime import datetime

from airflow import DAG
# Note: newer Airflow releases replace DummyOperator with
# airflow.operators.empty.EmptyOperator.
from airflow.operators.dummy import DummyOperator

dag = DAG('example_dag', start_date=datetime(2024, 1, 1))

# Three placeholder tasks attached to the same DAG
t1 = DummyOperator(task_id='task1', dag=dag)
t2 = DummyOperator(task_id='task2', dag=dag)
t3 = DummyOperator(task_id='task3', dag=dag)

t1 >> [t2, t3]
A. task1 runs first, then task2 and task3 run in parallel.
B. task3 runs first, then task1 and task2 run in parallel.
C. task2 runs first, then task1 and task3 run in parallel.
D. task1, task2, and task3 all run at the same time.
💡 Hint

Look at the dependency operator '>>' and what it means for task order.
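
As a rough illustration of how an operator like '>>' can record dependencies, the toy model below overloads Python's right-shift methods. ToyTask and its downstream attribute are hypothetical stand-ins written for this sketch; this is not Airflow's actual implementation, only the general idea that the left-hand side becomes upstream of the right-hand side.

```python
class ToyTask:
    """Hypothetical stand-in for an operator; not Airflow's actual class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()  # task_ids that must wait for this task

    def __rshift__(self, other):
        # Handles `self >> other`, where other is a task or a list of tasks.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.downstream.add(target.task_id)
        return other  # returning the right-hand side allows chaining

    def __rrshift__(self, other):
        # Handles `[a, b] >> self`: lists define no `>>`, so Python falls back here.
        for source in other:
            source.downstream.add(self.task_id)
        return self


a, b, c, d = (ToyTask(name) for name in "abcd")
a >> [b, c]
[b, c] >> d

print(sorted(a.downstream))  # ['b', 'c']
print(sorted(b.downstream))  # ['d']
```

In this sketch, a has two downstream tasks that do not depend on each other, and d waits for both of them.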

Configuration
advanced
Identifying Invalid DAG Configuration

Which of the following DAG definitions will cause an error due to cycle detection in Airflow?

Apache Airflow
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from datetime import datetime

dag = DAG('cycle_dag', start_date=datetime(2024, 1, 1))

t1 = DummyOperator(task_id='task1', dag=dag)
t2 = DummyOperator(task_id='task2', dag=dag)
t3 = DummyOperator(task_id='task3', dag=dag)

# Dependencies to check
A. [t1, t2] >> t3
B. t1 >> t2 >> t3
C. t1 >> [t2, t3]
D. t1 >> t2 >> t3 >> t1
💡 Hint

Check if any task depends on itself directly or indirectly.
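
One way to apply this hint by hand is a reachability check: follow the edges out of a task and see whether you ever arrive back at it. The sketch below does that in plain Python over a list of (upstream, downstream) pairs. The function name depends_on_itself and the task names x, y, z are assumptions made for illustration; Airflow performs its own cycle check when it loads a DAG.

```python
def depends_on_itself(edges, task):
    """Return True if `task` can be reached again by following its downstream edges."""
    downstream = {}
    for up, down in edges:
        downstream.setdefault(up, []).append(down)

    stack = list(downstream.get(task, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == task:
            return True  # we walked back to the starting task: a cycle
        if current not in seen:
            seen.add(current)
            stack.extend(downstream.get(current, []))
    return False


# Hypothetical edge lists, written as (upstream, downstream) pairs.
chain = [("x", "y"), ("y", "z")]
loop = [("x", "y"), ("y", "z"), ("z", "x")]

print(depends_on_itself(chain, "x"))  # False
print(depends_on_itself(loop, "x"))   # True
```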

Troubleshoot
advanced
Troubleshooting a DAG with Missing Task Dependencies

You have a DAG where task3 runs before task2, but your code specifies t1 >> t2 >> t3. What is the most likely cause?

A. The DAG file is not saved, so Airflow is running an old version without dependencies.
B. Airflow allows tasks to run in any order regardless of dependencies.
C. The start_date is set in the future, so tasks run out of order.
D. The tasks are defined without a DAG object.
💡 Hint

Think about how Airflow reads DAG files and updates task dependencies.

🔀 Workflow
expert
Determining the Number of Tasks Executed in a Complex DAG

Consider a DAG with tasks: t1, t2, t3, t4, t5. The dependencies are:

  • t1 >> t2
  • t1 >> t3
  • t2 >> t4
  • t3 >> t4
  • t4 >> t5

If task t3 fails in a DAG run, and every task uses the default trigger rule (all_success) with depends_on_past=False, how many tasks finish successfully in that run?

A. 3 tasks will run successfully: t1, t2, and t5.
B. 2 tasks will run successfully: t1 and t2.
C. 4 tasks will run successfully: t1, t2, t3, and t4.
D. Only 1 task will run successfully: t1.
💡 Hint

Consider which tasks depend on the failed task and how failure affects downstream tasks.
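
The reasoning in this hint can be rehearsed on a different, hypothetical pipeline. The sketch below is a plain-Python simulation, not Airflow code: it assumes every task waits for all of its upstream tasks to succeed (the behavior of Airflow's default all_success trigger rule) and labels tasks that never get to run as upstream_failed. The simulate_run helper and the task names are invented for this example.

```python
from graphlib import TopologicalSorter


def simulate_run(upstream, failing):
    """Map each task to 'success', 'failed', or 'upstream_failed'.

    Assumes every task requires all of its upstream tasks to succeed.
    """
    states = {}
    # Visit tasks so that every upstream task is decided before its dependents.
    for task in TopologicalSorter(upstream).static_order():
        if any(states[parent] != "success" for parent in upstream[task]):
            states[task] = "upstream_failed"  # never runs
        elif task in failing:
            states[task] = "failed"
        else:
            states[task] = "success"
    return states


# Hypothetical pipeline: fetch feeds clean and audit; publish waits for clean.
upstream = {
    "fetch": set(),
    "clean": {"fetch"},
    "audit": {"fetch"},
    "publish": {"clean"},
}
states = simulate_run(upstream, failing={"clean"})
for task in ("fetch", "clean", "audit", "publish"):
    print(task, states[task])
```

Tracing the challenge's own t1 to t5 graph the same way, one task at a time, is a good way to check your answer.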