
DAG concept (Directed Acyclic Graph) in Apache Airflow - Step-by-Step Execution

Process Flow - DAG concept (Directed Acyclic Graph)
Start DAG Definition
Define Tasks (Nodes)
Set Dependencies (Edges)
Check for Cycles?
If yes: Error: Cycle Detected
If no: continue to scheduling
Schedule DAG Run
Execute Tasks in Order
DAG Run Complete
This flow shows how a DAG is defined with tasks and dependencies, checked for cycles, then scheduled and executed in order.
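The "Check for Cycles?" step can be illustrated with a small depth-first search. This is only a sketch of the general technique, not Airflow's own implementation; the function name `find_cycle` and the plain-dictionary graph are assumptions made for illustration.

```python
from typing import Dict, List


def find_cycle(edges: Dict[str, List[str]]) -> List[str]:
    """Return one cycle as a list of task ids, or [] if the graph is acyclic.

    `edges` maps each task id to the task ids that run after it.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(edges) | {d for ds in edges.values() for d in ds}
    colour = {task: WHITE for task in nodes}
    path: List[str] = []

    def visit(task: str) -> List[str]:
        colour[task] = GREY
        path.append(task)
        for downstream in edges.get(task, []):
            if colour[downstream] == GREY:
                # Back edge: downstream is already on the current path.
                return path[path.index(downstream):] + [downstream]
            if colour[downstream] == WHITE:
                cycle = visit(downstream)
                if cycle:
                    return cycle
        path.pop()
        colour[task] = BLACK
        return []

    for task in sorted(colour):
        if colour[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return []


# The chain from this page, and a hypothetical variant that loops back.
chain = {"task1": ["task2"], "task2": ["task3"], "task3": []}
looped = {"task1": ["task2"], "task2": ["task3"], "task3": ["task1"]}

print(find_cycle(chain))   # []
print(find_cycle(looped))  # ['task1', 'task2', 'task3', 'task1']
```

An empty result corresponds to the "No" branch of the flow; a non-empty result corresponds to "Error: Cycle Detected".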
Execution Sample
Apache Airflow
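from datetime import datetime

from airflow import DAG
# Note: newer Airflow releases rename this operator to EmptyOperator
# (airflow.operators.empty); DummyOperator is the older name.
from airflow.operators.dummy import DummyOperator

# The DAG object groups the tasks; start_date is the first date it may run for.
dag = DAG('example_dag', start_date=datetime(2024, 1, 1))

# Three placeholder tasks (nodes), each attached to the DAG.
t1 = DummyOperator(task_id='task1', dag=dag)
t2 = DummyOperator(task_id='task2', dag=dag)
t3 = DummyOperator(task_id='task3', dag=dag)

# Dependencies (edges): task1 runs first, then task2, then task3.
t1 >> t2 >> t3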
Defines a DAG with three tasks where task1 runs before task2, which runs before task3.
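To see why a chain such as `t1 >> t2 >> t3` works, it may help to look at a toy model of the `>>` operator. The `Task` class below is a hypothetical stand-in written for this illustration, not Airflow's operator class; it assumes only that `>>` records the dependency and returns the right-hand task so that chains evaluate left to right.

```python
class Task:
    """Hypothetical stand-in for an operator; not Airflow's real class."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set = set()    # ids of tasks this task waits for
        self.downstream: set = set()  # ids of tasks that wait for this task

    def __rshift__(self, other: "Task") -> "Task":
        # self >> other: other must wait for self.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)
        # Returning the right-hand task lets a >> b >> c chain left to right.
        return other


t1, t2, t3 = Task("task1"), Task("task2"), Task("task3")
t1 >> t2 >> t3

print(t2.upstream, t2.downstream)  # {'task1'} {'task3'}
```

In this sketch the chain is evaluated as `(t1 >> t2) >> t3`, which yields the two edges task1 -> task2 and task2 -> task3.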
Process Table
| Step | Action | DAG State | Dependencies | Notes |
|---|---|---|---|---|
| 1 | Create DAG object | DAG 'example_dag' created | No dependencies yet | DAG initialized with start_date 2024-01-01 |
| 2 | Create task1 | Tasks: task1 | None | Task1 added to DAG |
| 3 | Create task2 | Tasks: task1, task2 | None | Task2 added to DAG |
| 4 | Create task3 | Tasks: task1, task2, task3 | None | Task3 added to DAG |
| 5 | Set dependency: task1 >> task2 | Tasks: task1, task2, task3 | task1 -> task2 | task2 depends on task1 |
| 6 | Set dependency: task2 >> task3 | Tasks: task1, task2, task3 | task1 -> task2 -> task3 | task3 depends on task2 |
| 7 | Check for cycles | No cycles detected | task1 -> task2 -> task3 | DAG is acyclic |
| 8 | Schedule DAG run | DAG scheduled | task1 -> task2 -> task3 | DAG ready to execute |
| 9 | Execute task1 | task1 completed | task1 -> task2 -> task3 | First task runs |
| 10 | Execute task2 | task2 completed | task1 -> task2 -> task3 | Second task runs after task1 |
| 11 | Execute task3 | task3 completed | task1 -> task2 -> task3 | Final task runs after task2 |
| 12 | DAG run complete | All tasks completed | task1 -> task2 -> task3 | DAG execution finished |
💡 All tasks executed in order, no cycles found, DAG run completed successfully
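The execution steps in the table (9 to 12) follow from a topological ordering of the graph. The sketch below uses Kahn's algorithm to derive one valid order; `run_order` is a hypothetical helper for illustration, not how the Airflow scheduler is implemented, and it assumes every dependency also appears as a key in the input dictionary.

```python
from collections import deque
from typing import Dict, List


def run_order(upstream: Dict[str, List[str]]) -> List[str]:
    """Return one valid execution order (Kahn's algorithm).

    `upstream` maps each task id to the task ids it depends on.
    """
    # Number of unfinished dependencies per task.
    remaining = {task: len(deps) for task, deps in upstream.items()}
    # Reverse mapping: which tasks are waiting on each task.
    downstream: Dict[str, List[str]] = {task: [] for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(task)

    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)  # "execute" the task
        for nxt in downstream[task]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(upstream):
        raise ValueError("cycle detected: some tasks never became ready")
    return order


example = {"task1": [], "task2": ["task1"], "task3": ["task2"]}
print(run_order(example))  # ['task1', 'task2', 'task3']
```

A task is appended to the order only after all of its upstream tasks have been processed, which mirrors the "runs after" notes in the table.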
Status Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 6 | Final |
|---|---|---|---|---|---|---|---|
| DAG Object | None | Created | Created | Created | Created | Created | Created |
| Tasks | None | task1 | task1, task2 | task1, task2, task3 | task1, task2, task3 | task1, task2, task3 | task1, task2, task3 |
| Dependencies | None | None | None | None | task1 -> task2 | task1 -> task2 -> task3 | task1 -> task2 -> task3 |
Key Moments - 3 Insights
Why can't a DAG have cycles in Airflow?
Because a cycle leaves no valid starting point: each task in the loop would wait on another task in the same loop, so the DAG could never complete. Step 7 of the Process Table shows the cycle check that confirms the DAG is acyclic.
What does the '>>' operator do between tasks?
It sets the order of execution: the task on the left becomes an upstream dependency of the task on the right, so the right task waits for the left one. See steps 5 and 6 of the Process Table, where dependencies are set using '>>'.
Can tasks run in parallel in a DAG?
Yes, tasks without dependencies between them can run at the same time. In this example, tasks are chained, so they run sequentially, but if tasks had no dependencies, they could run in parallel.
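The parallelism described above can be sketched with Python's standard `graphlib` module (Python 3.9+), which groups tasks into batches whose members do not depend on each other. The helper `parallel_batches` and the `independent` graph are assumptions made for illustration; in a real Airflow deployment, whether tasks actually run at the same time also depends on the executor and concurrency settings.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_batches(upstream: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; tasks in the same batch could run together.

    `upstream` maps each task id to the set of task ids it depends on.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstreams are done
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


# The chain from this page: one task per batch, so execution is sequential.
chained = {"task1": set(), "task2": {"task1"}, "task3": {"task2"}}
# Hypothetical variant: task2 no longer depends on task1.
independent = {"task1": set(), "task2": set(), "task3": {"task2"}}

print(parallel_batches(chained))      # [['task1'], ['task2'], ['task3']]
print(parallel_batches(independent))  # [['task1', 'task2'], ['task3']]
```

In the second graph, task1 and task2 land in the same batch because neither waits on the other.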
Visual Quiz - 3 Questions
Test your understanding
In the Process Table, what is the dependency state after step 5?
A. task2 -> task3
B. task1 -> task2
C. task1 -> task3
D. No dependencies
💡 Hint
Check the 'Dependencies' column in the Process Table row for Step 5
At which step does the DAG confirm it has no cycles?
A. Step 7
B. Step 8
C. Step 6
D. Step 9
💡 Hint
Look for the step labeled 'Check for cycles' in the Process Table
If task2 had no dependency on task1, how would the execution order change?
A. task3 would run before task2
B. task2 would run before task1
C. task1 and task2 could run in parallel
D. No change, tasks run sequentially
💡 Hint
Refer to the Key Moments answer about parallel execution and dependencies
Concept Snapshot
DAG (Directed Acyclic Graph) in Airflow defines tasks and their order.
Tasks are nodes; dependencies are edges.
Cycles are not allowed, because they would make tasks wait on each other forever.
Use '>>' to set task order (e.g., task1 >> task2).
Airflow schedules and runs tasks respecting dependencies.
Tasks without dependencies can run in parallel.
Full Transcript
A DAG in Airflow is a set of tasks connected by dependencies that form a directed graph with no cycles. We start by creating a DAG object, then define tasks as nodes. Dependencies between tasks are set using operators like '>>' to specify order. Airflow checks the DAG to ensure no cycles exist, because cycles would cause tasks to wait forever. Once validated, the DAG is scheduled and tasks execute in order, respecting dependencies. Tasks without dependencies can run at the same time. This example shows three tasks where task1 runs first, then task2, then task3, completing the DAG run successfully.