
What Is a DAG in Airflow: Definition and Usage Explained

In Apache Airflow, a DAG (Directed Acyclic Graph) is a collection of tasks organized with dependencies to define a workflow. It represents the order and rules for running tasks automatically and reliably.

How It Works

A DAG in Airflow is like a recipe that tells the system what steps to follow and in what order. Imagine you are baking a cake: you need to mix ingredients before baking, and then decorate after baking. Similarly, a DAG defines tasks and their dependencies so Airflow knows which task to run first and which ones depend on others.

Each task in a DAG is a single step, and by default a task runs only after all of its upstream tasks have succeeded. 'Directed' means every dependency points one way, from an upstream task to a downstream task. 'Acyclic' means there are no loops: a task can never depend on itself, directly or indirectly, so the workflow doesn’t get stuck repeating steps forever.
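
The ordering rule can be sketched without Airflow, using the `graphlib` module from Python's standard library (Python 3.9+). This is only an illustration of the graph idea, not how Airflow's scheduler is implemented; the task names are taken from the cake analogy, and the `looped` graph is a made-up counterexample.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
cake = {
    "mix": set(),
    "bake": {"mix"},
    "decorate": {"bake"},
}

# static_order() yields tasks so that every prerequisite comes before
# the tasks that depend on it.
order = list(TopologicalSorter(cake).static_order())
print(order)  # ['mix', 'bake', 'decorate']

# Making "mix" depend on "decorate" creates a loop, so no valid order exists.
looped = {**cake, "mix": {"decorate"}}
try:
    list(TopologicalSorter(looped).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

The second graph fails because every task in it waits on another task in the same loop, which is exactly the situation the 'Acyclic' requirement rules out.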


Example

This example shows a simple DAG with two tasks where the second task runs after the first finishes.
python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def task_one():
    print('Task one is running')


def task_two():
    print('Task two is running')


# The context manager attaches every operator created inside it to this DAG.
with DAG(
    dag_id='simple_dag',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow 2.4+; earlier releases use schedule_interval
    catchup=False,  # do not backfill runs between start_date and today
) as dag:
    t1 = PythonOperator(task_id='task_one', python_callable=task_one)
    t2 = PythonOperator(task_id='task_two', python_callable=task_two)

    # task_two starts only after task_one succeeds
    t1 >> t2
Output
Task one is running
Task two is running

When to Use

Use a DAG in Airflow when you need to automate and manage workflows that have multiple steps with dependencies. For example, data pipelines that extract data, transform it, and then load it into a database are perfect for DAGs.

DAGs help ensure tasks run in the right order, handle retries if something fails, and provide clear visibility into workflow status. They are ideal for scheduling jobs that must run regularly, like daily reports or backups.

Key Points

  • A DAG defines the workflow structure and task order in Airflow.
  • Tasks run based on dependencies, ensuring correct sequence.
  • DAGs prevent loops to avoid infinite task execution.
  • They are used to automate complex workflows reliably.

Key Takeaways

  • A DAG in Airflow is a set of tasks with defined order and dependencies.
  • It ensures tasks run in dependency order, with no loops, so workflows can be automated.
  • Use DAGs to manage and schedule complex, multi-step processes.
  • DAGs improve reliability and visibility of automated jobs.