What is dag_id in Airflow: Definition and Usage

dag_id in Airflow is a unique identifier for a Directed Acyclic Graph (DAG) that represents a workflow. It helps Airflow track, schedule, and manage each workflow separately by name.

How It Works

Think of dag_id as the name tag for a workflow in Airflow. Each workflow you create is a DAG: a set of tasks linked by dependencies so that they run in a defined order and never loop back on themselves. The dag_id uniquely identifies this workflow so Airflow knows which tasks and which runs belong to it.

When Airflow runs, it parses the Python files in its DAGs folder and reads the dag_id of each DAG it finds in order to organize and schedule tasks. This is like sorting mail by recipient name so each letter goes to the right person. If two DAGs share the same dag_id, their definitions collide and Airflow cannot reliably tell one workflow from the other.
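
The idea of keying workflows by dag_id can be sketched in plain Python. This is a simplified model, not Airflow's internal code: the Workflow class and the build_registry function are hypothetical names made up for illustration, and real Airflow handles duplicate IDs differently depending on the version.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical stand-in for a DAG: an ID plus a list of task names."""
    dag_id: str
    tasks: list = field(default_factory=list)


def build_registry(workflows):
    """Index workflows by dag_id and refuse duplicates (an assumption of this sketch)."""
    registry = {}
    for wf in workflows:
        if wf.dag_id in registry:
            raise ValueError(f"duplicate dag_id: {wf.dag_id!r}")
        registry[wf.dag_id] = wf
    return registry


registry = build_registry([
    Workflow("daily_data_processing", ["extract", "load"]),
    Workflow("monthly_report_generation", ["build_report"]),
])
print(sorted(registry))  # ['daily_data_processing', 'monthly_report_generation']
```

Because the lookup is by name, a second workflow with an existing dag_id has nowhere to go, which is why each ID needs to be unique.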

Example

This example shows how to set a dag_id in a simple Airflow DAG definition.

python
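from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Arguments applied to every task in the DAG unless a task overrides them
default_args = {
    'start_date': datetime(2024, 1, 1),
}

dag = DAG(
    dag_id='example_dag',  # Unique identifier for this workflow
    default_args=default_args,
    # On Airflow 2.4 and later the preferred name for this parameter is
    # `schedule`; `schedule_interval` is deprecated there.
    schedule_interval='@daily',
)

# Attach the task to the DAG by passing the DAG object
task1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)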
Output
No direct output; Airflow registers the DAG with ID 'example_dag' for scheduling and execution.

When to Use

You use dag_id whenever you define a new workflow in Airflow. It is essential to give each DAG a unique dag_id so Airflow can manage multiple workflows without confusion.

For example, if you have workflows for daily data processing and monthly report generation, each should have its own dag_id like daily_data_processing and monthly_report_generation. This helps you monitor, trigger, and troubleshoot workflows individually.
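
Descriptive names still have to be acceptable to Airflow. The sketch below checks a candidate dag_id before you use it. It assumes the rule that an ID may contain only letters, digits, underscores, dashes, and dots, with a length limit of 250 characters; both the rule as written here and the helper name is_valid_dag_id are assumptions of this example, so confirm the exact constraints against the documentation for your Airflow version.

```python
import re

# Assumed rule: letters, digits, underscores, dashes, and dots only
DAG_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_dag_id(dag_id: str, max_length: int = 250) -> bool:
    """Return True if dag_id fits the assumed character and length rules."""
    return len(dag_id) <= max_length and DAG_ID_PATTERN.fullmatch(dag_id) is not None


for candidate in ["daily_data_processing", "monthly report generation"]:
    print(candidate, is_valid_dag_id(candidate))
```

The second candidate fails because it contains spaces; replacing them with underscores gives monthly_report_generation, which passes.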

Key Points

  • dag_id uniquely identifies each workflow in Airflow.
  • It must be unique across all DAGs to avoid conflicts.
  • Used by Airflow to schedule, track, and manage workflows.
  • Set dag_id when creating a DAG object in your Python code.

Key Takeaways

  • dag_id is the unique name for each Airflow workflow (DAG).
  • Always assign a unique dag_id to avoid workflow conflicts.
  • dag_id helps Airflow organize and schedule tasks correctly.
  • Use descriptive dag_id names to easily identify workflows.
  • You define dag_id when creating the DAG in your Python script.