
What Are SubDAGs in Airflow and How They Work

In Apache Airflow, a SubDAG is a DAG (Directed Acyclic Graph) nested inside another DAG to organize complex workflows into smaller, manageable parts. It allows you to group related tasks together and run them as a single unit within the main DAG.

How It Works

Think of a SubDAG as a smaller project inside a bigger project. Just like you might break a big task into smaller steps to make it easier to handle, a SubDAG breaks a large Airflow workflow into smaller DAGs. Each SubDAG is a complete DAG with its own tasks and dependencies, but it is started and tracked by the main DAG through a SubDagOperator task.

This helps keep your workflows clean and organized. The main DAG triggers the SubDAG as if it were a single task, but inside the SubDAG, you can have many tasks running in order or in parallel. This is like having a team leader (main DAG) who manages several teams (SubDAGs), each doing their own detailed work.
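The "single task on the outside, many tasks on the inside" idea can be sketched without Airflow at all. The snippet below is a toy model only, not Airflow's scheduler: the graph dictionaries and the `run_order` helper are hypothetical names made up for illustration, and the ordering comes from Python's standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy graphs: each key is a task, each value is the set of
# tasks it depends on. These are plain dicts, not Airflow objects.
main_graph = {
    "start_main": set(),
    "subdag_task": {"start_main"},
    "end_main": {"subdag_task"},
}

# Tasks nested inside the main-graph node named "subdag_task".
nested_graphs = {
    "subdag_task": {"start": set(), "end": {"start"}},
}


def run_order(graph, nested, prefix=""):
    """Return task names in a valid execution order, expanding nested graphs."""
    order = []
    for task in TopologicalSorter(graph).static_order():
        name = f"{prefix}{task}"
        if task in nested:
            # The parent sees a single node; all of its inner tasks
            # are listed before the parent moves on to the next node.
            order.extend(run_order(nested[task], {}, prefix=f"{name}."))
        else:
            order.append(name)
    return order


print(run_order(main_graph, nested_graphs))
# ['start_main', 'subdag_task.start', 'subdag_task.end', 'end_main']
```

The main graph only knows about three nodes, yet the resulting order contains the nested tasks in the position where the single `subdag_task` node sits.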


Example

This example shows how to create a main DAG with a SubDAG that contains two simple tasks.
python
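# Import paths as in Airflow 2.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.subdag import SubDagOperator


def subdag(parent_dag_name, child_dag_name, args):
    """Build the child DAG that the SubDagOperator will run."""
    # The dag_id follows the '<parent dag_id>.<SubDagOperator task_id>' form.
    dag_subdag = DAG(
        dag_id=f'{parent_dag_name}.{child_dag_name}',
        default_args=args,
        schedule_interval=None,
    )
    # Two placeholder tasks that run one after the other inside the SubDAG.
    start = DummyOperator(task_id='start', dag=dag_subdag)
    end = DummyOperator(task_id='end', dag=dag_subdag)
    start >> end
    return dag_subdag


# Shared by the main DAG and the SubDAG so both use the same start_date.
args = {'start_date': datetime(2024, 1, 1)}

main_dag = DAG('main_dag', default_args=args, schedule_interval='@daily')

# The SubDAG appears in the main DAG as this single task.
# The child name passed to subdag() must match task_id.
subdag_task = SubDagOperator(
    task_id='subdag_task',
    subdag=subdag('main_dag', 'subdag_task', args),
    dag=main_dag,
)

start_main = DummyOperator(task_id='start_main', dag=main_dag)
end_main = DummyOperator(task_id='end_main', dag=main_dag)

# start_main runs first, then the whole SubDAG, then end_main.
start_main >> subdag_task >> end_main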
Output
No direct output; the main DAG 'main_dag' runs and triggers the 'subdag_task' SubDAG which executes its tasks 'start' and 'end' in order.
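One detail in the example is easy to miss: the SubDAG's dag_id is the parent's dag_id, a dot, and the task_id of the SubDagOperator. In Airflow 2.x, SubDagOperator checks this form and raises an error on a mismatch. The helpers below are hypothetical and not part of Airflow; they only sketch the naming rule in plain Python.

```python
def subdag_id(parent_dag_id: str, task_id: str) -> str:
    """Build the dag_id a SubDAG is expected to have: '<parent>.<task_id>'."""
    return f"{parent_dag_id}.{task_id}"


def matches_convention(parent_dag_id: str, task_id: str, child_dag_id: str) -> bool:
    """Check whether a child dag_id follows the '<parent>.<task_id>' form."""
    return child_dag_id == subdag_id(parent_dag_id, task_id)


print(subdag_id("main_dag", "subdag_task"))
# main_dag.subdag_task
print(matches_convention("main_dag", "subdag_task", "main_dag.load"))
# False
```

In the example, this is why the same string, 'subdag_task', is passed both as the task_id and as the child name given to the subdag() function.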

When to Use

Use SubDAGs when you have a large workflow that can be logically divided into smaller parts. This helps improve readability and maintenance. For example, if you have a data pipeline with multiple stages like extraction, transformation, and loading, you can create a SubDAG for each stage.

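SubDAGs are also useful when you want to reuse a group of tasks in multiple workflows: a single factory function can build the same SubDAG for each workflow that needs it. Keep in mind that SubDAGs are deprecated as of Airflow 2.0, where TaskGroups are the recommended way to group tasks, so this pattern mostly applies to older or existing codebases.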
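The reuse idea can be sketched without Airflow as one factory function called once per pipeline stage. Everything here is hypothetical and for illustration only: `StageSpec` is a plain stand-in for a SubDAG, and `make_stage` and the `etl_pipeline` name are made up. In a real DAG file, the factory would return a DAG, as the subdag() function in the example does.

```python
from dataclasses import dataclass, field


@dataclass
class StageSpec:
    """Hypothetical stand-in for a SubDAG: an id plus ordered inner task ids."""
    dag_id: str
    task_ids: list = field(default_factory=list)


def make_stage(parent_dag_name: str, stage: str, steps: list) -> StageSpec:
    """Build one stage from the same template, named '<parent>.<stage>'."""
    return StageSpec(
        dag_id=f"{parent_dag_name}.{stage}",
        task_ids=[f"{stage}_{step}" for step in steps],
    )


# The same factory is reused for each stage of a made-up ETL pipeline.
stages = [
    make_stage("etl_pipeline", stage, ["start", "end"])
    for stage in ("extract", "transform", "load")
]

for spec in stages:
    print(spec.dag_id, spec.task_ids)
```

Because each stage comes from the same function, a change to the template, such as adding a validation step, applies to every stage at once.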

Key Points

  • SubDAGs are DAGs inside a main DAG to organize complex workflows.
  • They run as a single task in the main DAG but contain multiple tasks inside.
  • Use SubDAGs to improve workflow clarity and reuse task groups.
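  • The example sets the SubDAG's schedule_interval to None so that only the main DAG starts it. The Airflow documentation notes that a SubDAG scheduled with None or @once can succeed without doing anything, so confirm the behavior on your Airflow version.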

Key Takeaways

SubDAGs let you break big Airflow workflows into smaller, manageable DAGs.
They run as one task in the main DAG but contain multiple tasks inside.
Use SubDAGs to organize, reuse, and simplify complex workflows.
The example sets the SubDAG's schedule_interval to None; confirm how your Airflow version handles that setting before relying on it.