What Are SubDAGs in Airflow and How They Work
A SubDAG is a DAG (Directed Acyclic Graph) nested inside another DAG to organize complex workflows into smaller, manageable parts. It lets you group related tasks and run them as a single unit within the main DAG.
How It Works
Think of a SubDAG as a smaller project inside a bigger project. Just as you might break a big task into smaller steps to make it easier to handle, a SubDAG breaks a large Airflow workflow into smaller DAGs. Each SubDAG is a separate DAG with its own tasks, but it is started and monitored by the main DAG through a SubDagOperator task.
This helps keep your workflows clean and organized. The main DAG triggers the SubDAG as if it were a single task, but inside the SubDAG, you can have many tasks running in order or in parallel. This is like having a team leader (main DAG) who manages several teams (SubDAGs), each doing their own detailed work.
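The nesting idea can be sketched without Airflow at all. The snippet below is a plain-Python toy model, not Airflow's API: the dictionaries `main_graph`, `sub_graph`, and `nested`, and the helper `run_order`, are hypothetical names chosen for illustration. It assumes Python 3.9 or later for the standard-library `graphlib` module, and it only shows how a parent graph can treat a child graph as one node while the child keeps its own internal ordering.

```python
from graphlib import TopologicalSorter

# Toy model: each graph maps a task name to the set of tasks it depends on.
sub_graph = {"start": set(), "end": {"start"}}
main_graph = {
    "start_main": set(),
    "subdag_task": {"start_main"},
    "end_main": {"subdag_task"},
}

# Tasks in the main graph that stand for a whole nested graph.
nested = {"subdag_task": sub_graph}


def run_order(graph, nested_graphs, prefix=""):
    """Return task names in dependency order, expanding nested graphs in place."""
    order = []
    for task in TopologicalSorter(graph).static_order():
        if task in nested_graphs:
            # The parent sees one node; the child contributes its own ordered tasks.
            # Only one level of nesting is expanded in this sketch.
            order.extend(run_order(nested_graphs[task], {}, prefix=f"{prefix}{task}."))
        else:
            order.append(f"{prefix}{task}")
    return order


print(run_order(main_graph, nested))
```

The parent graph never needs to know what happens inside `subdag_task`; it only waits for that node to finish before moving on to `end_main`.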
Example
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.subdag import SubDagOperator


def subdag(parent_dag_name, child_dag_name, args):
    # The child dag_id must be '<parent dag_id>.<SubDagOperator task_id>'.
    child_dag = DAG(
        dag_id=f'{parent_dag_name}.{child_dag_name}',
        default_args=args,
        schedule_interval=None,
    )
    start = DummyOperator(task_id='start', dag=child_dag)
    end = DummyOperator(task_id='end', dag=child_dag)
    start >> end
    return child_dag


args = {'start_date': datetime(2024, 1, 1)}

main_dag = DAG('main_dag', default_args=args, schedule_interval='@daily')

# The whole SubDAG appears as one task in the main DAG.
subdag_task = SubDagOperator(
    task_id='subdag_task',
    subdag=subdag('main_dag', 'subdag_task', args),
    dag=main_dag,
)

start_main = DummyOperator(task_id='start_main', dag=main_dag)
end_main = DummyOperator(task_id='end_main', dag=main_dag)

start_main >> subdag_task >> end_main
When to Use
Use SubDAGs when you have a large workflow that can be logically divided into smaller parts. This helps improve readability and maintenance. For example, if you have a data pipeline with multiple stages like extraction, transformation, and loading, you can create a SubDAG for each stage.
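SubDAGs are also useful when you want to reuse a group of tasks across multiple workflows, or when a set of tasks should be managed as its own unit while remaining part of the main workflow. SubDagOperator has been deprecated since Airflow 2.0 in favor of TaskGroups, so check which option your Airflow version supports.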
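As a rough sketch of the extraction, transformation, and loading case, a single factory function can build one SubDAG per stage. Everything named here (`STAGES`, `child_dag_id`, `build_stage_subdag`, `build_pipeline`, and the `etl_pipeline` DAG id) is hypothetical, the `DummyOperator` tasks are placeholders for real work, and the import paths assume the same Airflow 2.x modules used in the example above. The Airflow imports are done inside the functions only so the naming helper can be used on its own.

```python
from datetime import datetime

# Hypothetical stage names; each becomes one SubDagOperator task in the parent DAG.
STAGES = ("extract", "transform", "load")


def child_dag_id(parent_dag_id, task_id):
    """Build the SubDAG id in the '<parent dag_id>.<task_id>' form SubDagOperator expects."""
    return f"{parent_dag_id}.{task_id}"


def build_stage_subdag(parent_dag_id, task_id, default_args):
    """Return a small SubDAG for one stage, with placeholder start and end tasks."""
    # Imported here so child_dag_id can be used without Airflow installed.
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator

    stage_dag = DAG(
        dag_id=child_dag_id(parent_dag_id, task_id),
        default_args=default_args,
        schedule_interval=None,
    )
    start = DummyOperator(task_id="start", dag=stage_dag)
    end = DummyOperator(task_id="end", dag=stage_dag)
    start >> end
    return stage_dag


def build_pipeline(parent_dag_id="etl_pipeline"):
    """Return a parent DAG that chains one SubDAG per stage, in STAGES order."""
    from airflow import DAG
    from airflow.operators.subdag import SubDagOperator

    default_args = {"start_date": datetime(2024, 1, 1)}
    parent = DAG(parent_dag_id, default_args=default_args, schedule_interval="@daily")

    previous = None
    for stage in STAGES:
        stage_task = SubDagOperator(
            task_id=stage,
            subdag=build_stage_subdag(parent_dag_id, stage, default_args),
            dag=parent,
        )
        if previous is not None:
            previous >> stage_task  # run stages one after another
        previous = stage_task
    return parent
```

In a DAG file you would assign the result at module level, for example `dag = build_pipeline()`, so that Airflow can discover it. Calling `build_pipeline` with a different `parent_dag_id` is one way to reuse the same group of stages in another workflow.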
Key Points
- SubDAGs are DAGs inside a main DAG to organize complex workflows.
- They run as a single task in the main DAG but contain multiple tasks inside.
- Use SubDAGs to improve workflow clarity and reuse task groups.
- In the example above, the SubDAG's schedule_interval is set to None, so it runs only when the main DAG's SubDagOperator triggers it.