
DAG versioning strategies in Apache Airflow - Commands & Configuration

Introduction
When you update workflows in Airflow, you need a way to keep track of changes and run the right version. DAG versioning strategies help you manage different versions of your workflows safely and clearly.
When you want to test a new version of a workflow without stopping the current one
When you need to keep old workflow versions for audit or rerun purposes
When multiple teams update the same DAG and you want to avoid conflicts
When you want to deploy workflow changes gradually to avoid breaking production
When you want to roll back to a previous workflow version quickly if errors occur
Config File - example_dag_v1.py
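from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# The version suffix in dag_id makes this a separate DAG from example_dag_v2.
with DAG(
    dag_id='example_dag_v1',
    start_date=datetime(2024, 1, 1),
    # On Airflow 2.4+, the same value can be passed as schedule='@daily';
    # schedule_interval is the older parameter name.
    schedule_interval='@daily',
    # Do not create runs for intervals missed before the DAG was enabled.
    catchup=False
) as dag:
    # Single task that prints the current date.
    task1 = BashOperator(
        task_id='print_date',
        bash_command='date'
    )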

This DAG file defines version 1 of a workflow. Its dag_id, example_dag_v1, includes the version number to distinguish it from other versions. Because Airflow identifies a DAG by its dag_id, it treats each version as a separate workflow with its own run history.

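The start_date and schedule_interval control when the DAG runs: here, once a day, with the first interval starting on 2024-01-01. Setting catchup=False tells the scheduler not to create a run for every interval missed between start_date and the moment the DAG is enabled; only the most recent interval is scheduled.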
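
The naming scheme above can also be generated rather than typed by hand. The following is a minimal sketch, not the only way to do it: versioned_dag_id and build_dag are hypothetical helper names, and the Airflow 2.x import paths from the file above are assumed. Airflow is imported inside build_dag so that the naming helper can be used and tested without Airflow installed.

```python
from datetime import datetime


def versioned_dag_id(base: str, version: int) -> str:
    """Build a dag_id such as 'example_dag_v2' (hypothetical helper)."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"{base}_v{version}"


def build_dag(version: int, bash_command: str):
    """Create one versioned DAG; Airflow is imported lazily on purpose."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id=versioned_dag_id("example_dag", version),
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # named `schedule` on Airflow 2.4 and later
        catchup=False,
    ) as dag:
        BashOperator(task_id="print_date", bash_command=bash_command)
    return dag


print(versioned_dag_id("example_dag", 2))  # example_dag_v2
```

In a real DAG file you would assign the result at module level, for example dag_v2 = build_dag(2, "date -u"), because Airflow discovers DAG objects from a module's top-level variables. The "date -u" command is only an illustrative change for version 2.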

Commands
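This command lists all DAGs currently registered in Airflow. It helps you verify that your versioned DAGs are recognized separately. The output below shows only the dag_id values; the actual command prints a table with additional columns such as the file path and paused state.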
Terminal
airflow dags list
Expected Output
example_dag_v1 example_dag_v2
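Once several versions are registered, it can help to see which versions exist for each workflow. The sketch below assumes every versioned dag_id follows the "<base>_v<number>" pattern used in this guide; group_versions is a hypothetical helper, and the list of dag_id values is passed in by hand rather than read from Airflow.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<base>_v<integer>", e.g. "example_dag_v2".
_VERSIONED_ID = re.compile(r"^(?P<base>.+)_v(?P<version>\d+)$")


def group_versions(dag_ids):
    """Map each base name to the sorted list of version numbers found."""
    groups = defaultdict(list)
    for dag_id in dag_ids:
        match = _VERSIONED_ID.match(dag_id)
        if match:  # ids without a version suffix are ignored
            groups[match["base"]].append(int(match["version"]))
    return {base: sorted(versions) for base, versions in groups.items()}


print(group_versions(["example_dag_v1", "example_dag_v2", "unversioned_dag"]))
# {'example_dag': [1, 2]}
```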
This triggers a run of the version 1 DAG manually to test or run it immediately.
Terminal
airflow dags trigger example_dag_v1
Expected Output
Created <DagRun example_dag_v1 @ 2024-06-01 12:00:00: manual__2024-06-01T12:00:00+00:00, externally triggered: True>
This triggers a run of the version 2 DAG, showing how you can run different versions independently.
Terminal
airflow dags trigger example_dag_v2
Expected Output
Created <DagRun example_dag_v2 @ 2024-06-01 12:01:00: manual__2024-06-01T12:01:00+00:00, externally triggered: True>
This pauses the version 1 DAG so that the scheduler stops creating new scheduled runs for it, which is useful when you want to switch to a newer version.
Terminal
airflow dags pause example_dag_v1
Expected Output
Dag example_dag_v1 is paused
This unpauses the version 2 DAG so that it runs on schedule, letting the new version take over.
Terminal
airflow dags unpause example_dag_v2
Expected Output
Dag example_dag_v2 is unpaused
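The pause and unpause steps above can be scripted so that they always happen in the same order. This is a minimal sketch: cutover_commands and run_cutover are hypothetical names, and only the airflow dags pause and airflow dags unpause commands shown above are used. By default the script only prints the commands; pass dry_run=False on a machine where the airflow CLI is available to execute them.

```python
import subprocess


def cutover_commands(old_dag_id, new_dag_id):
    """Return the CLI calls in order: pause the old version, then unpause the new."""
    if old_dag_id == new_dag_id:
        raise ValueError("old and new dag_id must differ")
    return [
        ["airflow", "dags", "pause", old_dag_id],
        ["airflow", "dags", "unpause", new_dag_id],
    ]


def run_cutover(old_dag_id, new_dag_id, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cutover_commands(old_dag_id, new_dag_id):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing command


run_cutover("example_dag_v1", "example_dag_v2")  # dry run: prints, does not execute
```

A rollback is the same call with the arguments swapped: run_cutover("example_dag_v2", "example_dag_v1") pauses version 2 and unpauses version 1.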
Key Concept

If you remember nothing else from DAG versioning, remember: use unique dag_id names with version info to run and manage multiple workflow versions safely.

Common Mistakes
Mistake: Using the same dag_id for different versions of a DAG
Why it matters: Airflow identifies a DAG by its dag_id, so the new definition replaces the old one and you lose the ability to run or track the old version separately.
Fix: Include a version number or date in the dag_id to keep each version distinct, for example example_dag_v1 and example_dag_v2.

Mistake: Not pausing old DAG versions when deploying new ones
Why it matters: Both versions may run on schedule at the same time, causing duplicate work or conflicts.
Fix: Pause the old version with airflow dags pause before unpausing the new version.

Mistake: Changing DAG code without updating the dag_id
Why it matters: Runs made before and after the change share one dag_id and one run history, so it becomes unclear which version of the code produced a given run.
Fix: Update the dag_id whenever you make a significant change, so that the change becomes a new version.
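
The second mistake, leaving an old version unpaused, can be checked mechanically. The sketch below is one possible guard, assuming the "<base>_v<number>" naming used in this guide. stale_active_versions is a hypothetical helper, and the mapping from dag_id to paused state is supplied by hand here rather than read from Airflow.

```python
import re

# Assumed naming convention: "<base>_v<integer>".
_VERSIONED_ID = re.compile(r"^(.+)_v(\d+)$")


def stale_active_versions(paused_by_dag_id):
    """Return unpaused dag_ids that are not the newest version of their base name."""
    parsed = {}   # dag_id -> (base, version)
    newest = {}   # base -> highest version seen
    for dag_id in paused_by_dag_id:
        match = _VERSIONED_ID.match(dag_id)
        if not match:
            continue  # unversioned DAGs are outside the scope of this check
        base, version = match.group(1), int(match.group(2))
        parsed[dag_id] = (base, version)
        newest[base] = max(version, newest.get(base, 0))
    return sorted(
        dag_id
        for dag_id, (base, version) in parsed.items()
        if version < newest[base] and not paused_by_dag_id[dag_id]
    )


# True means paused. Here both versions are still active.
state = {"example_dag_v1": False, "example_dag_v2": False}
print(stale_active_versions(state))  # ['example_dag_v1']
```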
Summary
Use unique dag_id names with version info to keep DAG versions separate.
Trigger and manage each DAG version independently using airflow CLI commands.
Pause old versions before enabling new ones to avoid conflicts.