
Default args and DAG parameters in Apache Airflow - Commands & Configuration

Introduction
When you create workflows in Airflow, you often want to set common settings for all tasks, like retries or start dates. Default arguments let you set these once and reuse them. DAG parameters help you define the workflow's schedule and behavior.
When you want all tasks in a workflow to retry on failure without setting retries for each task.
When you need to set a common start date for all tasks in a workflow.
When you want to define the schedule interval for running your workflow automatically.
When you want to control whether Airflow backfills missed past runs (catchup) and how many runs of the workflow can be active at once (max_active_runs).
When you want to pass parameters to your workflow to customize its behavior.
Config File - my_dag.py
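from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

# Settings applied to every task in the DAG unless a task overrides them.
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,             # a run does not wait for the previous run of the same task
    'start_date': datetime(2024, 6, 1),   # must not be in the future
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,                         # one extra attempt after a failure
    'retry_delay': timedelta(minutes=5),  # wait between attempts
}

# DAG-level parameters: name, schedule, and run behavior.
dag = DAG(
    'example_default_args_dag',
    default_args=default_args,
    description='A simple DAG with default args',
    # Run once a day. Airflow 2.4+ prefers the newer `schedule` argument.
    schedule_interval=timedelta(days=1),
    catchup=False,        # do not backfill runs missed since start_date
    max_active_runs=1     # at most one run of this DAG at a time
)

# Both tasks receive the values in default_args through the DAG.
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag
)

t2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    dag=dag
)

# print_date runs first, then sleep.
t1 >> t2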

default_args: Sets common task parameters like owner, retries, and start date.

DAG: Defines the workflow with a name, default args, schedule, and behavior controls.

BashOperator tasks: Simple tasks that run shell commands, inheriting default args.
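
To make the inheritance concrete, here is a simplified, Airflow-free model of the precedence rule: a value passed directly to a task takes priority over the same key in default_args. The helper name resolve_task_args is hypothetical and is not part of Airflow. It only sketches the behavior that Airflow applies internally when an operator is created.

```python
from datetime import datetime, timedelta

# Same shape as the default_args dictionary in my_dag.py.
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


def resolve_task_args(defaults, **task_kwargs):
    """Hypothetical helper: model how explicit task arguments win over defaults."""
    merged = dict(defaults)      # start from the shared defaults
    merged.update(task_kwargs)   # explicit task arguments override them
    return merged


# A task that sets nothing extra picks up every default.
print_date_args = resolve_task_args(default_args, task_id='print_date')

# A task that passes retries=3 overrides only that one setting.
sleep_args = resolve_task_args(default_args, task_id='sleep', retries=3)

print(print_date_args['retries'])  # 1
print(sleep_args['retries'])       # 3
print(sleep_args['retry_delay'])   # 0:05:00
```

In the real DAG file the equivalent is to pass retries=3 to a single BashOperator call while the other tasks keep the shared value.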

Commands
Lists all DAGs currently available in Airflow to verify the new DAG is recognized.
Terminal
airflow dags list
Expected Output
example_default_args_dag
Triggers the DAG manually to run the workflow immediately.
Terminal
airflow dags trigger example_default_args_dag
Expected Output
Created <DagRun example_default_args_dag @ 2024-06-01T00:00:00+00:00: manual__2024-06-01T00:00:00+00:00, externally triggered: True>
Lists all tasks in the DAG to confirm the tasks defined with default args are present.
Terminal
airflow tasks list example_default_args_dag
Expected Output
print_date
sleep
Runs the 'print_date' task for the given date without scheduling, useful for testing task behavior with default args.
Terminal
airflow tasks test example_default_args_dag print_date 2024-06-01
Expected Output
[2024-06-01 00:00:00,000] {bash.py:123} INFO - Running command: date
[2024-06-01 00:00:00,100] {bash.py:130} INFO - Output: Sat Jun 1 00:00:00 UTC 2024
[2024-06-01 00:00:00,200] {taskinstance.py:1234} INFO - Task succeeded
Key Concept

If you remember nothing else from this pattern, remember: default_args let you set common task settings once to keep your DAG clean and consistent.

Common Mistakes
Not setting a start_date in default_args or setting it to a future date.
Airflow will not run the DAG because it doesn't know when to start or waits for the start date to arrive.
Always set start_date to a past or current date to enable DAG scheduling.
Defining tasks without attaching them to the DAG object.
Tasks that are not attached to the DAG are not part of the workflow and do not inherit its default_args, causing errors or unexpected behavior.
Always pass dag=dag when creating tasks, or create them inside a with DAG(...) block, to ensure proper linkage and inheritance.
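Leaving retries at 0, or relying on the default retry_delay without checking it.
With retries at 0, tasks won't retry on failure, so one transient error fails the whole run. If retry_delay is omitted, Airflow falls back to its default delay (5 minutes), which may not suit the task.
Set retries to at least 1 and define retry_delay explicitly to control retry timing.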
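
The start_date and retries mistakes above can be caught before a DAG file is deployed. The following is a minimal sketch of such a check in plain Python. The function check_default_args is a hypothetical helper, not an Airflow API, and it assumes naive datetime values like the ones used in my_dag.py.

```python
from datetime import datetime, timedelta


def check_default_args(default_args, now):
    """Hypothetical helper: return warnings for common default_args mistakes."""
    warnings = []

    start_date = default_args.get('start_date')
    if start_date is None:
        warnings.append('start_date is missing')
    elif start_date > now:
        warnings.append('start_date is in the future')

    # Assumption: a missing 'retries' key is treated as 0, which matches
    # Airflow's stock configuration but can be changed by an administrator.
    if default_args.get('retries', 0) < 1:
        warnings.append('retries is 0, so failed tasks are not retried')

    return warnings


now = datetime(2024, 6, 15)  # fixed "current time" so the result is repeatable

good = {
    'start_date': datetime(2024, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
bad = {'start_date': datetime(2030, 1, 1), 'retries': 0}

print(check_default_args(good, now))  # []
print(check_default_args(bad, now))   # two warnings
```

Passing the current time in as an argument keeps the check deterministic, which makes it easy to run as a unit test next to the DAG file.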
Summary
Define default_args to set common task parameters like retries and start date.
Create a DAG with default_args and schedule_interval to control workflow timing.
Define tasks linked to the DAG to inherit default settings and run commands.
Use airflow CLI commands to list, trigger, and test DAGs and tasks.