Apache Airflow · DevOps · ~5 mins

DAG performance tracking in Apache Airflow - Commands & Configuration

Introduction
Tracking the performance of your Airflow DAGs helps you understand how long tasks take and whether any are failing, so you can tune your workflows and fix problems quickly.
When you want to see how long each task in your workflow takes to run.
When you need to find out if any tasks are failing or retrying often.
When you want to monitor the overall health and efficiency of your data pipelines.
When you want to compare performance before and after changes to your DAGs.
When you want to alert your team if tasks take too long or fail.
Config File - my_dag.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago  # deprecated in newer Airflow releases; a fixed datetime also works
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'performance_tracking_example',
    default_args=default_args,
    description='A simple DAG to demonstrate performance tracking',
    schedule_interval='@daily',
    start_date=days_ago(2),
    catchup=False,
    tags=['example'],
)

task1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)

task2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    dag=dag,
)

task1 >> task2

This DAG runs two simple tasks: printing the date and sleeping for 5 seconds.

Airflow automatically tracks the start time, end time, duration, and status of each task run.

The default_args dictionary sets retry behavior and owner info.

The DAG runs daily starting two days ago, without catching up missed runs.
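Airflow derives each task's duration from its recorded start and end timestamps. A minimal pure-Python sketch of that same computation (no Airflow import needed; the timestamps are made up):

```python
from datetime import datetime, timezone

def task_duration(start_date: datetime, end_date: datetime) -> float:
    """Duration in seconds, derived from start/end timestamps the way Airflow records them."""
    return (end_date - start_date).total_seconds()

# Hypothetical timestamps for the 'sleep' task, which runs for ~5 seconds
start = datetime(2024, 6, 1, 12, 0, 2, tzinfo=timezone.utc)
end = datetime(2024, 6, 1, 12, 0, 7, tzinfo=timezone.utc)
print(task_duration(start, end))  # 5.0
```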

Commands
List all DAGs currently available in Airflow to confirm your DAG is loaded.
Terminal
airflow dags list
Expected Output
dag_id                       | description                                      | schedule_interval | file
performance_tracking_example | A simple DAG to demonstrate performance tracking | @daily            | /usr/local/airflow/dags/my_dag.py
Manually trigger the DAG to run now so we can track its performance.
Terminal
airflow dags trigger performance_tracking_example
Expected Output
Created <DagRun performance_tracking_example @ 2024-06-01 12:00:00+00:00: manual__2024-06-01T12:00:00+00:00, externally triggered: True>
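The same manual trigger can be issued through Airflow's stable REST API. A sketch that only builds the request, assuming a local webserver at localhost:8080 (not actually contacted here):

```python
import json

dag_id = "performance_tracking_example"
base_url = "http://localhost:8080/api/v1"  # assumed local webserver address

# POST /dags/{dag_id}/dagRuns creates a DagRun, like `airflow dags trigger`
url = f"{base_url}/dags/{dag_id}/dagRuns"
payload = {"logical_date": "2024-06-01T12:00:00+00:00"}

# To actually send it (requires auth configured on the webserver):
# import requests
# requests.post(url, json=payload, auth=("admin", "admin"))
print(url)
print(json.dumps(payload))
```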
List all tasks in the DAG to know what tasks we will track.
Terminal
airflow tasks list performance_tracking_example
Expected Output
print_date
sleep
Check the state of the 'print_date' task for the run on 2024-06-01 to see if it succeeded.
Terminal
airflow tasks state performance_tracking_example print_date 2024-06-01
Expected Output
success
Run the 'print_date' task in isolation with 'airflow tasks test' to see its log output and timing details; logs from scheduled runs also live under $AIRFLOW_HOME/logs and in the web UI.
Terminal
airflow tasks test performance_tracking_example print_date 2024-06-01
Expected Output
[2024-06-01 12:00:01,000] {bash_operator.py:123} INFO - Running command: date
Thu Jun 1 12:00:01 UTC 2024
[2024-06-01 12:00:02,000] {bash_operator.py:130} INFO - Command exited with return code 0
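The bracketed timestamps in the log can be parsed to measure how long the command ran. A small sketch using two INFO lines in Airflow's log format (the lines themselves are sample data):

```python
import re
from datetime import datetime

LOG_LINES = [
    "[2024-06-01 12:00:01,000] {bash_operator.py:123} INFO - Running command: date",
    "[2024-06-01 12:00:02,000] {bash_operator.py:130} INFO - Command exited with return code 0",
]

# Matches the leading [YYYY-MM-DD HH:MM:SS,mmm] timestamp
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]")

def log_timestamp(line: str) -> datetime:
    """Extract the timestamp from the start of an Airflow log line."""
    m = TS_RE.match(line)
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts.replace(microsecond=int(m.group(2)) * 1000)

elapsed = (log_timestamp(LOG_LINES[-1]) - log_timestamp(LOG_LINES[0])).total_seconds()
print(elapsed)  # 1.0
```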
Key Concept

If you remember nothing else from DAG performance tracking, remember: Airflow automatically records task start, end, duration, and status, which you can view via CLI or UI to monitor your workflows.
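Those recorded fields live in the task_instance table of Airflow's metadata database, so they can also be queried with plain SQL. A sketch against an in-memory SQLite stand-in that uses the relevant column names (dag_id, task_id, duration, state) with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for Airflow's task_instance table (only the columns we need)
conn.execute("CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, duration REAL, state TEXT)")
rows = [
    ("performance_tracking_example", "print_date", 0.8, "success"),
    ("performance_tracking_example", "sleep", 5.2, "success"),
]
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)

# Average successful-run duration per task, slowest first
query = """
SELECT task_id, AVG(duration) AS avg_secs, COUNT(*) AS runs
FROM task_instance
WHERE dag_id = 'performance_tracking_example' AND state = 'success'
GROUP BY task_id
ORDER BY avg_secs DESC
"""
results = conn.execute(query).fetchall()
for task_id, avg_secs, runs in results:
    print(f"{task_id}: {avg_secs:.1f}s over {runs} run(s)")
```

The same query shape works against the real metadata DB, though the full table has many more columns.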

Common Mistakes
Not triggering the DAG manually after adding it, so no runs exist to track.
Without a DAG run, there is no performance data to view or analyze.
Use 'airflow dags trigger <dag_id>' to start a run and generate performance data.
Checking task state or logs for the wrong execution date.
Airflow tracks tasks by execution date; wrong date means no data or errors.
Always specify the correct execution date when querying task states or logs.
Ignoring retries and failure logs when analyzing performance.
Retries and failures affect total runtime and reliability but may be missed if not checked.
Review retry counts and failure logs to get a full picture of task performance.
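Retries also stretch wall-clock time: with the retries=1 and retry_delay of 5 minutes from default_args above, one failure roughly doubles the task work and adds the delay on top. A back-of-the-envelope sketch:

```python
from datetime import timedelta

def worst_case_runtime(task_seconds: float, retries: int, retry_delay: timedelta) -> float:
    """Rough upper bound: every attempt runs fully and every retry waits out the full delay."""
    attempts = retries + 1
    return attempts * task_seconds + retries * retry_delay.total_seconds()

# The 'sleep' task (~5s) with retries=1 and retry_delay=5 minutes from default_args
print(worst_case_runtime(5, 1, timedelta(minutes=5)))  # 310.0
```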
Summary
Create a DAG with simple tasks to generate performance data.
Trigger the DAG manually to start a run and collect metrics.
Use Airflow CLI commands to list DAGs, tasks, check task states, and view logs for performance insights.