
How to Schedule a DAG in Apache Airflow: Syntax and Examples

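In Apache Airflow, you schedule a DAG by setting the schedule_interval parameter in the DAG definition to a cron expression, a preset string like @daily, or a datetime.timedelta. The scheduler then creates DAG runs automatically on that schedule. In Airflow 2.4 and later the same values go in the schedule parameter, and schedule_interval is deprecated (it was removed in Airflow 3).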

Syntax

The schedule_interval parameter in the DAG constructor defines when the DAG runs. It accepts:

  • Cron expressions such as "0 12 * * *" to run daily at 12:00 UTC.
  • Preset strings such as @hourly, @daily, @weekly.
  • datetime.timedelta objects such as timedelta(hours=6) for fixed intervals.
  • None to disable automatic scheduling.

Example: schedule_interval='0 6 * * *' runs the DAG every day at 06:00 UTC (Airflow's default timezone).

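```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # Replaces the deprecated DummyOperator (Airflow 2.3+)

dag = DAG(
    dag_id='example_dag',
    start_date=datetime(2024, 1, 1),
    schedule_interval='0 6 * * *',  # Minute 0, hour 6: daily at 06:00 UTC
    catchup=False,  # Don't backfill every missed day since start_date
)

task = EmptyOperator(task_id='dummy_task', dag=dag)
```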
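
A cron expression has five space-separated fields: minute, hour, day of month, month and day of week. The helper below is a hypothetical illustration (describe_cron is not part of Airflow) that labels each field of an expression such as '0 6 * * *', which is a quick way to sanity-check a schedule before putting it in a DAG. It only labels fields; it does not validate ranges or compute run times.

```python
# Hypothetical helper, not an Airflow API: label the five cron fields.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr: str) -> dict[str, str]:
    """Map each field of a standard five-field cron expression to its name."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # '0 6 * * *': minute 0 of hour 6, on every day of every month
    print(describe_cron("0 6 * * *"))
```

Reading the fields this way makes it obvious that "6 0 * * *" (minute 6, hour 0) and "0 6 * * *" (minute 0, hour 6) are different schedules.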

Example

This example shows a DAG scheduled to run every day at midnight UTC using the preset @daily. It contains a single placeholder task.

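```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # Replaces the deprecated DummyOperator (Airflow 2.3+)

dag = DAG(
    dag_id='daily_dag',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',  # Runs once every day at midnight UTC
    catchup=False,  # Start from the latest interval instead of backfilling from 2024-01-01
)

task = EmptyOperator(task_id='start', dag=dag)
```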
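Output
This code produces no output by itself. Once the file is in your DAGs folder, the scheduler creates one run per day. Each run is triggered at midnight UTC once the day it covers has ended; for example, the run covering January 5 starts at 00:00 UTC on January 6.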
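
Airflow triggers each run at the end of the data interval it covers, not at its start. The plain-Python model below illustrates that rule; it is not Airflow's scheduler code. scheduled_runs is a hypothetical helper that assumes a fixed interval, a start_date aligned with the schedule (midnight for a daily schedule), and catchup enabled, so every interval that has fully elapsed by "now" gets its own run.

```python
from datetime import datetime, timedelta, timezone


def scheduled_runs(
    start_date: datetime, interval: timedelta, now: datetime
) -> list[tuple[datetime, datetime]]:
    """Simplified model (hypothetical helper): return (interval_start, trigger_time)
    pairs for every data interval that has fully ended by `now`.

    Assumes start_date is aligned with the schedule and catchup is enabled.
    """
    runs = []
    interval_start = start_date
    while interval_start + interval <= now:
        # A run is triggered only once its data interval has ended.
        runs.append((interval_start, interval_start + interval))
        interval_start += interval
    return runs


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 4, tzinfo=timezone.utc)
    for data_start, trigger in scheduled_runs(start, timedelta(days=1), now):
        print(f"interval starting {data_start:%Y-%m-%d} triggers at {trigger:%Y-%m-%d %H:%M} UTC")
```

With these values the sketch prints three runs, the first being "interval starting 2024-01-01 triggers at 2024-01-02 00:00 UTC". With catchup=False, Airflow would create only the most recent of these runs rather than all of them.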

Common Pitfalls

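  • Future start_date: no runs are created until the first schedule interval after start_date has ended (for @daily, midnight UTC on the day after start_date).
  • Past start_date with catchup: in Airflow 2, catchup is on by default, so a start_date far in the past makes the scheduler create a run for every missed interval. Set catchup=False to start from the most recent interval instead.
  • Using schedule_interval=None: disables scheduling; the DAG runs only when triggered manually, for example from the UI or CLI.
  • Misreading cron syntax: the five fields are minute, hour, day of month, month and day of week, so "6 0 * * *" means 00:06, not 06:00.
  • Timezone issues: schedules are evaluated in UTC by default, so "0 6 * * *" fires at 06:00 UTC rather than 06:00 local time. A timezone-aware start_date, such as pendulum.datetime(2024, 1, 1, tz="Europe/Amsterdam"), makes the DAG follow that timezone instead.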
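```python
from datetime import datetime

from airflow import DAG

# Wrong: a start_date in the future delays runs
wrong_dag = DAG(
    dag_id='wrong_start_date',
    start_date=datetime(2099, 1, 1),  # Far-future date: first run not until 2099-01-02
    schedule_interval='@daily',
)

# Correct: start_date in the past; catchup=False avoids backfilling every missed day
correct_dag = DAG(
    dag_id='correct_start_date',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False,
)

# Manual only: None disables scheduling (fine when that is what you want)
manual_dag = DAG(
    dag_id='manual_trigger_only',
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # No automatic runs; trigger from the UI or CLI
)

# Wrong: '6 0 * * *' means minute 6 of hour 0 (00:06 UTC), not 06:00
cron_typo_dag = DAG(
    dag_id='cron_typo',
    start_date=datetime(2024, 1, 1),
    schedule_interval='6 0 * * *',
    catchup=False,
)

# Correct: use a valid cron expression (minute 0, hour 6) or a preset
cron_dag = DAG(
    dag_id='correct_cron',
    start_date=datetime(2024, 1, 1),
    schedule_interval='0 6 * * *',  # Daily at 06:00 UTC
    catchup=False,
)
```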
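
To see why the UTC default matters, the standard-library sketch below converts the 06:00 UTC firing time of a '0 6 * * *' schedule into New York local time on a winter date and a summer date. America/New_York is only an example zone, and the code uses Python's zoneinfo module (Python 3.9+, which needs the system's IANA time zone data or the tzdata package), not Airflow itself.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Standard library since Python 3.9

# Example zone only; substitute your own.
local_zone = ZoneInfo("America/New_York")

# Two firing times of a '0 6 * * *' schedule evaluated in UTC.
firing_times = [
    datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc),  # Winter (standard time)
    datetime(2024, 7, 15, 6, 0, tzinfo=timezone.utc),  # Summer (daylight saving time)
]

if __name__ == "__main__":
    for fired_at in firing_times:
        local = fired_at.astimezone(local_zone)
        print(f"{fired_at:%Y-%m-%d %H:%M} UTC = {local:%H:%M %Z} in New York")
```

Because of daylight saving time, the same UTC schedule lands at 01:00 local time in January and 02:00 local time in July.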

Quick Reference

Common schedule_interval presets and their meanings:

| Preset | Meaning |
|---|---|
| @once | Run once immediately after start_date |
| @hourly | Run every hour |
| @daily | Run once a day at midnight UTC |
| @weekly | Run once a week on Sunday at midnight UTC |
| @monthly | Run once a month on the first day at midnight UTC |
| cron expression | Custom schedule using standard cron syntax |
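
Each preset except @once is shorthand for a fixed cron expression. The lookup below spells out those standard equivalents; PRESET_TO_CRON and to_cron are hypothetical names for illustration, not Airflow APIs.

```python
# Hypothetical lookup, not an Airflow API: standard cron equivalents of the presets.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",   # Minute 0 of every hour
    "@daily": "0 0 * * *",    # 00:00 every day
    "@weekly": "0 0 * * 0",   # 00:00 every Sunday (day of week 0)
    "@monthly": "0 0 1 * *",  # 00:00 on the 1st of every month
}


def to_cron(schedule: str) -> str:
    """Return the cron form of a preset, or the input unchanged if it is
    already a cron expression. '@once' has no cron equivalent."""
    if schedule == "@once":
        raise ValueError("@once schedules a single run and has no cron form")
    return PRESET_TO_CRON.get(schedule, schedule)


if __name__ == "__main__":
    for preset in PRESET_TO_CRON:
        print(f"{preset:<9} -> {to_cron(preset)}")
```

This makes it easy to see, for example, that @weekly fires on Sunday because its day-of-week field is 0.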

Key Takeaways

  • Set the schedule_interval parameter in your DAG to control when it runs automatically.
  • Use cron expressions or preset strings like @daily for easy scheduling.
  • Use a start_date in the past so runs begin promptly, and set catchup=False unless you want every missed interval backfilled.
  • Setting schedule_interval=None disables automatic scheduling.
  • Remember that Airflow schedules in UTC by default.