What is schedule_interval in Airflow: Definition and Usage
schedule_interval in Airflow defines how often a DAG runs. You give it a time-based schedule, either a cron expression or a preset, and Airflow starts the DAG's tasks on that schedule.
How It Works
Think of schedule_interval as setting an alarm clock for your workflow. It tells Airflow when to start running your tasks repeatedly. For example, you can set it to run every hour, every day at midnight, or every Monday morning.
Airflow uses this interval to trigger DAG runs automatically, without manual intervention. You can use a simple preset like @daily, or a cron expression like 0 12 * * * (every day at 12:00) when you need an exact time.
This scheduling helps automate repetitive jobs, so you don’t have to start them yourself each time.
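Each preset is shorthand for a cron expression. The sketch below lists the cron equivalents of the common Airflow presets in plain Python. The helper name to_cron is made up for illustration and is not an Airflow function.

```python
# Cron equivalents of common Airflow schedule presets.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight every Sunday
    "@monthly": "0 0 1 * *",  # midnight on the 1st of each month
    "@yearly": "0 0 1 1 *",   # midnight on January 1
}


def to_cron(schedule):
    """Return the cron string for a preset, or the input unchanged.

    Hypothetical helper for illustration only.
    """
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@daily"))
print(to_cron("0 12 * * *"))
```

A schedule the presets do not cover, such as every Monday morning, needs a cron expression of its own. For example, 0 9 * * 1 means 9:00 every Monday.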
Example
This example shows a DAG scheduled to run every day at midnight using schedule_interval='@daily'.
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from datetime import datetime

    # Arguments applied to every task in the DAG
    default_args = {
        'start_date': datetime(2024, 1, 1),
    }

    # Run once a day at midnight; catchup=False skips runs for past intervals
    dag = DAG(
        'daily_example',
        default_args=default_args,
        schedule_interval='@daily',
        catchup=False,
    )

    # Single task that prints the current date
    task = BashOperator(
        task_id='print_date',
        bash_command='date',
        dag=dag,
    )
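One detail worth knowing when reading this example: Airflow starts a scheduled run only after the interval it covers has ended, so the run for January 1 starts shortly after midnight on January 2. The sketch below models that rule in plain Python for a fixed-length interval. The function name latest_completed_interval is made up for illustration and is not part of Airflow, and the model ignores time zones and scheduler delay.

```python
from datetime import datetime, timedelta


def latest_completed_interval(start_date, interval, now):
    """Return (start, end) of the newest interval that has fully ended.

    Hypothetical helper: returns None while the first interval is
    still in progress, which is why no run appears right away.
    """
    if now < start_date + interval:
        return None
    # timedelta // timedelta gives the number of whole intervals elapsed.
    completed = (now - start_date) // interval
    end = start_date + completed * interval
    return end - interval, end


start = datetime(2024, 1, 1)
daily = timedelta(days=1)

# Noon on the first day: the first interval has not ended yet.
print(latest_completed_interval(start, daily, datetime(2024, 1, 1, 12, 0)))

# Just after midnight on January 2: the January 1 interval is complete.
print(latest_completed_interval(start, daily, datetime(2024, 1, 2, 0, 5)))
```

With catchup=False, this newest completed interval is roughly what Airflow schedules when the DAG is first switched on, instead of one run for every interval since start_date.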
When to Use
Use schedule_interval when you want your workflows to run automatically on a regular schedule. This is useful for tasks like data backups, report generation, or syncing data between systems.
For example, if you need to process sales data every morning, set schedule_interval='0 6 * * *' to run at 6 AM daily. If you want to run a task every 15 minutes, use schedule_interval='*/15 * * * *'.
It helps save time and reduces errors by automating repetitive jobs.
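To check what a cron expression means before putting it in a DAG, you can list the next few times it matches. The sketch below is a simplified matcher written in plain Python, not Airflow's own scheduler code. It assumes five-field expressions that use only *, */n, and comma-separated numbers, and it requires all five fields to match. The names cron_matches and next_times are made up for illustration.

```python
from datetime import datetime, timedelta


def _field_matches(field, value, minimum):
    """Check one cron field; supports '*', '*/n', and 'a,b,c'."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Steps count from the field's smallest valid value.
        return (value - minimum) % int(field[2:]) == 0
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr, dt):
    """Return True if datetime dt matches the cron expression."""
    minute, hour, day, month, weekday = expr.split()
    # Cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (dt.weekday() + 1) % 7
    return (
        _field_matches(minute, dt.minute, 0)
        and _field_matches(hour, dt.hour, 0)
        and _field_matches(day, dt.day, 1)
        and _field_matches(month, dt.month, 1)
        and _field_matches(weekday, cron_weekday, 0)
    )


def next_times(expr, after, count):
    """Return up to `count` matching times strictly after `after`."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    found = []
    # Search at most one year of minutes so a bad expression cannot loop forever.
    for _ in range(366 * 24 * 60):
        if len(found) == count:
            break
        if cron_matches(expr, candidate):
            found.append(candidate)
        candidate += timedelta(minutes=1)
    return found


print(next_times("0 6 * * *", datetime(2024, 1, 1, 7, 0), 2))
print(next_times("*/15 * * * *", datetime(2024, 1, 1, 6, 0), 3))
```

Starting from 7:00 on January 1, 2024, the first expression matches 6:00 on January 2 and January 3. Starting from 6:00, the second matches 6:15, 6:30, and 6:45.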
Key Points
- Defines how often a DAG runs automatically.
- Supports cron expressions and presets like @hourly and @daily.
- Helps automate repetitive workflows without manual triggers.
- Can be set to None for manual or externally triggered DAGs.
Key Takeaways
- schedule_interval sets the automatic timing for DAG runs in Airflow.
- schedule_interval=None disables automatic scheduling.