How to Configure Apache Airflow: Setup and Configuration Guide
To configure Airflow, edit the airflow.cfg file or set environment variables to customize settings such as the executor, metadata database, and scheduler. You can also configure connections and variables via the Airflow UI or CLI to manage access to external services.

Syntax
The main configuration file for Airflow is airflow.cfg. It contains sections and key-value pairs to set options.
- [core]: Basic settings like executor, sql_alchemy_conn (the database URL), and dags_folder.
- [scheduler]: Controls scheduler behavior, such as job_heartbeat_sec.
- [webserver]: Settings for the Airflow web UI, such as web_server_port.
You can override these settings by exporting environment variables with the prefix AIRFLOW__SECTION__KEY, for example AIRFLOW__CORE__EXECUTOR=LocalExecutor.
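The section/key-to-variable mapping can be sketched with a small helper (a hypothetical illustration, not part of Airflow's API):

```python
import os

def airflow_env_var(section: str, key: str) -> str:
    """Build the env var name that overrides [section] key in airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

# Setting this before the scheduler/webserver starts overrides airflow.cfg.
os.environ[airflow_env_var("core", "executor")] = "LocalExecutor"
print(os.environ["AIRFLOW__CORE__EXECUTOR"])  # → LocalExecutor
```

Note the double underscores on both sides of the section name; a single underscore produces a variable Airflow silently ignores.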
```ini
[core]
executor = SequentialExecutor
sql_alchemy_conn = sqlite:///airflow.db
dags_folder = /path/to/dags

[scheduler]
job_heartbeat_sec = 5

[webserver]
web_server_port = 8080
```
Example
This example shows how to configure Airflow to use the LocalExecutor and connect to a PostgreSQL database by editing airflow.cfg.
```ini
[core]
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://user:password@localhost:5432/airflow_db
dags_folder = /usr/local/airflow/dags

[scheduler]
job_heartbeat_sec = 10

[webserver]
web_server_port = 8080
```
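Before pasting a connection string into airflow.cfg, it can help to sanity-check its parts. A minimal sketch using only the standard library (the credentials below are placeholders):

```python
from urllib.parse import urlsplit

# SQLAlchemy URLs follow dialect+driver://user:password@host:port/database.
conn = "postgresql+psycopg2://user:password@localhost:5432/airflow_db"
parts = urlsplit(conn)

print(parts.scheme)            # postgresql+psycopg2 (dialect+driver)
print(parts.hostname)          # localhost
print(parts.port)              # 5432
print(parts.path.lstrip("/"))  # airflow_db
```

A typo in the scheme or a missing port shows up immediately here, rather than as an opaque connection failure at scheduler startup.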
Common Pitfalls
Common mistakes when configuring Airflow include:
- Using the default SQLite database in production; SQLite only works with the SequentialExecutor, so tasks cannot run in parallel.
- Not setting the executor properly, causing tasks to run sequentially instead of in parallel.
- Forgetting to restart Airflow services after changing airflow.cfg.
- Incorrect database connection strings, causing connection failures.
Always test your configuration changes in a development environment before production.
Wrong (default executor):

```ini
[core]
executor = SequentialExecutor
```

Right (for parallel tasks):

```ini
[core]
executor = LocalExecutor
```
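Because airflow.cfg is standard INI, a quick pre-deployment check with Python's configparser can catch the serial default before services restart. A sketch, with an inline config standing in for your real file:

```python
import configparser

cfg_text = """
[core]
executor = LocalExecutor
"""

# interpolation=None avoids tripping on literal % signs that
# appear in some Airflow options (e.g. logging format strings).
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(cfg_text)

executor = cfg.get("core", "executor")
if executor == "SequentialExecutor":
    raise SystemExit("executor is still the serial default")
print(executor)  # LocalExecutor
```

A check like this is cheap to run in CI against the config you are about to ship.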
Quick Reference
| Setting | Description | Example Value |
|---|---|---|
| executor | Defines how tasks run (sequential, parallel) | LocalExecutor |
| sql_alchemy_conn | Database connection string | postgresql+psycopg2://user:pass@host/db |
| dags_folder | Folder path where DAG files are stored | /usr/local/airflow/dags |
| web_server_port | Port for Airflow web UI | 8080 |
| job_heartbeat_sec | Scheduler heartbeat interval in seconds | 10 |
Key Takeaways
- Edit airflow.cfg or use environment variables to configure Airflow settings.
- Use a production-ready database like PostgreSQL instead of SQLite.
- Set the executor to LocalExecutor or CeleryExecutor for parallel task execution.
- Restart Airflow services after configuration changes to apply them.
- Use the Airflow UI or CLI to manage connections and variables for external services.