
How to Configure Apache Airflow: Setup and Configuration Guide

To configure Airflow, edit the airflow.cfg file or set environment variables to customize settings like the executor, database, and scheduler. You can also manage connections and variables for external services via the Airflow UI or CLI.
📝

Syntax

The main configuration file for Airflow is airflow.cfg. It contains sections and key-value pairs to set options.

  • [core]: Basic settings like executor, sql_alchemy_conn (database URL), and dags_folder.
  • [scheduler]: Controls scheduler behavior like job_heartbeat_sec.
  • [webserver]: Settings for the Airflow web UI, such as web_server_port.

You can override these settings by exporting environment variables with the prefix AIRFLOW__SECTION__KEY, for example AIRFLOW__CORE__EXECUTOR=LocalExecutor. Note that in Airflow 2.3 and later, database settings moved from [core] to a [database] section (so the override becomes AIRFLOW__DATABASE__SQL_ALCHEMY_CONN); check the configuration reference for your version.
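To illustrate the precedence rule (an AIRFLOW__SECTION__KEY environment variable wins over the file), here is a minimal stdlib sketch of the lookup logic. This is an illustration of the naming convention only, not Airflow's actual implementation:

```python
import configparser
import os

def get_airflow_option(cfg: configparser.ConfigParser, section: str, key: str) -> str:
    """Return a setting, letting AIRFLOW__SECTION__KEY env vars win over airflow.cfg."""
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in os.environ:
        return os.environ[env_name]
    return cfg.get(section, key)

# File says SequentialExecutor, but the environment variable overrides it.
cfg = configparser.ConfigParser()
cfg.read_string("[core]\nexecutor = SequentialExecutor\n")

os.environ["AIRFLOW__CORE__EXECUTOR"] = "LocalExecutor"
print(get_airflow_option(cfg, "core", "executor"))  # LocalExecutor
```

This is why environment variables are convenient in containerized deployments: the same airflow.cfg can ship everywhere, with per-environment overrides applied at launch.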

```ini
[core]
executor = SequentialExecutor
sql_alchemy_conn = sqlite:///airflow.db
dags_folder = /path/to/dags

[scheduler]
job_heartbeat_sec = 5

[webserver]
web_server_port = 8080
```
💻

Example

This example shows how to configure Airflow to use the LocalExecutor and connect to a PostgreSQL database by editing airflow.cfg.

```ini
[core]
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://user:password@localhost:5432/airflow_db
dags_folder = /usr/local/airflow/dags

[scheduler]
job_heartbeat_sec = 10

[webserver]
web_server_port = 8080
```
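Because a typo in airflow.cfg only surfaces after restarting the services, it can help to sanity-check the file first. A small stdlib sketch (it parses the example config inline; for a real deployment you would point `cfg.read()` at your own airflow.cfg path):

```python
import configparser

AIRFLOW_CFG = """\
[core]
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://user:password@localhost:5432/airflow_db
dags_folder = /usr/local/airflow/dags

[scheduler]
job_heartbeat_sec = 10

[webserver]
web_server_port = 8080
"""

cfg = configparser.ConfigParser()
cfg.read_string(AIRFLOW_CFG)  # for a real file: cfg.read("/path/to/airflow.cfg")

# Confirm the settings parsed as intended before restarting services.
print(cfg.get("core", "executor"))                 # LocalExecutor
print(cfg.getint("webserver", "web_server_port"))  # 8080
```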
⚠️

Common Pitfalls

Common mistakes when configuring Airflow include:

  • Using the default SQLite database in production, which does not support parallelism well.
  • Not setting the executor properly, causing tasks to run sequentially instead of in parallel.
  • Forgetting to restart Airflow services after changing airflow.cfg.
  • Incorrect database connection strings causing connection failures.

Always test your configuration changes in a development environment before production.

Wrong (default executor):

```ini
[core]
executor = SequentialExecutor
```

Right (for parallel tasks):

```ini
[core]
executor = LocalExecutor
```
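The first two pitfalls (SQLite in production, SequentialExecutor left as the default) can be caught programmatically before deploying. A hypothetical helper, not part of Airflow itself:

```python
import configparser

def config_warnings(cfg: configparser.ConfigParser) -> list:
    """Flag the common production pitfalls described above."""
    warnings = []
    conn = cfg.get("core", "sql_alchemy_conn", fallback="")
    if conn.startswith("sqlite"):
        warnings.append("SQLite database: does not support parallelism well in production")
    if cfg.get("core", "executor", fallback="") == "SequentialExecutor":
        warnings.append("SequentialExecutor: tasks will run one at a time, not in parallel")
    return warnings

# A default-style config triggers both warnings.
cfg = configparser.ConfigParser()
cfg.read_string(
    "[core]\n"
    "executor = SequentialExecutor\n"
    "sql_alchemy_conn = sqlite:///airflow.db\n"
)
for warning in config_warnings(cfg):
    print("WARNING:", warning)
```

A check like this could run in CI so a misconfigured airflow.cfg never reaches production.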
📊

Quick Reference

| Setting | Description | Example Value |
| --- | --- | --- |
| executor | Defines how tasks run (sequential, parallel) | LocalExecutor |
| sql_alchemy_conn | Database connection string | postgresql+psycopg2://user:pass@host/db |
| dags_folder | Folder path where DAG files are stored | /usr/local/airflow/dags |
| web_server_port | Port for Airflow web UI | 8080 |
| job_heartbeat_sec | Scheduler heartbeat interval in seconds | 10 |
✅

Key Takeaways

  • Edit airflow.cfg or use environment variables to configure Airflow settings.
  • Use a production-ready database like PostgreSQL instead of SQLite.
  • Set the executor to LocalExecutor or CeleryExecutor for parallel task execution.
  • Restart Airflow services after configuration changes to apply them.
  • Use the Airflow UI or CLI to manage connections and variables for external services.