Apache Airflow · DevOps · ~10 mins

High availability configuration in Apache Airflow - Commands & Configuration

Introduction
High availability (HA) configuration in Apache Airflow keeps the workflow system running even when an individual component fails. Redundant schedulers, workers, and webservers stand ready to take over automatically, avoiding downtime. Use this pattern:

  • When you want the scheduler to keep working even if one scheduler instance crashes
  • When the webserver must stay reachable during maintenance windows
  • When you want to avoid losing running tasks if a worker node goes down
  • When you run Airflow in production with many users and critical workflows
  • When you want to scale Airflow components across multiple machines for reliability
Config File - airflow.cfg
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow
load_examples = False

[webserver]
web_server_port = 8080
web_server_worker_timeout = 120

[scheduler]
parsing_processes = 2
scheduler_heartbeat_sec = 5
num_runs = -1
# Required (and True by default) for running multiple schedulers at once
use_row_level_locking = True

[celery]
broker_url = redis://redis:6379/0
result_backend = db+postgresql://airflow:airflow@postgres/airflow
worker_concurrency = 4

[logging]
remote_logging = False

[metrics]
statsd_on = True

[database]
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 10

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend

This airflow.cfg file configures Airflow for high availability:

  • executor = CeleryExecutor: distributes task execution across multiple worker nodes.
  • sql_alchemy_conn: points every component at a single shared PostgreSQL database.
  • broker_url and result_backend: use Redis for task messaging and PostgreSQL for task results.
  • scheduler settings: keep the scheduler running continuously; in Airflow 2.x, several scheduler instances can safely run against the shared database at the same time (there is no separate HA switch to enable).

This setup allows multiple schedulers and workers to run simultaneously, so if one fails, others continue working.
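Redundant instances also need automatic restarts when a process dies. One common approach is a systemd unit per component; the sketch below is illustrative, and the unit path, user, and ExecStart location are assumptions to adjust for your install:

```ini
# Sketch: /etc/systemd/system/airflow-scheduler.service (path assumed)
# Restart=always brings a crashed scheduler back without manual action.
[Unit]
Description=Airflow scheduler (one of several HA instances)
After=network.target postgresql.service redis.service

[Service]
User=airflow
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now airflow-scheduler` on each scheduler host; a matching unit can supervise each worker and the webserver.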

Commands
Initializes the Airflow metadata database. This sets up the tables Airflow needs to track workflows and tasks.
Terminal
airflow db init
Expected Output
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade head
INFO [alembic.runtime.migration] Upgrade done.
Creates an admin user to log into the Airflow web interface.
Terminal
airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password admin123
Expected Output
Created user admin
--username - Set the login username
--role - Assign user role for permissions
Starts the Airflow scheduler in the background. Multiple schedulers can run for high availability.
Terminal
airflow scheduler &
Expected Output
Runs in the background; the shell prompt returns immediately while scheduler logs stream to the terminal and log files.
Starts a Celery worker in the background to execute tasks. Multiple workers improve reliability and scalability.
Terminal
airflow celery worker &
Expected Output
Runs in the background; the shell prompt returns immediately while worker logs stream to the terminal and log files.
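On a real HA deployment each worker runs on its own node. A minimal sketch of a launch helper, assuming Airflow 2.x's `--celery-hostname` flag (the `worker-N` hostnames and the loop are illustrative, not part of the setup above); it prints the start command for each node so every worker registers under a unique name with the broker:

```shell
# Hypothetical helper: build the start command for one worker node.
# A unique --celery-hostname keeps workers distinguishable in the broker.
start_worker_cmd() {
  host="$1"
  echo "airflow celery worker --celery-hostname ${host} &"
}

# Print the command to run on each of three worker nodes.
for i in 1 2 3; do
  start_worker_cmd "worker-${i}"
done
```

In practice you would run the printed command on each node (or bake it into the node's service unit) rather than backgrounding all workers on one machine.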
Starts the Airflow webserver on port 8080 in the background, providing the user interface.
Terminal
airflow webserver -p 8080 &
Expected Output
[2024-06-01 12:00:00,000] {webserver} INFO - Starting web server on port 8080
-p - Specify the port number
Queries the webserver's /health endpoint, which reports the status of the metadata database and the scheduler, to confirm the deployment is healthy.
Terminal
curl -s http://localhost:8080/health | jq .
Expected Output
{
  "metadatabase": { "status": "healthy" },
  "scheduler": { "status": "healthy" }
}
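If `jq` is not installed on every node, a monitoring script can still pull a component's status out of the /health response with standard tools. A sketch, assuming the JSON shape shown above (`health_status` is a hypothetical helper, not an Airflow command):

```shell
# Hypothetical helper: read /health JSON on stdin, print the status of
# the named component ("metadatabase" or "scheduler").
health_status() {
  component="$1"
  # Strip whitespace, grab the component's status field, keep its value.
  tr -d ' \n' \
    | grep -o "\"${component}\":{\"status\":\"[a-z]*\"" \
    | sed 's/.*"status":"\([a-z]*\)".*/\1/'
}

# Usage sketch: curl -s http://localhost:8080/health | health_status scheduler
```

This relies on `"status"` being the first key inside each component object, which matches the output shown above; `jq '.scheduler.status'` is the more robust option where available.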
Key Concept

If you remember nothing else from this pattern, remember: running multiple schedulers and workers with a shared database and message broker keeps Airflow running smoothly even if one component fails.

Common Mistakes
Using the SequentialExecutor instead of CeleryExecutor for high availability
SequentialExecutor runs tasks one at a time on a single machine, so it cannot handle multiple workers or schedulers.
Set executor = CeleryExecutor in airflow.cfg to enable distributed task execution.
Not configuring a shared database and message broker for all Airflow components
Without a shared backend, schedulers and workers cannot coordinate, causing failures and inconsistent state.
Use a central PostgreSQL database and Redis broker configured in airflow.cfg for all nodes.
Starting only one scheduler or worker process
Single processes create a single point of failure, defeating high availability.
Run multiple scheduler and worker processes on different machines or containers.
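When restarting or replacing instances, deployment scripts usually wait for a component to report healthy before moving on. A minimal retry wrapper, sketched under the assumption that the check is a command such as `curl -fs http://localhost:8080/health` (`wait_for_healthy` is a hypothetical helper):

```shell
# Hypothetical helper: retry a check command until it succeeds or the
# attempt budget is exhausted. Returns 0 on success, 1 on timeout.
wait_for_healthy() {
  attempts="$1"; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage sketch: wait_for_healthy 30 curl -fs http://localhost:8080/health
```

Rolling one instance at a time and gating each step on this kind of check keeps at least one scheduler, worker, and webserver alive throughout a deployment.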
Summary
  • Initialize the Airflow database to prepare metadata storage.
  • Create an admin user to access the Airflow web interface.
  • Start multiple schedulers and Celery workers to enable high availability.
  • Run the Airflow webserver to provide the user interface.
  • Verify the health of Airflow components to confirm high availability is active.