Apache Airflow · DevOps · ~10 mins

Multi-environment deployment (dev, staging, prod) in Apache Airflow - Commands & Configuration

Introduction
When you build workflows with Airflow, you often want to run the same workflows in different environments like development, staging, and production. This helps you test changes safely before using them for real work.
When you want to test new workflow changes without affecting real data processing.
When you need to have a safe place to try fixes before applying them to production.
When your team wants to review workflows in a staging environment before going live.
When you want to run different versions of the same workflow for different purposes.
When you want to separate logs and data for development and production to avoid confusion.
Config File - airflow.cfg
[core]
# The home folder for airflow, default is ~/airflow
# (in Airflow 2+, this is usually set with the AIRFLOW_HOME environment variable instead)
airflow_home = /usr/local/airflow

[logging]
# Base log folder
base_log_folder = /usr/local/airflow/logs

[secrets]
# Connections and variables defined as environment variables
# (AIRFLOW_CONN_*, AIRFLOW_VAR_*) are picked up automatically;
# 'backend' is only needed to plug in an external secrets manager class
backend =

[webserver]
# Webserver port
web_server_port = 8080

[environment]
# Custom section (not a built-in Airflow setting) used to tag this deployment
environment = dev

This airflow.cfg file sets the base Airflow home directory and log folder. The environment key is a custom setting to specify which environment Airflow is running in (dev, staging, or prod). You can create separate airflow.cfg files for each environment with different values.
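Separate airflow.cfg files are one option; Airflow can also override any config key with an environment variable named AIRFLOW__{SECTION}__{KEY}, which keeps one shared file and varies only the per-environment values. A minimal sketch, where the log path and port are illustrative dev choices, not defaults:

```shell
# Sketch: override airflow.cfg keys per environment via env vars.
# Airflow maps AIRFLOW__<SECTION>__<KEY> onto [section] key in airflow.cfg.
export AIRFLOW__LOGGING__BASE_LOG_FOLDER=/usr/local/airflow_dev/logs  # example dev path
export AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081                       # example dev UI port
```

Env-var overrides take precedence over the file, so the same airflow.cfg can ship to every environment unchanged.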

Using environment variables for secrets keeps sensitive data safe and environment-specific.
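Concretely, Airflow resolves connections from variables named AIRFLOW_CONN_<CONN_ID> and Airflow Variables from AIRFLOW_VAR_<NAME>. A sketch of a dev setup, where the connection ID, credentials, and hostname are made-up examples:

```shell
# Sketch: environment-specific secrets as environment variables.
# Airflow reads AIRFLOW_CONN_MY_DB as the URI for connection id "my_db";
# the URI below is a hypothetical dev database, not a real endpoint.
export AIRFLOW_CONN_MY_DB="postgresql://airflow:devpass@dev-db:5432/analytics"
export AIRFLOW_VAR_ENVIRONMENT="dev"
```

Staging and prod export the same variable names with their own values, so DAG code stays identical across environments.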

Commands
Set the Airflow home directory for the development environment to keep files separate from other environments.
Terminal
export AIRFLOW_HOME=/usr/local/airflow_dev
Expected Output
No output (command runs silently)
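The export above can be wrapped in a small helper so every environment gets its home directory from one place. The function name and paths below are this tutorial's own convention, not part of Airflow:

```shell
# Hypothetical helper: map an environment name to an isolated AIRFLOW_HOME.
airflow_home_for() {
  case "$1" in
    dev)     echo /usr/local/airflow_dev ;;
    staging) echo /usr/local/airflow_staging ;;
    prod)    echo /usr/local/airflow_prod ;;
    *)       echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Select the development environment for this shell session.
export AIRFLOW_HOME="$(airflow_home_for dev)"
```

Keeping the mapping in one function avoids the classic typo of pointing two environments at the same directory.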
Initialize the Airflow database for the development environment. This sets up the metadata database Airflow needs to track workflows. (On Airflow 2.7 and later, 'airflow db migrate' is the preferred replacement for 'db init'.)
Terminal
airflow db init
Expected Output
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade head
INFO [alembic.runtime.migration] Upgrade successful
Start the Airflow webserver on port 8081 for the development environment so you can access the UI separately from other environments.
Terminal
airflow webserver -p 8081
Expected Output
[2024-06-01 12:00:00,000] {webserver.py:123} INFO - Starting web server on port 8081
[2024-06-01 12:00:00,500] {webserver.py:456} INFO - Web server started
-p - Specify the port number for the webserver
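Because only one process can bind a port, it helps to fix a port convention before starting anything. The mapping below is an assumed example for this tutorial, not an Airflow default:

```shell
# Hypothetical convention: one webserver port per environment.
webserver_port_for() {
  case "$1" in
    dev)     echo 8081 ;;
    staging) echo 8082 ;;
    prod)    echo 8080 ;;
  esac
}

# Usage (not executed here): airflow webserver -p "$(webserver_port_for dev)"
```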
Start the Airflow scheduler to run workflows in the development environment.
Terminal
airflow scheduler
Expected Output
[2024-06-01 12:00:01,000] {scheduler.py:789} INFO - Starting scheduler
[2024-06-01 12:00:01,500] {scheduler.py:790} INFO - Scheduler started
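The webserver and scheduler for one environment must share the same AIRFLOW_HOME so they talk to the same metadata database. A sketch of a dev session, with the home path assumed from earlier and the Airflow commands left commented since they require a running install:

```shell
# Sketch: point both services at the same (dev) home so they share one
# metadata DB; the path is this tutorial's assumed dev location.
export AIRFLOW_HOME=/usr/local/airflow_dev
# airflow webserver -p 8081 &   # dev UI on its own port
# airflow scheduler             # dev scheduler, same AIRFLOW_HOME
```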
List all workflows (DAGs) available in the current environment to verify they are loaded correctly.
Terminal
airflow dags list
Expected Output
dag_id
================
example_dag
my_data_pipeline
Key Concept

If you remember nothing else from this pattern, remember: keep each environment's Airflow files, databases, and ports separate to avoid conflicts and safely test changes.

Common Mistakes
Using the same AIRFLOW_HOME directory for dev, staging, and prod environments.
This causes workflows and logs to mix, leading to confusion and possible data loss or errors.
Set a unique AIRFLOW_HOME for each environment to isolate files and data.
Running multiple Airflow webservers on the same port.
Only one process can listen on a port, so others will fail to start.
Assign different ports for each environment's webserver using the -p flag.
Not initializing the Airflow database separately for each environment.
Workflows may not run correctly because the metadata database is shared or missing.
Run 'airflow db init' in each environment's AIRFLOW_HOME before starting Airflow.
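Taken together, these fixes amount to one bootstrap pass per environment. The sketch below only prints the command for each environment rather than running it, since the directories and the Airflow install itself are assumptions here:

```shell
# Hypothetical sketch: emit the per-environment bootstrap command.
print_init_cmds() {
  for env in dev staging prod; do
    echo "AIRFLOW_HOME=/usr/local/airflow_$env airflow db init"
  done
}

print_init_cmds
```

Running each emitted line gives every environment its own metadata database under its own home directory.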
Summary
Set a unique AIRFLOW_HOME directory for each environment to keep files separate.
Initialize the Airflow database in each environment with 'airflow db init'.
Start the webserver and scheduler with different ports and settings per environment.
Use 'airflow dags list' to verify workflows are loaded in the correct environment.