
Installing Airflow locally - Step-by-Step CLI Walkthrough

Introduction
Airflow helps you schedule and run tasks automatically. Installing it locally lets you try it out and build workflows on your own computer without needing a server.
When you want to learn how Airflow works before using it on a real server
When you need to test your task workflows quickly on your laptop
When you want to develop and debug your data pipelines offline (once Airflow is installed)
When you want to experiment with scheduling tasks without affecting others
When you want to run simple automation jobs on your own machine
Commands
Create a clean virtual environment to keep Airflow and its packages separate from other Python projects.
Terminal
python3 -m venv airflow_env
Expected Output
No output (command runs silently)
Activate the virtual environment so that Python commands use the isolated packages inside it.
Terminal
source airflow_env/bin/activate
Expected Output
No output (command runs silently)
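Because activation prints nothing, it can help to confirm it took effect before installing anything. The sketch below relies on the VIRTUAL_ENV variable that the activate script exports; the helper name venv_status is made up for illustration.

```shell
# Hypothetical helper: reports whether a virtual environment is active.
# The venv activate script exports VIRTUAL_ENV; it is unset otherwise.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "no virtual environment active"
  fi
}

venv_status
```

If the helper reports that no environment is active, run the source command again in the same terminal before continuing.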
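Install Airflow 2.7.1 together with its constraints file for Python 3.8, which pins the exact dependency versions that release was tested with. If you run a different Python version, change 3.8 in the URL to match it.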
Terminal
pip install apache-airflow==2.7.1 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.8.txt
Expected Output
Collecting apache-airflow==2.7.1
  Downloading apache_airflow-2.7.1-py3-none-any.whl (1.2 MB)
...
Successfully installed apache-airflow-2.7.1 ...
==2.7.1 - Specifies the exact Airflow version to install
--constraint - Ensures compatible package versions for Airflow
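Since the constraints URL has to match both the Airflow version and your Python version, you may prefer to build it rather than type it by hand. The following is a minimal sketch, not the only way to do it; the variable names are assumptions, and a constraints file exists only for the Python versions that the chosen Airflow release supports.

```shell
# Assumed variable names; set AIRFLOW_VERSION to the release you want.
AIRFLOW_VERSION="2.7.1"

# Extract "major.minor" (for example 3.8) from the active interpreter.
PYTHON_VERSION="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"

# Same URL pattern as the command above, with both versions filled in.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Print the install command so you can review it before running it.
echo "pip install \"apache-airflow==${AIRFLOW_VERSION}\" --constraint \"${CONSTRAINT_URL}\""
```

The snippet only prints the pip command. Copy the printed line into the terminal (with the virtual environment active) to run the actual installation.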
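Initialize the Airflow metadata database, which stores task and workflow information. In Airflow 2.7, airflow db init still works but is deprecated in favor of airflow db migrate.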
Terminal
airflow db init
Expected Output
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
...
Initialization done
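With the default configuration, Airflow keeps its files under the directory named by AIRFLOW_HOME (falling back to ~/airflow) and uses a SQLite file there as the metadata database. The sketch below prints that expected location and, if the airflow command is available, runs airflow db check to confirm the database can be reached. The DB_FILE path is an assumption that holds only if you have not changed the default database settings.

```shell
# Airflow falls back to ~/airflow when AIRFLOW_HOME is not set.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"

# Default SQLite file; an assumption if you customized the configuration.
DB_FILE="${AIRFLOW_HOME}/airflow.db"
echo "Expected metadata database: ${DB_FILE}"

# Only call the CLI when it is installed in the current environment.
if command -v airflow >/dev/null 2>&1; then
  airflow db check
else
  echo "airflow command not found; activate the virtual environment first"
fi
```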
Create an admin user to log into the Airflow web interface.
Terminal
airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password admin
Expected Output
User "admin" created with role "Admin"
--role Admin - Gives full access to Airflow features
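Passing --password on the command line leaves the password in your shell history. As a hedged alternative, the sketch below wraps the same command without that flag, in which case Airflow prompts for the password interactively. The function name create_admin_user and its two optional arguments are made up for illustration.

```shell
# Hypothetical wrapper around the same command as above, minus --password.
# Without --password (or --use-random-password), Airflow asks for the
# password at a prompt instead of reading it from the command line.
create_admin_user() {
  airflow users create \
    --username "${1:-admin}" \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email "${2:-admin@example.com}"
}

# Usage (interactive): create_admin_user admin admin@example.com
```

For a throwaway local install the original one-line command is fine; the wrapper is mainly useful as a habit for machines that other people can access.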
Start the Airflow web interface on port 8080, then open http://localhost:8080 in your browser. The command keeps running in the foreground, so leave this terminal open.
Terminal
airflow webserver --port 8080
Expected Output
[2024-06-01 12:00:00,000] {webserver.py:123} INFO - Starting web server on port 8080
[2024-06-01 12:00:00,100] {webserver.py:456} INFO - Web server started
--port 8080 - Sets the webserver to listen on port 8080
In a second terminal, with the virtual environment activated again, start the scheduler that runs your tasks at the right times.
Terminal
airflow scheduler
Expected Output
[2024-06-01 12:00:05,000] {scheduler.py:789} INFO - Starting scheduler
[2024-06-01 12:00:05,100] {scheduler.py:790} INFO - Scheduler started
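Once both processes are running, a quick way to confirm they are healthy is the webserver's /health endpoint, which returns a small JSON document reporting the status of the metadata database and the scheduler. The sketch below assumes curl is installed and that the webserver listens on port 8080 as above; the helper name check_airflow_health is hypothetical.

```shell
# Hypothetical helper: queries the webserver's /health endpoint.
# Argument 1 is the base URL (defaults to the port used above).
check_airflow_health() {
  base_url="${1:-http://localhost:8080}"
  if curl --silent --fail --max-time 5 "${base_url}/health"; then
    echo   # newline after the JSON body
  else
    echo "webserver not reachable at ${base_url}/health"
  fi
}

check_airflow_health
```

Run it from a third terminal, or from any terminal after the webserver has finished starting.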
Key Concept

If you remember nothing else, remember: Airflow needs its metadata database initialized before the webserver or scheduler will work, and it needs a user account before you can log in to the web interface.

Common Mistakes
Skipping the virtual environment and installing Airflow globally
This can cause package conflicts with other Python projects or system tools.
Always create and activate a virtual environment before installing Airflow.
Not using the constraints file when installing Airflow
Airflow depends on specific package versions; ignoring constraints can cause errors or crashes.
Use the --constraint flag with the official constraints file matching your Airflow version and Python version.
Trying to log in to the web interface before creating a user
Airflow requires a user account to log in; without one, the login page rejects you and you cannot reach the UI.
Run the airflow users create command to add an admin user before you try to log in.
Summary
Create and activate a Python virtual environment to isolate Airflow installation.
Install Airflow with the correct version and constraints to avoid package conflicts.
Initialize the Airflow database and create an admin user for access.
Start the Airflow webserver and scheduler to run and monitor workflows locally.