What is Apache Airflow - CLI Guide

Introduction
Apache Airflow is an open-source platform for defining, scheduling, and monitoring workflows. You describe your tasks and the dependencies between them, and Airflow runs them automatically in the right order, like following a recipe for a cake but for computer jobs.
When to Use
When you want to run data processing steps one after another without triggering each one manually.
When you need to schedule tasks to run at certain times, like every day or every hour.
When you want to see if your tasks worked or failed and get alerts.
When you have many tasks that depend on each other and want to manage their order easily.
When you want to reuse and share task workflows with your team.
Commands
This command sets up the metadata database Airflow uses to keep track of tasks and workflows. From Airflow 2.7 on, 'airflow db init' is deprecated in favor of 'airflow db migrate'.
Terminal
airflow db init
Expected Output
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade head
INFO [alembic.runtime.migration] Upgrade successful
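Setup scripts often wrap this step so it also works on releases where 'airflow db migrate' has replaced 'airflow db init' (Airflow 2.7 and later). The following is a minimal sketch, assuming an Airflow 2.x CLI is on your PATH; the function name init_metadata_db is made up for this example and is not part of Airflow.

```shell
#!/bin/sh
# Sketch: initialize the Airflow metadata database, then verify it is reachable.
# init_metadata_db is a hypothetical helper name, not an Airflow command.
init_metadata_db() {
    # Probe for 'airflow db migrate' (Airflow 2.7+); fall back to
    # 'airflow db init' when the subcommand is not available.
    if airflow db migrate --help >/dev/null 2>&1; then
        airflow db migrate || return 1
    else
        airflow db init || return 1
    fi
    # 'airflow db check' exits non-zero if the database cannot be reached.
    airflow db check
}
```

Call init_metadata_db once, before starting the webserver or scheduler. A non-zero exit status means the database is not ready yet.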
Creates an admin user so you can log into the Airflow web interface and manage workflows.
Terminal
airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password admin123
Expected Output
User "admin" created with role "Admin"
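--username - Set the login name
--firstname, --lastname - Set the user's first and last name
--email - Set the user's email address
--role - Set user permissions (Admin, Op, User, Viewer, or Public)
--password - Set the login password (admin123 is only a placeholder; choose a strong one)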
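Typing the password directly into a command or script leaves it in your shell history. One way around that is to read it from an environment variable. This is a minimal sketch, assuming an Airflow 2.x CLI; the function name create_admin_user and the variable AIRFLOW_ADMIN_PASSWORD are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: create the admin user without hard-coding the password.
# create_admin_user and AIRFLOW_ADMIN_PASSWORD are hypothetical names.
create_admin_user() {
    # Refuse to continue if the password variable is missing or empty.
    if [ -z "${AIRFLOW_ADMIN_PASSWORD:-}" ]; then
        echo "AIRFLOW_ADMIN_PASSWORD is not set" >&2
        return 1
    fi
    airflow users create \
        --username admin \
        --firstname Admin \
        --lastname User \
        --role Admin \
        --email admin@example.com \
        --password "$AIRFLOW_ADMIN_PASSWORD"
}
```

Export AIRFLOW_ADMIN_PASSWORD in your session, call create_admin_user, and then run 'airflow users list' to confirm the account exists.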
Starts the Airflow web interface on port 8080 so you can see and control your workflows in a browser.
Terminal
airflow webserver --port 8080
Expected Output
[2024-06-01 12:00:00,000] {webserver.py:123} INFO - Starting web server on port 8080
--port - Choose the port number for the web interface
Starts the scheduler that runs your tasks at the right time and in the right order.
Terminal
airflow scheduler
Expected Output
[2024-06-01 12:00:05,000] {scheduler.py:456} INFO - Starting the scheduler
Key Concept

If you remember nothing else, remember: Apache Airflow lets you automate and monitor complex task workflows easily.

Common Mistakes
Not initializing the database before starting Airflow
Airflow stores task, workflow, and user information in its metadata database; until the tables exist, the webserver and scheduler cannot start properly.
Always run 'airflow db init' before starting the webserver or scheduler.
Forgetting to create a user before accessing the web interface
Without a user, you cannot log in to manage workflows.
Create at least one user with 'airflow users create' before you try to log in.
Running the webserver and scheduler in the same terminal without backgrounding
Each command runs in the foreground and holds the terminal, so the second command never starts until the first one exits.
Run them in separate terminals or use background processes.
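The background-process option can be sketched as a small script. This is a minimal sketch, assuming an Airflow 2.x CLI; start_airflow, stop_airflow, and LOG_DIR are hypothetical names for this example, and the log directory is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: run the webserver and scheduler from one terminal as background jobs.
# start_airflow, stop_airflow and LOG_DIR are hypothetical names.
LOG_DIR="${LOG_DIR:-./airflow-logs}"

start_airflow() {
    mkdir -p "$LOG_DIR"
    # '&' sends each process to the background so the next command can start.
    airflow webserver --port 8080 >"$LOG_DIR/webserver.log" 2>&1 &
    WEBSERVER_PID=$!
    airflow scheduler >"$LOG_DIR/scheduler.log" 2>&1 &
    SCHEDULER_PID=$!
    echo "webserver pid=$WEBSERVER_PID scheduler pid=$SCHEDULER_PID"
}

stop_airflow() {
    # Stop both background processes started by start_airflow.
    kill "$WEBSERVER_PID" "$SCHEDULER_PID" 2>/dev/null || true
    wait "$WEBSERVER_PID" "$SCHEDULER_PID" 2>/dev/null || true
}
```

Call start_airflow to launch both processes and stop_airflow from the same shell when you are done. In Airflow 2.x, both commands also accept a '-D' (daemon) flag if you prefer to let Airflow detach the process itself.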
Summary
Initialize Airflow's database with 'airflow db init' to prepare the system.
Create a user to access the Airflow web interface using 'airflow users create'.
Start the webserver to view and control workflows in a browser.
Run the scheduler to execute tasks automatically in the right order.