
BashOperator for shell commands in Apache Airflow - Commands & Configuration

Introduction
Sometimes you need to run simple shell commands or scripts as part of your workflow. BashOperator lets you do this easily inside Airflow, so you can automate tasks like file operations or running scripts without extra setup.
Typical situations where it fits:

  • Running a shell script to move or copy files as part of a data pipeline.
  • Executing a command-line tool that has no dedicated Airflow operator.
  • Running quick Linux commands, like listing files or checking disk space, during a workflow.
  • Chaining shell commands in sequence inside your Airflow DAG.
  • Automating a backup script on a schedule.
Config File - bash_operator_dag.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2024, 1, 1),
}

dag = DAG(
    'example_bash_operator',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False
)

run_ls = BashOperator(
    task_id='list_files',
    bash_command='ls -l /tmp',
    dag=dag
)

run_echo = BashOperator(
    task_id='print_message',
    bash_command='echo "Hello from Airflow BashOperator"',
    dag=dag
)

run_ls >> run_echo

This file defines an Airflow DAG named example_bash_operator that runs daily starting from January 1, 2024.

It has two tasks using BashOperator:

  • list_files: runs ls -l /tmp to list files in the /tmp directory.
  • print_message: prints a simple message to the logs.

The tasks run in order: first list files, then print the message.
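Before wiring commands into a DAG, it is worth sanity-checking them in a terminal, since BashOperator ultimately hands the string to a bash subshell. A quick local check of the two commands from the DAG above:

```shell
# The same commands the two tasks will run; if they misbehave here,
# they will misbehave inside Airflow too.
ls -l /tmp
echo "Hello from Airflow BashOperator"
```

If a command needs a tool or path that only exists on the Airflow workers, test it there instead of on your laptop.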

Commands
This command lists all available DAGs in Airflow to confirm your DAG is recognized.
Terminal
airflow dags list
Expected Output
example_bash_operator
This command manually triggers the DAG to run the tasks immediately.
Terminal
airflow dags trigger example_bash_operator
Expected Output
Created <DagRun example_bash_operator @ 2024-06-01T12:00:00+00:00: manual__2024-06-01T12:00:00+00:00, externally triggered: True>
This command lists all tasks in the DAG to verify the tasks are defined correctly.
Terminal
airflow tasks list example_bash_operator
Expected Output
list_files
print_message
This command runs the 'list_files' task for the given date without scheduling, useful for testing the command output.
Terminal
airflow tasks test example_bash_operator list_files 2024-06-01
Expected Output
[2024-06-01 12:00:00,000] {bash.py:123} INFO - Running command: ls -l /tmp
[2024-06-01 12:00:00,100] {bash.py:130} INFO - total 0
-rw-r--r-- 1 user user 0 Jun 1 12:00 example.txt
[2024-06-01 12:00:00,200] {bash.py:140} INFO - Command exited with return code 0
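The "return code 0" in that log matters: BashOperator marks a task successful only when the command exits with code 0, and fails it on any non-zero exit (recent Airflow versions also treat exit code 99 as "skip" by default, via the `skip_on_exit_code` parameter). You can see the same exit-code behavior directly in a shell:

```shell
# Exit code 0 -> Airflow would record the task as success.
ls /tmp > /dev/null
echo "exit code: $?"          # exit code: 0

# A non-zero exit code -> Airflow would mark the task failed.
# (The || guard keeps this demo from aborting under `set -e`.)
ls /nonexistent-dir 2> /dev/null || echo "non-zero exit: task would fail"
```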
Key Concept

If you remember nothing else from this pattern, remember: BashOperator lets you run any shell command inside Airflow tasks simply and directly.

Common Mistakes

  • Using BashOperator for complex multi-line scripts without proper quoting or escaping.
    Why it breaks: the shell command can fail or behave unexpectedly because of bad syntax or mangled line breaks.
    Fix: use a triple-quoted string, or put the script in a separate file and call it from BashOperator.

  • Not setting a valid start_date or schedule_interval on the DAG, so it never runs.
    Why it breaks: Airflow needs both a start date and a schedule to create DAG runs automatically.
    Fix: set start_date in the past and a valid schedule_interval when defining the DAG.

  • Assuming BashOperator runs commands on your local machine rather than in the Airflow worker environment.
    Why it breaks: commands execute wherever the Airflow worker runs, which may differ from your local setup.
    Fix: make sure the commands, tools, and paths exist on the Airflow worker nodes.
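The first fix above can be sketched outside Airflow. Triple quotes preserve the line breaks of a multi-line script, and you can verify the quoting by running the string through `bash -c`, which is essentially what BashOperator does. The script contents and paths here are illustrative; in the DAG you would pass the same string as `bash_command=backup_script`.

```python
import subprocess

# Hypothetical multi-line script you might pass to BashOperator's
# bash_command parameter; triple quotes keep line breaks intact.
backup_script = """
set -e                                   # stop on the first failing command
mkdir -p /tmp/backup_demo
echo "backup ran" > /tmp/backup_demo/status.txt
cat /tmp/backup_demo/status.txt
"""

# Verify the quoting locally by running the string through bash,
# the same way BashOperator ultimately executes it.
result = subprocess.run(
    ["bash", "-c", backup_script],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # -> backup ran
```

One Airflow-specific wrinkle worth knowing: if `bash_command` ends in `.sh`, Airflow tries to load that path as a Jinja template file, so when calling an external script directly you typically keep a trailing space after the filename.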
Summary
  • Define a DAG with BashOperator tasks to run shell commands inside Airflow.
  • Use airflow CLI commands to list, trigger, and test DAG tasks.
  • BashOperator runs commands on Airflow workers, so commands must be valid there.