Airflow Concept · Beginner · 3 min read

What Is a Sensor in Airflow: Definition and Usage

In Apache Airflow, a Sensor is a special type of operator that waits for a certain condition or event to happen before allowing the workflow to continue. It acts like a watchful guard that pauses the task until the required condition, such as a file arrival or a database update, is met.
⚙️

How It Works

A sensor in Airflow works like a waiting checkpoint in your workflow. Imagine waiting for a package to be delivered before you can start assembling what is inside. The sensor keeps checking whether the package has arrived and does not move forward until it has.

Technically, a sensor is a type of operator that repeatedly checks a condition at a set interval. It uses a method called poke to test whether the condition is true; if not, it waits and tries again later, until the condition is met or a timeout expires. This prevents the workflow from running tasks that depend on something that is not ready yet.

This mechanism helps coordinate workflows that depend on external events, like waiting for a file to appear in cloud storage or a database record to be updated.
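The poke-and-wait loop described above can be sketched in plain Python. This is a simplified illustration of the mechanism, not Airflow's actual implementation; the function names and intervals here are invented for the example:

```python
import os
import time

def poke(filepath):
    """The condition check: True once the watched file exists."""
    return os.path.exists(filepath)

def run_sensor(filepath, poke_interval=1, timeout=5):
    """Repeatedly poke until the condition is met or the timeout expires."""
    deadline = time.time() + timeout
    while True:
        if poke(filepath):
            return True          # condition met: downstream work may proceed
        if time.time() >= deadline:
            raise TimeoutError(f"'{filepath}' did not appear within {timeout}s")
        time.sleep(poke_interval)  # wait before checking again
```

Airflow's real sensors inherit a loop like this from BaseSensorOperator; a custom sensor typically only implements the poke method.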

💻

Example

This example shows a simple Airflow DAG with a FileSensor that waits for a file named data_ready.txt in a local directory before running the next task.

python
from airflow import DAG
from airflow.sensors.filesystem import FileSensor
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='file_sensor_example',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # called 'schedule_interval' in Airflow < 2.4
    catchup=False
) as dag:

    # Poke every 30 seconds; fail if the file has not appeared within 10 minutes
    wait_for_file = FileSensor(
        task_id='wait_for_file',
        filepath='data_ready.txt',
        fs_conn_id='fs_default',  # filesystem connection defining the base path
        poke_interval=30,
        timeout=600
    )

    process_file = BashOperator(
        task_id='process_file',
        bash_command='echo "File is ready, processing now..."'
    )

    # process_file runs only after the sensor succeeds
    wait_for_file >> process_file
Output
Task 'wait_for_file' checks for 'data_ready.txt' every 30 seconds, for up to 10 minutes. Once the file is found, it triggers the 'process_file' task, which prints: File is ready, processing now...
🎯

When to Use

Use sensors when your workflow depends on an external event or condition that may happen at an unpredictable time. Sensors help avoid errors by pausing the workflow until the required condition is met.

Common real-world uses include:

  • Waiting for a data file to arrive in cloud storage before processing.
  • Waiting for a database record or table to be updated.
  • Waiting for an API to return a specific response.
  • Waiting for a cloud resource to be ready.

Sensors are useful in data pipelines where tasks must run only after certain inputs are available.
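For the database case above, the condition an SqlSensor-style check evaluates can be as simple as "does this query return a row?" Here is a plain-Python sketch using sqlite3; the table and query are invented for the example:

```python
import sqlite3

def sql_condition_met(conn, query):
    """SqlSensor-style check: True once the query returns at least one row."""
    return conn.execute(query).fetchone() is not None

# Demo with an in-memory database standing in for a real warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loads (name TEXT, done INTEGER)")

query = "SELECT 1 FROM loads WHERE name = 'daily' AND done = 1"
print(sql_condition_met(conn, query))   # False: nothing loaded yet

conn.execute("INSERT INTO loads VALUES ('daily', 1)")
print(sql_condition_met(conn, query))   # True: the record has arrived
```

An actual SqlSensor runs roughly this check against a configured Airflow connection at each poke, succeeding once the query returns a result.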

Key Points

  • Sensors are special Airflow operators that wait for conditions.
  • They repeatedly check (poke) until the condition is true or timeout occurs.
  • Common sensors include FileSensor, HttpSensor, and SqlSensor.
  • Use sensors to coordinate workflows with external events.
  • Proper timeout and poke_interval settings prevent resource waste; reschedule mode frees the worker slot between checks.
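For long waits, it is usually cheaper to run a sensor in reschedule mode, where the task releases its worker slot between checks instead of occupying it while sleeping. A sketch reusing the FileSensor from the earlier example; this is a DAG configuration fragment, assuming an Airflow 2.x environment, and is not runnable on its own:

```python
from airflow.sensors.filesystem import FileSensor

wait_for_file = FileSensor(
    task_id='wait_for_file',
    filepath='data_ready.txt',
    fs_conn_id='fs_default',
    mode='reschedule',   # default is 'poke', which holds the slot while waiting
    poke_interval=300,   # with long intervals, rescheduling wastes far less
    timeout=6 * 60 * 60  # give up after six hours
)
```

As a rule of thumb, short waits with frequent checks suit poke mode, while waits of many minutes or hours suit reschedule mode.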

Key Takeaways

  • A sensor in Airflow waits for an external condition before continuing a workflow.
  • Sensors repeatedly check for conditions using a poke method at set intervals.
  • Use sensors to handle dependencies on files, database states, APIs, or cloud resources.
  • Set sensible timeouts and poke intervals to avoid wasting resources.
  • Common sensors include FileSensor, HttpSensor, and SqlSensor.