What Is a Sensor in Airflow: Definition and Usage
A sensor is a special type of operator that waits for a certain condition or event to happen before allowing the workflow to continue. It acts like a watchful guard that holds downstream tasks back until the required condition, such as a file arrival or a database update, is met.
How It Works
A sensor in Airflow works like a waiting checkpoint in your workflow. Imagine you are waiting for a package delivery before you can start assembling it. The sensor keeps checking if the package has arrived without moving forward until it does.
Technically, a sensor is a type of operator that repeatedly checks for a condition at set intervals. It uses a method called poke to test if the condition is true. If not, it waits and tries again later. This prevents the workflow from running tasks that depend on something not ready yet.
This mechanism helps coordinate workflows that depend on external events, like waiting for a file to appear in cloud storage or a database record to be updated.
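The check-wait-retry cycle described above can be sketched in plain Python. This is a simplified model for illustration only, not Airflow's actual implementation: the names `wait_until` and `SensorTimeout` are hypothetical, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


class SensorTimeout(Exception):
    """Raised when the condition is not met within the timeout (hypothetical name)."""


def wait_until(poke, poke_interval=30.0, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call poke() repeatedly until it returns True.

    Returns the number of times poke() was called. Raises SensorTimeout
    if a failed check happens after more than `timeout` seconds.
    """
    started = clock()
    attempts = 1
    while not poke():
        # Condition not met yet: give up if we have waited too long.
        if clock() - started > timeout:
            raise SensorTimeout(f"condition not met after {attempts} checks")
        # Otherwise wait for one interval and check again.
        sleep(poke_interval)
        attempts += 1
    return attempts
```

In this sketch, `poke` plays the role of the sensor's condition check, `poke_interval` is the pause between checks, and `timeout` bounds the total wait, mirroring the parameters passed to `FileSensor` in the example that follows.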
Example
This example shows a simple Airflow DAG with a FileSensor that waits for a file named data_ready.txt in a local directory before running the next task.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id='file_sensor_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False
) as dag:
    # Check for the file every 30 seconds; give up after 10 minutes.
    wait_for_file = FileSensor(
        task_id='wait_for_file',
        filepath='data_ready.txt',
        fs_conn_id='fs_default',
        poke_interval=30,
        timeout=600
    )

    # Runs only after the sensor has succeeded.
    process_file = BashOperator(
        task_id='process_file',
        bash_command='echo "File is ready, processing now..."'
    )

    wait_for_file >> process_file
When to Use
Use sensors when your workflow depends on an external event or condition that may happen at an unpredictable time. Sensors help avoid errors by pausing the workflow until the required condition is met.
Common real-world uses include:
- Waiting for a data file to arrive in cloud storage before processing.
- Waiting for a database record or table to be updated.
- Waiting for an API to return a specific response.
- Waiting for a cloud resource to be ready.
Sensors are useful in data pipelines where tasks must run only after certain inputs are available.
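As a rough illustration of the first use case, the condition a sensor checks can be written as an ordinary function that returns True or False. The sketch below is plain Python with a hypothetical name (`file_is_ready`) and a hypothetical `min_size_bytes` parameter; it assumes that "arrived" means the file exists and is not empty, which may not match every pipeline.

```python
from pathlib import Path


def file_is_ready(path: str, min_size_bytes: int = 1) -> bool:
    """Return True once the file exists and holds at least min_size_bytes."""
    candidate = Path(path)
    # is_file() is checked first, so stat() is only called on an existing file.
    return candidate.is_file() and candidate.stat().st_size >= min_size_bytes
```

A function like this could be given to a sensor that accepts a Python callable (for example, Airflow's `PythonSensor` takes a `python_callable` argument), which then calls it on every check until it returns True.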
Key Points
- Sensors are special Airflow operators that wait for conditions.
- They repeatedly check (poke) until the condition is true or the timeout is reached.
- Common sensors include FileSensor, HttpSensor, and SqlSensor.
- Use sensors to coordinate workflows with external events.
- Setting a sensible timeout and poke interval keeps a sensor from checking too often or waiting far longer than necessary.