What is FileSensor in Airflow: Explanation and Example
FileSensor in Airflow is a special kind of task, called a sensor, that repeatedly checks a filesystem for a specific file and holds back downstream tasks until that file exists. It coordinates workflows by pausing execution until the required input is in place.
How It Works
FileSensor acts like a watchful guard waiting for a file to show up in a certain place, such as a folder on the machine running the task or a mounted network drive. It checks again every poke_interval seconds until it finds the file or its timeout expires.
Think of it like waiting for a package delivery before you start assembling furniture. You don’t want to begin until the parts arrive. Similarly, FileSensor pauses the workflow until the file is ready, ensuring tasks run in the correct order.
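To make the check-and-wait cycle concrete, here is a minimal sketch in plain Python. It is not Airflow's code: wait_for_file is a hypothetical function that only imitates how poke_interval and timeout interact when a sensor runs in its default poke mode.

```python
import os
import time


def wait_for_file(path: str, poke_interval: float, timeout: float) -> bool:
    """Hypothetical helper imitating a sensor in poke mode (not Airflow's API).

    Returns True as soon as `path` exists as a regular file, or False once
    `timeout` seconds have passed without finding it.
    """
    deadline = time.monotonic() + timeout
    while True:
        # One "poke": a single check of the condition.
        if os.path.isfile(path):
            return True
        # Time budget spent: a real sensor task fails at this point by default.
        if time.monotonic() >= deadline:
            return False
        # Sleep until the next poke, without overshooting the deadline.
        time.sleep(min(poke_interval, max(0.0, deadline - time.monotonic())))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data_ready.txt")
        print(wait_for_file(target, poke_interval=0.1, timeout=0.3))  # False: file never appears
        open(target, "w").close()
        print(wait_for_file(target, poke_interval=0.1, timeout=0.3))  # True: file exists
```

In Airflow itself, the default mode="poke" keeps the sensor occupying a worker slot for the whole wait, while mode="reschedule" releases the slot between checks, which suits long waits.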
Example
This example shows how to use FileSensor to wait for a file named data_ready.txt in the /tmp directory before running a simple task. It uses Airflow 2.x import paths and parameters.
```python
from airflow import DAG
from airflow.sensors.filesystem import FileSensor
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    'file_sensor_example',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False
)

wait_for_file = FileSensor(
    task_id='wait_for_file',
    filepath='/tmp/data_ready.txt',
    fs_conn_id='fs_default',  # default local filesystem connection
    poke_interval=10,  # check every 10 seconds
    timeout=300,  # timeout after 5 minutes
    dag=dag
)

process_file = BashOperator(
    task_id='process_file',
    bash_command='echo "File is ready, processing started."',
    dag=dag
)

wait_for_file >> process_file
```
When to Use
Use FileSensor when your workflow depends on a file being available before continuing. This is common in data pipelines where files are generated by external systems or processes.
For example, you might wait for a daily report file to arrive before starting data analysis or wait for a configuration file before launching a deployment.
This sensor helps avoid errors by ensuring tasks only run when their required inputs are ready.
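When the file comes from an external system, keep in mind that the sensor succeeds as soon as the file exists, not when it is finished being written. One common producer-side approach is to write to a temporary file in the same directory and then rename it into place with os.replace, which is atomic on POSIX filesystems when both names are on the same filesystem. The sketch below assumes the producer is a Python process; publish_atomically is a hypothetical helper name.

```python
import os
import tempfile


def publish_atomically(final_path: str, content: str) -> None:
    """Hypothetical producer helper: make `final_path` appear only when complete."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the target so the rename stays on
    # one filesystem (a cross-filesystem move is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".partial-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # Swap the finished file into its final name in one step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A sensor watching for data_ready.txt never sees the ".partial-" file, so downstream tasks only start once the complete file is in place.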
Key Points
- FileSensor waits for a file to appear before proceeding.
- It checks repeatedly, at intervals defined by poke_interval.
- It times out if the file does not appear within the period set by timeout.
- It is useful for coordinating workflows that depend on files produced by external systems.
- It works with local paths or network shares mounted on the worker; the connection named by fs_conn_id can supply a base path for relative file paths.
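A single check by FileSensor can be approximated in plain Python. This is a sketch under assumptions: file_sensor_check is a hypothetical name, and the logic reflects how the Airflow 2.x sensor is understood to behave (the connection's base path is joined with filepath, the result is expanded as a glob pattern, and a matching directory counts only if it contains at least one file). Check the source for your Airflow version before relying on these details.

```python
import glob
import os


def file_sensor_check(base_path: str, filepath: str, recursive: bool = False) -> bool:
    """Hypothetical approximation of one FileSensor poke (not Airflow's API)."""
    # os.path.join discards base_path when filepath is already absolute.
    full_path = os.path.join(base_path, filepath)
    for match in glob.glob(full_path, recursive=recursive):
        # A matching regular file satisfies the check immediately.
        if os.path.isfile(match):
            return True
        # A matching directory counts only if it (or a subdirectory) holds a file.
        for _root, _dirs, files in os.walk(match):
            if files:
                return True
    return False
```

Under these assumptions, filepath='report_*.csv' would be satisfied by any matching CSV under the base path, and an absolute path such as /tmp/data_ready.txt ignores the base path entirely.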
Key Takeaways
FileSensor pauses a workflow until a specified file exists.