
What is FileSensor in Airflow: Explanation and Example

FileSensor in Airflow is a sensor, a special kind of task that repeatedly checks whether a specific file (or directory) exists on a filesystem before allowing downstream tasks to run. It helps coordinate workflows by pausing execution until the required file appears.

How It Works

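FileSensor acts like a watchful guard waiting for a file to show up in a certain place, such as a local folder or a mounted network drive. The check runs on the Airflow worker executing the task, so the path must be visible from that machine. The sensor checks again every poke_interval seconds until it finds the file or its timeout expires.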

Think of it like waiting for a package delivery before you start assembling furniture. You don’t want to begin until the parts arrive. Similarly, FileSensor pauses the workflow until the file is ready, ensuring tasks run in the correct order.
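The polling behavior described above can be modeled in a few lines of plain Python. This is a simplified sketch of "poke mode", not Airflow's implementation: the function name poll_for_file and the use of the built-in TimeoutError are assumptions for illustration (Airflow raises its own sensor exceptions).

```python
import os
import time


def poll_for_file(path: str, poke_interval: float = 10, timeout: float = 300) -> bool:
    """Hypothetical model of a sensor in poke mode.

    Returns True as soon as `path` exists; raises TimeoutError if it has
    not appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # One "poke": a cheap existence check.
        if os.path.exists(path):
            return True
        # Give up once the time budget is spent.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} seconds")
        # Wait before the next check.
        time.sleep(poke_interval)
```

In the real sensor, poke mode keeps a worker slot occupied for the entire wait. For long waits, setting mode='reschedule' on the sensor frees the slot between checks.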


Example

This example shows how to use FileSensor to wait for a file named data_ready.txt in the /tmp directory before running a simple task.

```python
from datetime import datetime, timedelta

# Airflow 2.x import paths
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

with DAG(
    dag_id='file_sensor_example',
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
) as dag:
    # Poll for the file; succeeds as soon as it exists.
    wait_for_file = FileSensor(
        task_id='wait_for_file',
        filepath='/tmp/data_ready.txt',
        fs_conn_id='fs_default',  # filesystem connection (also the parameter's default)
        poke_interval=10,  # check every 10 seconds
        timeout=300,  # fail if the file has not appeared after 5 minutes
    )

    # Runs only after the sensor succeeds.
    process_file = BashOperator(
        task_id='process_file',
        bash_command='echo "File is ready, processing started."',
    )

    wait_for_file >> process_file
```
Output
Once /tmp/data_ready.txt exists, the sensor succeeds and process_file runs; its task log shows: File is ready, processing started.
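For the sensor in this example to succeed, some process outside Airflow has to create /tmp/data_ready.txt (for a quick manual test, `touch /tmp/data_ready.txt` is enough). A common convention is to treat such a file as a readiness marker: the producer writes its data first and creates the marker last. The sketch below assumes a hypothetical producer that writes data.csv next to the marker; the function name and the data file name are illustrative.

```python
from pathlib import Path


def publish_with_marker(directory: str, payload: str) -> Path:
    """Hypothetical producer: write the data file, then create the marker.

    data.csv is an illustrative name; data_ready.txt matches the file the
    example's FileSensor waits for.
    """
    out_dir = Path(directory)
    data_file = out_dir / "data.csv"
    marker = out_dir / "data_ready.txt"
    # 1. Write the actual data completely.
    data_file.write_text(payload)
    # 2. Only then signal readiness; the sensor succeeds once this exists.
    marker.touch()
    return marker
```

Calling publish_with_marker("/tmp", "id,value\n1,42\n") from the producer would let wait_for_file succeed on its next poke, and by then data.csv is already fully written.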

When to Use

Use FileSensor when your workflow depends on a file being available before continuing. This is common in data pipelines where files are generated by external systems or processes.

For example, you might wait for a daily report file to arrive before starting data analysis or wait for a configuration file before launching a deployment.

This sensor helps avoid errors by ensuring tasks only run when their required inputs are ready.
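FileSensor only checks that the path exists, not that the producer has finished writing it, so a large file written in place can be picked up while still incomplete. When the sensor watches the data file itself rather than a separate marker file, one common remedy is for the producer to write to a temporary file and rename it into place. The sketch below assumes a POSIX filesystem, where a rename within one filesystem is atomic; network filesystems may not give the same guarantee. The function name atomic_write is illustrative.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Write `data` to `path` so a watcher never sees a partial file.

    Content goes to a hidden temporary file in the same directory, which
    os.replace() then renames onto the target path.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        # mkstemp creates the file as 0600; widen it so other users
        # (for example the Airflow worker) can read the result.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The leading dot in the temporary name also keeps it out of a sensor pattern such as *.csv, because Python's glob wildcards do not match names that start with a dot.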

Key Points

  • FileSensor waits for a file to appear before proceeding.
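  • It checks repeatedly, waiting poke_interval seconds between checks.
  • It fails if the file does not appear within timeout seconds (or is marked skipped when soft_fail=True).
  • Useful for coordinating workflows dependent on external file availability.
  • Works with any path the worker can see, local or on a mounted network share; the connection named by fs_conn_id can optionally supply a base path that filepath is resolved against.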
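The points about paths and connections can be made concrete with a sketch of what a single FileSensor check does, based on the sensor's documented Airflow 2 behavior: the connection's optional base path is joined with filepath, the result may be a glob pattern, a matching file satisfies the check, and a matching directory satisfies it only if it contains at least one file. The function name and signature are hypothetical; this is an approximation, not Airflow's source.

```python
import os
from glob import glob


def file_sensor_check(base_path: str, filepath: str, recursive: bool = False) -> bool:
    """Approximate one FileSensor poke (hypothetical helper).

    base_path stands in for the base path stored on the fs_conn_id
    connection; filepath may contain glob wildcards.
    """
    full_path = os.path.join(base_path, filepath)
    for match in glob(full_path, recursive=recursive):
        # A matching regular file satisfies the check immediately.
        if os.path.isfile(match):
            return True
        # A matching directory counts only if it contains at least one file.
        for _root, _dirs, files in os.walk(match):
            if files:
                return True
    return False
```

Because os.path.join discards base_path when filepath is absolute, an absolute filepath such as /tmp/data_ready.txt ignores any base path in this sketch.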

Key Takeaways

FileSensor pauses a workflow until a specified file exists.
It repeatedly checks for the file at intervals and can time out if the file is not found.
Ideal for workflows that depend on external files to proceed safely.
Configured with file path, poke interval, and timeout settings.
Helps prevent errors by ensuring required files are ready before tasks run.