
FileSensor for file arrival detection in Apache Airflow - Commands & Configuration

Introduction
Sometimes a workflow must wait until a specific file appears before it continues. Airflow's FileSensor is a task that repeatedly checks a path and succeeds only once the file exists, so downstream tasks run only after the file has arrived. It is useful in situations like these:
When you need to start processing data only after a file is uploaded to a folder.
When your workflow depends on an external system dropping a file before continuing.
When you want to avoid errors by ensuring input files exist before running tasks.
When you want to automate waiting for daily reports to arrive before analysis.
When you want to trigger downstream tasks only after a file transfer completes.
Config File - file_sensor_dag.py
from airflow import DAG
from airflow.sensors.filesystem import FileSensor
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

with DAG(
    dag_id='file_sensor_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False,
    default_args={
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    },
) as dag:

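    # Wait until the input file exists before letting downstream tasks run.
    wait_for_file = FileSensor(
        task_id='wait_for_file',
        filepath='/tmp/data/input_file.csv',  # path to watch
        poke_interval=10,  # seconds between checks
        timeout=600,  # fail the task after 10 minutes without the file
        mode='poke',  # keep the worker slot occupied between checks
    )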

    process_file = BashOperator(
        task_id='process_file',
        bash_command='echo "Processing file..."'
    )

    wait_for_file >> process_file

This DAG waits for the file /tmp/data/input_file.csv to appear before running the processing task.

FileSensor: Watches the file path every 10 seconds (poke_interval) and times out after 10 minutes (timeout).

BashOperator: Runs a simple command after the file is detected.

Task order: The processing task runs only after the file sensor confirms the file exists.
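To make the polling behaviour concrete, here is a minimal plain-Python sketch of what a poke-style sensor does conceptually. It is not Airflow's implementation: the function name wait_for_path and its True/False return convention are assumptions made for illustration (a real FileSensor fails the task on timeout instead of returning False). The sketch checks the path every poke_interval seconds and gives up once timeout seconds have passed.

```python
import os
import time


def wait_for_path(filepath, poke_interval=10.0, timeout=600.0):
    """Illustrative stand-in for a poke-mode sensor (hypothetical helper, not an Airflow API).

    Returns True as soon as the path exists, or False once `timeout` seconds
    have elapsed without the path appearing.
    """
    deadline = time.monotonic() + timeout
    while True:
        # One "poke": check whether the path is there right now.
        if os.path.exists(filepath):
            return True
        # Give up once the timeout has been reached.
        if time.monotonic() >= deadline:
            return False
        # Sleep until the next poke, but never past the deadline.
        remaining = max(0.0, deadline - time.monotonic())
        time.sleep(min(poke_interval, remaining))
```

With the values from the DAG above, wait_for_path('/tmp/data/input_file.csv', poke_interval=10, timeout=600) would check roughly every 10 seconds for up to 10 minutes.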

Commands
Lists all available DAGs to confirm your DAG file is recognized by Airflow.
Terminal
airflow dags list
Expected Output
file_sensor_example
example_bash_operator
example_branch_dop_operator
example_xcom
example_subdag_operator
example_trigger_target_dag
Manually triggers the DAG to start the workflow and begin waiting for the file.
Terminal
airflow dags trigger file_sensor_example
Expected Output
Created <DagRun file_sensor_example @ 2024-06-01T12:00:00+00:00: manual__2024-06-01T12:00:00+00:00, externally triggered: True>
Shows the tasks in the DAG to verify the sensor and processing tasks are present.
Terminal
airflow tasks list file_sensor_example
Expected Output
wait_for_file
process_file
Runs the FileSensor task once for the given date, without recording its state in the Airflow database, so you can see whether it waits or completes depending on whether the file is present.
Terminal
airflow tasks test file_sensor_example wait_for_file 2024-06-01
Expected Output
INFO - Using executor SequentialExecutor
INFO - Running task instance file_sensor_example.wait_for_file on 2024-06-01
INFO - Poking for file: /tmp/data/input_file.csv
INFO - Task exited with return code 0
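The sensor task only completes if the file exists when it pokes. One way to set that up before running the test is sketched below. The helper create_placeholder is hypothetical (not part of Airflow), and it assumes an empty file is enough, since the sensor checks for presence rather than content.

```python
from pathlib import Path


def create_placeholder(filepath):
    """Create an empty file at `filepath`, adding missing parent folders (hypothetical helper)."""
    path = Path(filepath)
    # Make sure the containing directory exists, e.g. /tmp/data.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file if it is missing; leave existing content untouched.
    path.touch(exist_ok=True)
    return path
```

Calling create_placeholder('/tmp/data/input_file.csv') before the test command should let the sensor find the file on its first poke.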
Key Concept

If you remember nothing else from this pattern, remember: FileSensor pauses your workflow until the specified file appears, preventing errors from missing inputs.

Common Mistakes
Setting the wrong file path in FileSensor.
The sensor never finds the file, so it keeps checking until it times out and the task fails.
Double-check the exact file path and filename before running the DAG.
Using a very short timeout or poke_interval.
A short timeout can make the sensor fail before the file arrives, and a short poke_interval makes it check so often that it adds unnecessary load.
Set a reasonable poke_interval (such as 10 seconds) and timeout (such as 10 minutes) based on when you expect the file to arrive.
Not setting task dependencies after the sensor.
Downstream tasks may run before the file is ready, causing failures.
Always set downstream tasks to run after the FileSensor task using >> or set_downstream.
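A quick way to sanity-check a poke_interval and timeout pair is to estimate how many checks the sensor can make before giving up. The helper below is a hypothetical plain-Python sketch, not an Airflow function, and the count is only a rough upper bound because it ignores the time each check takes.

```python
import math


def max_pokes(timeout, poke_interval):
    """Rough upper bound on the number of checks before the timeout (hypothetical helper).

    Counts the first check at time 0 plus one check for every full
    interval that fits inside the timeout window.
    """
    if timeout < 0 or poke_interval <= 0:
        raise ValueError("timeout must be >= 0 and poke_interval must be > 0")
    return math.floor(timeout / poke_interval) + 1


# Values from the DAG above: 600-second timeout, 10-second poke_interval.
print(max_pokes(600, 10))  # prints 61
```

If the estimate comes out at only one or two checks, the timeout is probably too short for the chosen interval.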
Summary
Create a DAG with a FileSensor to wait for a specific file path.
Trigger the DAG and verify tasks to ensure the sensor waits correctly.
Set downstream tasks to run only after the file is detected to avoid errors.