FileSensor for file arrival detection in Apache Airflow - Step-by-Step Execution

Process Flow - FileSensor for file arrival detection
Start FileSensor Task
Check if file exists
Proceed
Finish
The FileSensor task starts and checks whether the target file exists. If it does, the task proceeds. If not, the sensor waits and checks again until the file arrives or the timeout is reached.
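The flow above can be sketched in plain Python to show the check, wait, and retry cycle. This is a minimal illustration and not Airflow's implementation: `wait_for_file` is a hypothetical helper, and returning `False` on timeout is an assumption made for this sketch (a real FileSensor marks the task as failed instead).

```python
import os
import time


def wait_for_file(filepath, poke_interval=5.0, timeout=20.0):
    """Poll until `filepath` exists.

    Hypothetical helper for illustration only.
    Returns True if the file is found, False if the timeout is reached first.
    """
    start = time.monotonic()
    while True:
        # One "poke": a single check for the file.
        if os.path.exists(filepath):
            return True
        # Stop once the time budget is used up.
        if time.monotonic() - start >= timeout:
            return False
        # Otherwise wait before the next check.
        time.sleep(poke_interval)
```

Calling `wait_for_file('/data/input/data.csv')` would block for up to about 20 seconds, checking roughly every 5 seconds.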
Execution Sample
Apache Airflow
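from airflow.sensors.filesystem import FileSensor

# Wait until /data/input/data.csv exists before downstream tasks run.
file_sensor = FileSensor(
    task_id='wait_for_file',
    filepath='/data/input/data.csv',
    poke_interval=5,  # seconds between checks
    timeout=20,       # seconds before the sensor gives up and fails
)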
This code creates a FileSensor that checks every 5 seconds for the file '/data/input/data.csv' and times out after 20 seconds if the file does not appear.
Process Table
Step | Time (s) | File Exists? | Action              | Result
1    | 0        | No           | Check file presence | File not found, wait 5s
2    | 5        | No           | Check file presence | File not found, wait 5s
3    | 10       | Yes          | Check file presence | File found, proceed
4    | 10       | -            | Task completes      | FileSensor succeeds
💡 File found at 10 seconds, FileSensor task completes successfully
Status Tracker
Variable     | Start   | After 1 | After 2 | After 3 | Final
file_exists  | False   | False   | False   | True    | True
time_elapsed | 0       | 5       | 10      | 10      | 10
task_state   | running | running | running | success | success
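The trace above can be reproduced with a simulated clock, which avoids real waiting. This is a sketch under stated assumptions: `simulate_sensor` and `arrival_time` are hypothetical names, the file is assumed to appear at a known second, and the last check is assumed to happen at or before the timeout. Airflow's exact behavior at the timeout boundary may differ.

```python
def simulate_sensor(arrival_time, poke_interval=5, timeout=20):
    """Return (rows, final_state) for a sensor polling on a simulated clock.

    arrival_time: second at which the file appears, or None if it never does.
    Each row is (time_elapsed, file_exists) for one check.
    """
    rows = []
    elapsed = 0
    while True:
        file_exists = arrival_time is not None and elapsed >= arrival_time
        rows.append((elapsed, file_exists))
        if file_exists:
            return rows, "success"
        # Assumption: no further check is made once the next one would pass the timeout.
        if elapsed + poke_interval > timeout:
            return rows, "failed"
        elapsed += poke_interval


rows, state = simulate_sensor(arrival_time=10)
for step, (elapsed, file_exists) in enumerate(rows, start=1):
    print(step, elapsed, file_exists)
print(state)
```

With the defaults, the checks happen at 0, 5, and 10 seconds and the final state is `success`, matching the tables. Passing `arrival_time=None` gives checks at 0, 5, 10, 15, and 20 seconds and a final state of `failed`. Passing `poke_interval=10` moves the checks to 0 and 10 seconds.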
Key Moments - 3 Insights
Why does the FileSensor wait multiple times before succeeding?
Because the file does not exist at first (see steps 1 and 2 of the process table), the sensor waits and checks again every poke_interval seconds until the file appears or the timeout is reached.
What happens if the file never arrives within the timeout?
If the file has not arrived when the timeout expires, the FileSensor stops checking, Airflow raises AirflowSensorTimeout, and the task is marked as failed. This case is not shown in the trace.
Why is the time_elapsed the same at steps 3 and 4?
At step 3 the file is detected, so the task completes at step 4 without any further waiting.
Visual Quiz - 3 Questions
Test your understanding
According to the process table, at which step does the file first exist?
A. Step 1
B. Step 3
C. Step 2
D. Step 4
💡 Hint
Check the 'File Exists?' column in the process table.
According to the status tracker, what is the task_state after step 2?
A. running
B. failed
C. success
D. waiting
💡 Hint
Look at the 'task_state' row and the 'After 2' column in the status tracker.
If poke_interval were set to 10 seconds instead of 5, how would the 'Time (s)' values in the process table change?
A. They would stay the same
B. They would increase by 5 seconds each step
C. They would increase by 10 seconds each step
D. They would decrease
💡 Hint
poke_interval controls the wait time between checks; see the 'Time (s)' column in the process table.
Concept Snapshot
FileSensor waits for a file to appear before continuing.
It checks the file path every poke_interval seconds.
If the file exists, the task succeeds immediately.
If the file doesn't appear before timeout, the task fails.
Use FileSensor in Airflow to pause workflows until files arrive.
Full Transcript
The FileSensor task in Airflow starts by checking whether a specified file exists. If the file is not found, the sensor waits for poke_interval seconds before checking again. This repeats until the file appears or the timeout is reached. In the example, the sensor checks every 5 seconds, finds the file at 10 seconds, and then completes successfully. Variables such as file_exists and task_state update accordingly. If the file never arrives, the task fails once the timeout is reached. This sensor lets workflows wait for the files they need before proceeding.