Apache Airflow, DevOps, ~30 mins

FileSensor for file arrival detection in Apache Airflow - Mini Project: Build & Apply

FileSensor for file arrival detection
📖 Scenario: You are working with Apache Airflow to automate workflows. You want to create a simple DAG that waits for a specific file to arrive in a folder before proceeding.
🎯 Goal: Build an Airflow DAG that uses FileSensor to detect when a file named data_ready.txt appears in the /tmp/ directory.
📋 What You'll Learn
Create a DAG with the id file_sensor_dag
Use FileSensor to watch for /tmp/data_ready.txt
Set the sensor's poke_interval to 5 seconds
Set the sensor's timeout to 20 seconds
Add a dummy task process_file that runs after the sensor
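To make the two timing settings concrete, here is a rough, Airflow-free sketch of the polling behavior a sensor implements: check for the file, sleep for poke_interval, and stop once timeout has elapsed. The helper name wait_for_path is hypothetical and not part of Airflow. The real FileSensor fails the task with a timeout error instead of returning False.

```python
import os
import tempfile
import time


def wait_for_path(path, poke_interval=5.0, timeout=20.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first.

    Hypothetical helper that mimics sensor polling; it is not an Airflow API.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):  # one "poke": check whether the file is there
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)  # wait before the next poke


# Short intervals so the demonstration finishes quickly.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data_ready.txt")
    missing = wait_for_path(target, poke_interval=0.01, timeout=0.05)
    open(target, "w").close()  # simulate the file arriving
    found = wait_for_path(target, poke_interval=0.01, timeout=0.05)
    print(missing, found)  # prints: False True
```

With the values used in this project (poke_interval of 5 seconds, timeout of 20 seconds), the sensor gets only a handful of checks before it gives up.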
💡 Why This Matters
🌍 Real World
FileSensors are used in production workflows to hold processing jobs until their input files have arrived, so downstream tasks do not fail on missing data.
💼 Career
Understanding sensors in Airflow is important for building reliable data pipelines and automating workflows in data engineering and DevOps roles.
1
Create the DAG and import necessary modules
Import DAG from airflow, FileSensor from airflow.sensors.filesystem, and DummyOperator from airflow.operators.dummy. Then create a DAG object named file_sensor_dag, with the DAG id file_sensor_dag, start_date set to datetime(2024, 1, 1), and schedule_interval set to '@daily'.
Hint: Use datetime from the datetime module for the start date.

2
Add the FileSensor task to detect the file
Create a FileSensor task with the task id wait_for_file inside file_sensor_dag that watches for the file /tmp/data_ready.txt (passed as the filepath argument). Set poke_interval=5 and timeout=20, both in seconds.
Hint: Remember to pass the DAG object to the task, for example dag=file_sensor_dag.

3
Add a dummy task to run after the sensor
Create a DummyOperator task called process_file inside the file_sensor_dag. Then set the task dependency so that process_file runs after wait_for_file.
Hint: Use the >> operator to set task order.

4
Print confirmation of DAG tasks
Add a print statement that outputs the string 'DAG file_sensor_dag with FileSensor and DummyOperator created successfully'.
Hint: Use print() to show the confirmation message.