What is S3KeySensor in Airflow: Explanation and Example
S3KeySensor in Airflow is a sensor that waits for a specific file (key) to appear in an Amazon S3 bucket before allowing the workflow to continue. It helps automate workflows by holding back tasks until the required file is available in S3.
How It Works
The S3KeySensor acts like a watchful guard that keeps checking an Amazon S3 bucket for a particular file. Imagine you are waiting for a package to arrive before you start cooking a meal. The sensor keeps looking at the delivery status and only lets you start cooking once the package is delivered.
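In Airflow, this sensor repeatedly checks whether the specified file exists in the bucket. In S3, a file is identified by its key, which is the object's full path within the bucket (for example, reports/2024-01-01/data.csv). While the sensor is waiting, downstream tasks do not start. Once the file is found, the sensor task succeeds and Airflow continues with the next steps. If the file does not appear before the configured timeout, the sensor task fails instead.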
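The check-wait-repeat behaviour can be modelled in a few lines of plain Python. This is a simplified sketch, not Airflow's actual implementation: wait_for_key, key_exists, and FakeClock are hypothetical names, and the S3 lookup is replaced by a callable so the loop runs without AWS. A plain TimeoutError stands in for the sensor task failing; the defaults mirror the poke_interval=30 and timeout=600 values used in the example DAG.

```python
import time
from typing import Callable


def wait_for_key(
    key_exists: Callable[[], bool],
    poke_interval: float = 30,
    timeout: float = 600,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll key_exists() until it returns True and return the number of checks made.

    Hypothetical model of a sensor loop: raises TimeoutError once `timeout`
    seconds have passed without the key appearing.
    """
    start = clock()
    checks = 0
    while True:
        checks += 1
        if key_exists():  # one "poke": ask whether the key exists yet
            return checks  # found: downstream work may now run
        if clock() - start >= timeout:
            raise TimeoutError(f"key not found after {checks} checks")
        sleep(poke_interval)  # wait before the next poke


class FakeClock:
    """Simulated clock so the loop can be demonstrated without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    # Hypothetical scenario: the file lands 90 seconds after the sensor starts.
    n = wait_for_key(lambda: clock.now >= 90, clock=clock.time, sleep=clock.sleep)
    print(n)  # checks at t=0, 30, 60, 90 -> 4
```

With these settings a file that never arrives is checked at t=0, 30, ..., 600 before the loop gives up, so the timeout divided by the poke interval gives a rough idea of how many S3 requests one wait can cost.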
Example
This example shows how to use S3KeySensor to wait for a file named data.csv in the my-bucket S3 bucket before running a placeholder task.
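```python
from datetime import datetime

from airflow import DAG
# EmptyOperator replaces the deprecated DummyOperator (Airflow 2.3+).
from airflow.operators.empty import EmptyOperator
# Current module path in the Amazon provider; the old s3_key module was removed.
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id='s3_key_sensor_example',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # 'schedule' replaces 'schedule_interval' (Airflow 2.4+)
    catchup=False,
) as dag:
    # Check s3://my-bucket/data.csv every 30 seconds; fail after 10 minutes.
    wait_for_file = S3KeySensor(
        task_id='wait_for_s3_file',
        bucket_name='my-bucket',
        bucket_key='data.csv',
        aws_conn_id='aws_default',  # Airflow connection holding AWS credentials
        poke_interval=30,  # seconds between checks
        timeout=600,  # seconds before the sensor gives up and fails
    )

    # Runs only after the sensor has found the file.
    proceed = EmptyOperator(task_id='proceed_after_file')

    wait_for_file >> proceed
```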
When to Use
Use S3KeySensor when your workflow depends on files arriving in an S3 bucket. For example, if you have a data pipeline that processes daily reports uploaded to S3, you want to wait until the report file is available before starting processing.
This sensor is helpful in automating workflows that rely on external systems to drop files, ensuring tasks run only when the needed data is ready.
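For daily reports, the key usually changes from one day to the next. The helper below is a sketch that assumes a hypothetical layout of reports/YYYY-MM-DD/report.csv; daily_report_key is an illustrative name, not part of Airflow. Inside a DAG you would not normally call such a function yourself: bucket_key is a templated field of S3KeySensor, so the same layout could be written as bucket_key='reports/{{ ds }}/report.csv', where {{ ds }} renders as the run's logical date in YYYY-MM-DD form.

```python
from datetime import date


def daily_report_key(run_date: date, prefix: str = "reports") -> str:
    """Build the S3 key for one day's report under a hypothetical per-day layout."""
    # One folder per day, one report file per folder (assumed naming scheme).
    return f"{prefix}/{run_date.isoformat()}/report.csv"


if __name__ == "__main__":
    print(daily_report_key(date(2024, 1, 1)))  # reports/2024-01-01/report.csv
```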
Key Points
- S3KeySensor waits for a specific file in an S3 bucket.
- It holds back downstream tasks until the file appears.
- Useful for data pipelines that depend on external file arrivals.
- The configurable poke_interval and timeout parameters (both in seconds) control how often it checks and how long it waits before failing.