
What is S3KeySensor in Airflow: Explanation and Example

The S3KeySensor in Airflow is a sensor that waits for a specific file (key) to appear in an Amazon S3 bucket before allowing the workflow to continue. It helps automate workflows by pausing tasks until the required file is available in S3.
⚙️

How It Works

The S3KeySensor acts like a watchful guard that keeps checking an Amazon S3 bucket for a particular file. Imagine you are waiting for a package to arrive before you start cooking a meal. The sensor keeps looking at the delivery status and only lets you start cooking once the package is delivered.

In Airflow, this sensor checks at a fixed interval (poke_interval) whether the specified object (called a key in S3) exists in the bucket. While the sensor task is still waiting, its downstream tasks don't start. Once the key is found, the sensor task succeeds and Airflow continues with the next steps. If the key has not appeared by the time the timeout expires, the sensor task fails.
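The check-wait-repeat cycle described above can be sketched in plain Python. This is only an illustration of the polling idea, not Airflow's actual implementation: the function wait_for_key, the in-memory fake_bucket, and the fake clock are all hypothetical names made up for this example, and the sketch returns False on timeout where the real sensor would fail the task.

```python
import time
from typing import Callable


def wait_for_key(
    key_exists: Callable[[], bool],
    poke_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll key_exists() until it returns True or the timeout elapses."""
    started = clock()
    while True:
        if key_exists():                  # one "poke": is the file there yet?
            return True
        if clock() - started >= timeout:  # waited long enough, give up
            return False
        sleep(poke_interval)              # wait before the next check


# Simulated bucket: a set of keys standing in for S3 (hypothetical data).
fake_bucket = set()
checks = []


def data_csv_exists() -> bool:
    checks.append(1)
    if len(checks) == 3:                  # pretend the upload lands before check 3
        fake_bucket.add("data.csv")
    return "data.csv" in fake_bucket


# A fake clock lets the example finish instantly instead of really sleeping.
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


found = wait_for_key(
    data_csv_exists,
    poke_interval=30,
    timeout=600,
    sleep=fake_sleep,
    clock=lambda: now[0],
)
print(found, len(checks), now[0])
```

Here the file shows up on the third check, so the loop stops after two simulated 30-second waits.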

💻

Example

This example shows how to use S3KeySensor to wait for a file named data.csv in the my-bucket S3 bucket before running a placeholder task.

python
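from datetime import datetime

from airflow import DAG
# EmptyOperator supersedes the deprecated DummyOperator in newer Airflow 2 releases.
from airflow.operators.empty import EmptyOperator
# Current Amazon provider releases expose the sensor from sensors.s3;
# the older sensors.s3_key module is deprecated.
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id='s3_key_sensor_example',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # called schedule_interval before Airflow 2.4
    catchup=False,
) as dag:
    # Check every 30 seconds for data.csv in my-bucket; give up after 10 minutes.
    wait_for_file = S3KeySensor(
        task_id='wait_for_s3_file',
        bucket_name='my-bucket',
        bucket_key='data.csv',
        aws_conn_id='aws_default',
        poke_interval=30,
        timeout=600,
    )

    # Placeholder standing in for the real processing task.
    proceed = EmptyOperator(task_id='proceed_after_file')

    wait_for_file >> proceed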
Output
The DAG run waits at 'wait_for_s3_file' until 'data.csv' appears in 'my-bucket'. Once the file is detected, it proceeds to 'proceed_after_file'. If the file has not appeared within 600 seconds, the sensor task fails.
🎯

When to Use

Use S3KeySensor when your workflow depends on files arriving in an S3 bucket. For example, a data pipeline that processes daily reports uploaded to S3 should wait until the report file is available before it starts processing.

This sensor is helpful in automating workflows that rely on external systems to drop files, ensuring tasks run only when the needed data is ready.
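For the daily report case, the key usually changes with the run date. The sketch below assumes a hypothetical layout of reports/YYYY-MM-DD/report.csv (the prefix, file name, and the helper daily_report_key are made up for illustration) and shows how such a key could be built from a date.

```python
from datetime import date


def daily_report_key(run_date: date, prefix: str = "reports") -> str:
    """Build the S3 key for one day's report under an assumed naming scheme."""
    return f"{prefix}/{run_date:%Y-%m-%d}/report.csv"


key = daily_report_key(date(2024, 1, 1))
print(key)  # reports/2024-01-01/report.csv
```

Inside a DAG, the same idea is normally expressed with templating rather than a helper function: bucket_key is a templated field on S3KeySensor, so a value such as 'reports/{{ ds }}/report.csv' would be rendered with each run's logical date.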

Key Points

  • S3KeySensor waits for a specific file in an S3 bucket.
  • It holds back downstream tasks until the file appears.
  • Useful for data pipelines that depend on external file arrivals.
  • Configurable poke interval and timeout control how often it checks and how long it waits.
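As a rough worked example of how the poke interval and timeout interact, the snippet below estimates the largest number of checks a sensor could make. It is an approximation under the assumption that each check is instantaneous and the first one happens immediately. Real runs may differ slightly, and max_checks is a hypothetical helper, not part of Airflow.

```python
import math


def max_checks(poke_interval: float, timeout: float) -> int:
    """Approximate upper bound: one check at the start, then one per interval."""
    return math.floor(timeout / poke_interval) + 1


# Values from the example DAG: check every 30 s, give up after 600 s.
checks = max_checks(poke_interval=30, timeout=600)
print(checks)  # 21
```

A shorter interval notices the file sooner but makes more requests to S3. A longer timeout tolerates late uploads but keeps the sensor task waiting longer.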

Key Takeaways

S3KeySensor waits for a file to appear in an S3 bucket before continuing a workflow.
It helps automate workflows that depend on external files arriving in S3.
You can set how often it checks and how long it waits before timing out.
It prevents downstream tasks from running too early without required data.
Ideal for data pipelines triggered by file availability in S3.