
How to Use S3Hook in Airflow: Syntax, Example & Tips

Use S3Hook in Airflow to connect to and interact with AWS S3 buckets: create an instance with your AWS connection ID, then call its methods such as load_file or read_key to upload files to or download files from S3 within your Airflow tasks.

Syntax

The S3Hook class is initialized with an AWS connection ID defined in Airflow, and its methods perform actions on S3 buckets.

  • aws_conn_id: The Airflow connection ID for AWS credentials (defaults to 'aws_default').
  • load_file(filename, key, bucket_name): Uploads a local file to S3.
  • read_key(key, bucket_name): Returns the contents of an S3 object as a string.
```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

s3 = S3Hook(aws_conn_id='my_aws_conn')
s3.load_file(filename='/path/to/file.txt', key='folder/file.txt', bucket_name='my-bucket')
content = s3.read_key(key='folder/file.txt', bucket_name='my-bucket')
```
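Before the hook can authenticate, the 'my_aws_conn' connection has to exist in Airflow. One option is an environment variable named after the connection ID (a sketch; the access key and secret below are placeholders):

```shell
# Hypothetical credentials -- replace with your own values.
# Airflow picks up connections from AIRFLOW_CONN_<CONN_ID> variables.
export AIRFLOW_CONN_MY_AWS_CONN='aws://AKIAEXAMPLEKEY:example-secret-key@/?region_name=us-east-1'
```

Connections can also be created in the Airflow UI under Admin → Connections.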

Example

This example shows how to use S3Hook inside an Airflow PythonOperator to upload a file to S3 and then read it back.

```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from datetime import datetime

def upload_and_read():
    s3 = S3Hook(aws_conn_id='my_aws_conn')
    local_file = '/tmp/example.txt'
    bucket = 'my-bucket'
    key = 'test/example.txt'

    # Write something to the local file
    with open(local_file, 'w') as f:
        f.write('Hello from Airflow S3Hook!')

    # Upload the file to S3
    s3.load_file(filename=local_file, key=key, bucket_name=bucket)

    # Read the file content back from S3
    content = s3.read_key(key=key, bucket_name=bucket)
    print('Content from S3:', content)

with DAG(dag_id='s3hook_example', start_date=datetime(2024, 1, 1), schedule_interval=None, catchup=False) as dag:
    task = PythonOperator(
        task_id='upload_and_read_s3',
        python_callable=upload_and_read
    )
```
Output:

```text
Content from S3: Hello from Airflow S3Hook!
```

Common Pitfalls

  • Not setting up the AWS connection in Airflow UI or environment, causing authentication failures.
  • Using incorrect bucket_name or key paths leading to errors or missing files.
  • Mixing up local filesystem paths and S3 keys; S3 keys use forward slashes and should not start with a slash.
  • Not having proper IAM permissions for the AWS credentials used.
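Malformed keys from naive string concatenation are a common source of "missing file" errors. A small helper (hypothetical, not part of S3Hook) can normalize key construction:

```python
import posixpath

def build_s3_key(*parts: str) -> str:
    """Join key components with forward slashes and strip any
    leading slash, since S3 keys are not filesystem paths."""
    key = posixpath.join(*parts)
    return key.lstrip('/')

print(build_s3_key('folder', 'file.txt'))        # folder/file.txt
print(build_s3_key('/folder/', 'sub', 'f.txt'))  # folder/sub/f.txt
```

posixpath is used instead of os.path so the helper produces forward slashes even on Windows workers.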

Always verify your AWS connection and permissions before using S3Hook.

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Wrong: no aws_conn_id (falls back to 'aws_default', which may not be
# configured) and a bucket name that does not exist
s3 = S3Hook()
s3.load_file(filename='/tmp/file.txt', key='file.txt', bucket_name='wrong-bucket')

# Right: explicit connection ID and an existing bucket
s3 = S3Hook(aws_conn_id='my_aws_conn')
s3.load_file(filename='/tmp/file.txt', key='file.txt', bucket_name='correct-bucket')
```

Quick Reference

Remember these key points when using S3Hook:

  • Always specify aws_conn_id matching your Airflow AWS connection.
  • Use load_file to upload local files to S3.
  • Use read_key to read file contents from S3.
  • Check your AWS IAM permissions for S3 access.
  • Ensure bucket names and keys are correct and exist.
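S3Hook also provides check_for_key, which lets a task verify that an object exists before reading it. A minimal sketch of that pattern, written against a hypothetical FakeHook stand-in so it runs without AWS access (an S3Hook instance could be passed in its place):

```python
def read_if_exists(hook, key, bucket):
    """Read a key from S3 only if it exists; `hook` is any object
    with check_for_key/read_key methods (e.g. an S3Hook)."""
    if not hook.check_for_key(key, bucket):
        return None
    return hook.read_key(key, bucket)

# A minimal fake hook for local testing (hypothetical helper class):
class FakeHook:
    def __init__(self, data):
        self.data = data
    def check_for_key(self, key, bucket):
        return key in self.data
    def read_key(self, key, bucket):
        return self.data[key]

hook = FakeHook({'folder/file.txt': 'hello'})
print(read_if_exists(hook, 'folder/file.txt', 'my-bucket'))  # hello
print(read_if_exists(hook, 'missing.txt', 'my-bucket'))      # None
```

Accepting the hook as a parameter keeps the function testable without real AWS credentials.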

Key Takeaways

  • Initialize S3Hook with your Airflow AWS connection ID to access S3.
  • Use load_file() to upload and read_key() to download files from S3.
  • Verify AWS credentials and permissions are correctly configured in Airflow.
  • Always double-check bucket names and file keys to avoid errors.
  • Use S3Hook inside Airflow tasks like PythonOperator for automation.