How to Use S3Hook in Airflow: Syntax, Example & Tips
Use S3Hook in Airflow to connect to and interact with AWS S3 buckets. Create an instance with your AWS connection ID, then call its methods such as load_file or read_key to upload or download files from S3 within your Airflow tasks.
Syntax
The S3Hook class is initialized with an AWS connection ID defined in Airflow. You use its methods to perform actions on S3 buckets.
- aws_conn_id: The Airflow connection ID holding your AWS credentials.
- load_file(filename, key, bucket_name): Uploads a local file to S3.
- read_key(key, bucket_name): Reads the content of a file from S3.
```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

s3 = S3Hook(aws_conn_id='my_aws_conn')
s3.load_file(filename='/path/to/file.txt', key='folder/file.txt', bucket_name='my-bucket')
content = s3.read_key(key='folder/file.txt', bucket_name='my-bucket')
```
Example
This example shows how to use S3Hook inside an Airflow PythonOperator to upload a file to S3 and then read it back.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def upload_and_read():
    s3 = S3Hook(aws_conn_id='my_aws_conn')
    local_file = '/tmp/example.txt'
    bucket = 'my-bucket'
    key = 'test/example.txt'

    # Write something to the local file
    with open(local_file, 'w') as f:
        f.write('Hello from Airflow S3Hook!')

    # Upload the file to S3 (replace=True so reruns don't fail on an existing key)
    s3.load_file(filename=local_file, key=key, bucket_name=bucket, replace=True)

    # Read the file content back from S3
    content = s3.read_key(key=key, bucket_name=bucket)
    print('Content from S3:', content)


with DAG(
    dag_id='s3hook_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id='upload_and_read_s3',
        python_callable=upload_and_read,
    )
```
Output
Content from S3: Hello from Airflow S3Hook!
Common Pitfalls
- Not setting up the AWS connection in Airflow UI or environment, causing authentication failures.
- Using incorrect bucket_name or key paths, leading to errors or missing files.
- Forgetting to handle file paths correctly on both the local and S3 sides (S3 keys always use forward slashes).
- Not having proper IAM permissions for the AWS credentials used.
Always verify your AWS connection and permissions before using S3Hook.
```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Wrong: no AWS connection ID and a wrong bucket name
s3 = S3Hook()  # falls back to the default 'aws_default' connection
s3.load_file(filename='/tmp/file.txt', key='file.txt', bucket_name='wrong-bucket')

# Right: explicit connection ID and the correct bucket
s3 = S3Hook(aws_conn_id='my_aws_conn')
s3.load_file(filename='/tmp/file.txt', key='file.txt', bucket_name='correct-bucket')
```
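A related path pitfall: S3 keys always use forward slashes, so building a key with os.path.join produces backslashes on Windows and an object that appears to be "missing". A minimal stand-alone sketch (plain Python, no Airflow required; the prefix and filename are illustrative) of building keys portably:

```python
import os
import posixpath


def build_s3_key(prefix: str, filename: str) -> str:
    """Build an S3 object key from a key prefix and a local file's base name.

    Uses posixpath.join (never os.path.join) because S3 keys always use
    forward slashes, regardless of the local operating system.
    """
    return posixpath.join(prefix.strip("/"), os.path.basename(filename))


print(build_s3_key("test/", "/tmp/example.txt"))  # test/example.txt
```

The same helper then feeds both load_file and read_key, so the local path and the S3 key can never drift apart.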
Quick Reference
Remember these key points when using S3Hook:
- Always specify aws_conn_id matching your Airflow AWS connection.
- Use load_file to upload local files to S3.
- Use read_key to read file contents from S3.
- Check your AWS IAM permissions for S3 access.
- Ensure bucket names and keys are correct and exist.
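One way to catch bucket or key typos early is to validate a full s3:// URI before handing its parts to the hook (S3Hook itself ships a similar static helper, parse_s3_url). A stand-alone sketch using only the standard library:

```python
from urllib.parse import urlsplit


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key/parts' into (bucket, key).

    Raises ValueError for anything that is not an s3:// URI, so a typo
    surfaces immediately instead of as a confusing AWS error later.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/folder/file.txt")
print(bucket, key)  # my-bucket folder/file.txt
```

The resulting bucket and key can then be passed straight to read_key(key=key, bucket_name=bucket).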
Key Takeaways
- Initialize S3Hook with your Airflow AWS connection ID to access S3.
- Use load_file() to upload files to S3 and read_key() to read them back.
- Verify AWS credentials and permissions are correctly configured in Airflow.
- Always double-check bucket names and file keys to avoid errors.
- Use S3Hook inside Airflow tasks such as PythonOperator for automation.