You use S3CreateBucketOperator in Airflow to create a bucket named my-data-bucket. The bucket already exists in your AWS account.
What will be the result of running this operator?
```python
from airflow.providers.amazon.aws.operators.s3 import S3CreateBucketOperator

create_bucket = S3CreateBucketOperator(
    task_id='create_bucket',
    bucket_name='my-data-bucket',
    aws_conn_id='aws_default',
)

create_bucket.execute(context={})
```
Think about AWS S3 bucket naming rules and uniqueness across all AWS accounts.
S3 bucket names must be globally unique across all AWS accounts. Creating a bucket whose name is already taken by another account fails with a BucketAlreadyExists error; if your own account already owns the bucket, CreateBucket raises BucketAlreadyOwnedByYou in every region except us-east-1. The operator does not silently succeed or rename the bucket.
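Global uniqueness can only be discovered at request time, but the naming rules themselves can be checked locally before calling the API. A minimal sketch (the `is_valid_bucket_name` helper is hypothetical, not part of the Amazon provider):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Check S3 general-purpose bucket naming rules locally.

    Global uniqueness still has to be verified by the CreateBucket call itself.
    """
    # 3-63 characters long
    if not 3 <= len(name) <= 63:
        return False
    # lowercase letters, digits, dots, hyphens; must start and end alphanumeric
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name):
        return False
    # no consecutive dots
    if ".." in name:
        return False
    # must not be formatted like an IP address
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True
```

Even with a valid name, the CreateBucket call can still fail on uniqueness, so the error handling in the DAG is what matters.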
You want to run a SQL query on an Amazon Redshift cluster using Airflow's RedshiftSQLOperator. Which configuration will successfully execute the query?
```python
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

redshift_query = RedshiftSQLOperator(
    task_id='run_query',
    sql='SELECT COUNT(*) FROM users;',
    redshift_conn_id='redshift_default',
)

redshift_query.execute(context={})
```
Check the parameter names and types expected by RedshiftSQLOperator.
The sql parameter must be a string or a list of strings containing the SQL statements to run. The connection must be passed as redshift_conn_id, which points at a Redshift connection; passing it as aws_conn_id is incorrect here.
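The string-or-list contract for `sql` can be sketched as a small normalizer (a hypothetical helper illustrating the accepted types, not the provider's actual code):

```python
from typing import List, Union

def normalize_sql(sql: Union[str, List[str]]) -> List[str]:
    """Accept a single SQL string or a list of statements, return a list.

    Mirrors the string-or-list contract of RedshiftSQLOperator's `sql` param.
    """
    if isinstance(sql, str):
        return [sql]
    if isinstance(sql, list) and all(isinstance(s, str) for s in sql):
        return sql
    raise TypeError("sql must be a string or a list of strings")
```

Passing a list lets one task run several statements in order, e.g. `sql=['TRUNCATE staging;', 'INSERT INTO users SELECT * FROM staging;']`.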
You want to launch an EMR cluster using Airflow's EmrCreateJobFlowOperator and run a bootstrap action script stored in S3. Which configuration is correct?
```python
from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

JOB_FLOW_OVERRIDES = {
    'Name': 'TestCluster',
    'Instances': {
        'InstanceGroups': [
            {
                'Name': 'Master nodes',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 1,
            },
            {
                'Name': 'Core nodes',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'CORE',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 2,
            },
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
    'BootstrapActions': [
        {
            'Name': 'Install libraries',
            'ScriptBootstrapAction': {
                'Path': 's3://my-bucket/bootstrap.sh',
            },
        },
    ],
}

create_emr_cluster = EmrCreateJobFlowOperator(
    task_id='create_emr_cluster',
    job_flow_overrides=JOB_FLOW_OVERRIDES,
    aws_conn_id='aws_default',
)

create_emr_cluster.execute(context={})
```
Bootstrap actions require a specific dictionary structure with 'Path' to the script.
The correct way to specify bootstrap actions is a list of dictionaries with 'Name' and 'ScriptBootstrapAction' keys, where 'ScriptBootstrapAction' must include the 'Path' to the S3 script. Missing 'Path' or wrong format causes failure.
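The required shape can be checked before the cluster request is sent. A pre-flight sketch (the `validate_bootstrap_actions` helper is hypothetical, not part of the provider):

```python
def validate_bootstrap_actions(actions):
    """Verify EMR BootstrapActions have the shape run_job_flow expects.

    Each entry needs a 'Name' and a 'ScriptBootstrapAction' dict whose
    'Path' points at an S3 object. Hypothetical pre-flight helper.
    """
    if not isinstance(actions, list):
        raise ValueError("BootstrapActions must be a list of dicts")
    for action in actions:
        script = action.get("ScriptBootstrapAction")
        if "Name" not in action or not isinstance(script, dict):
            raise ValueError(f"malformed bootstrap action: {action!r}")
        if "Path" not in script or not script["Path"].startswith("s3://"):
            raise ValueError(f"ScriptBootstrapAction needs an s3:// Path: {action!r}")
    return True
```

Catching a malformed structure in the DAG file is cheaper than waiting for EMR to reject the job flow at runtime.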
You want to use S3ListOperator in Airflow to list objects in a bucket. Which configuration best follows AWS security best practices?
```python
from airflow.providers.amazon.aws.operators.s3 import S3ListOperator

list_s3 = S3ListOperator(
    task_id='list_s3',
    bucket='my-secure-bucket',
    aws_conn_id='aws_default',
)

list_s3.execute(context={})
```
Think about the principle of least privilege and secure credential management.
Best practice is to use an IAM role or user with only the permissions needed for the task. Hardcoding keys or using admin rights increases security risks.
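As an illustration of least privilege, the identity behind aws_default could carry a policy granting only the list permission on the one bucket the task reads. The policy content below is an assumption for this example; scope the resource ARN to your own bucket:

```python
import json

# Minimal policy granting only ListBucket on the single bucket the task
# reads. Bucket name and scope are assumptions; adapt to your account.
list_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-secure-bucket"],
        }
    ],
}

print(json.dumps(list_only_policy, indent=2))
```

A policy like this lets the task enumerate keys but not read, write, or delete objects, so a leaked credential does far less damage than an admin key would.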
You have an Airflow DAG that creates an EMR cluster, runs steps, and then should terminate the cluster. Which approach ensures the cluster is always terminated even if a step fails?
Consider Airflow task dependencies and trigger rules for cleanup tasks.
Using EmrTerminateJobFlowOperator with trigger_rule='all_done' ensures the cluster terminates regardless of previous task success or failure, preventing unexpected costs.
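The difference between the default all_success rule and all_done can be sketched with a tiny state check (a simplified model for illustration, not Airflow's scheduler code):

```python
def should_run(trigger_rule: str, upstream_states: list) -> bool:
    """Simplified model of two Airflow trigger rules for a cleanup task."""
    finished = ("success", "failed", "skipped")
    if trigger_rule == "all_success":
        # Default rule: one failed upstream task blocks the cleanup.
        return all(s == "success" for s in upstream_states)
    if trigger_rule == "all_done":
        # Runs once every upstream task finished, whatever the outcome.
        return all(s in finished for s in upstream_states)
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")
```

With the default rule, a failed EMR step would leave the terminate task unscheduled and the cluster running; under all_done the terminate task still fires, which is exactly the cost-safety property the explanation describes.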