Apache Airflow · DevOps · ~20 mins

AWS operators (S3, Redshift, EMR) in Apache Airflow - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ AWS Operators Mastery: get all challenges correct to earn this badge!
Service Behavior · intermediate
What happens when an Airflow S3CreateBucketOperator runs with an existing bucket name?

You use S3CreateBucketOperator in Airflow to create a bucket named my-data-bucket. The bucket already exists in your AWS account.

What will be the result of running this operator?

Apache Airflow
from airflow.providers.amazon.aws.operators.s3 import S3CreateBucketOperator

create_bucket = S3CreateBucketOperator(
    task_id='create_bucket',
    bucket_name='my-data-bucket',
    aws_conn_id='aws_default'
)

create_bucket.execute(context={})
A. The operator completes successfully without error because the bucket already exists.
B. The operator creates a new bucket with a suffix added to the name to avoid conflict.
C. The operator deletes the existing bucket and creates a new one with the same name.
D. The operator raises an error indicating the bucket name is already taken.
💡 Hint

Think about AWS S3 bucket naming rules and uniqueness across all AWS accounts.
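The hint points at S3's global bucket namespace: a name that already exists anywhere cannot be created again. One way to keep a create task safe to re-run is to check for the bucket before creating it (in Airflow this could be done with `S3Hook.check_for_bucket`; the plain function below is an illustrative stand-in, and the in-memory bucket registry is an assumption for the sketch):

```python
def create_bucket_if_missing(existing_buckets, bucket_name):
    """Illustrative stand-in for a check-then-create task.

    S3 bucket names are globally unique, so blindly re-running a create
    call against an existing name can fail; checking first keeps the
    task idempotent across retries.
    """
    if bucket_name in existing_buckets:
        return "skipped"  # bucket already there: nothing to do
    existing_buckets.add(bucket_name)  # stands in for the real create call
    return "created"
```

In a real DAG the check and the create would both go through the AWS connection; the point is only that the existence check comes first.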

Configuration · intermediate
Which Airflow Redshift operator configuration correctly runs a SQL query on Redshift?

You want to run a SQL query on an Amazon Redshift cluster using Airflow's RedshiftSQLOperator. Which configuration will successfully execute the query?

Apache Airflow
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

redshift_query = RedshiftSQLOperator(
    task_id='run_query',
    sql='SELECT COUNT(*) FROM users;',
    redshift_conn_id='redshift_default'
)

redshift_query.execute(context={})
A. sql='SELECT COUNT(*) FROM users;', redshift_conn_id='redshift_default'
B. sql=['SELECT COUNT(*) FROM users;'], redshift_conn_id='redshift_default'
C. sql='SELECT COUNT(*) FROM users;', aws_conn_id='aws_default'
D. sql='SELECT COUNT(*) FROM users;', redshift_conn_id='aws_default'
💡 Hint

Check the parameter names and types expected by RedshiftSQLOperator.
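As the hint suggests, the operator's contract lives in its parameter names and types: `sql` may be a string or a list of strings, and the connection goes in `redshift_conn_id`. A small sketch of that contract (the validator function itself is hypothetical, not part of Airflow):

```python
def validate_redshift_sql_kwargs(kwargs):
    """Hypothetical checker mirroring RedshiftSQLOperator's expectations:
    `sql` is a non-empty string or list of strings, and the connection
    is passed as `redshift_conn_id`, not `aws_conn_id`."""
    if "redshift_conn_id" not in kwargs:
        return False
    sql = kwargs.get("sql")
    if isinstance(sql, str):
        return bool(sql.strip())
    if isinstance(sql, list):
        return all(isinstance(s, str) for s in sql) and len(sql) > 0
    return False
```

Passing `aws_conn_id` here is the common mistake: the operator would fall back to whatever its default Redshift connection is, not the AWS connection you named.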

Architecture · advanced
Which Airflow EMR operator setup correctly launches an EMR cluster with a bootstrap action?

You want to launch an EMR cluster using Airflow's EmrCreateJobFlowOperator and run a bootstrap action script stored in S3. Which configuration is correct?

Apache Airflow
from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

JOB_FLOW_OVERRIDES = {
    'Name': 'TestCluster',
    'Instances': {
        'InstanceGroups': [
            {'Name': 'Master nodes', 'Market': 'ON_DEMAND', 'InstanceRole': 'MASTER', 'InstanceType': 'm5.xlarge', 'InstanceCount': 1},
            {'Name': 'Core nodes', 'Market': 'ON_DEMAND', 'InstanceRole': 'CORE', 'InstanceType': 'm5.xlarge', 'InstanceCount': 2}
        ],
        'KeepJobFlowAliveWhenNoSteps': True
    },
    'BootstrapActions': [
        {
            'Name': 'Install libraries',
            'ScriptBootstrapAction': {
                'Path': 's3://my-bucket/bootstrap.sh'
            }
        }
    ]
}

create_emr_cluster = EmrCreateJobFlowOperator(
    task_id='create_emr_cluster',
    job_flow_overrides=JOB_FLOW_OVERRIDES,
    aws_conn_id='aws_default'
)

create_emr_cluster.execute(context={})
A. JOB_FLOW_OVERRIDES includes 'BootstrapActions' as a list of strings with S3 paths only.
B. JOB_FLOW_OVERRIDES includes 'BootstrapActions' with 'ScriptBootstrapAction' containing 'Args' but no 'Path'.
C. JOB_FLOW_OVERRIDES includes 'BootstrapActions' with 'ScriptBootstrapAction' containing 'Path' to S3 script.
D. JOB_FLOW_OVERRIDES does not include 'BootstrapActions' but uses 'Steps' to run the bootstrap script.
💡 Hint

Bootstrap actions require a specific dictionary structure with 'Path' to the script.
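The shape the hint refers to comes from the EMR RunJobFlow API: each bootstrap action is a dict whose 'ScriptBootstrapAction' must carry a 'Path' to the script in S3, with 'Args' optional. A small helper that builds that shape (the helper itself is made up for illustration; the dict layout is the API's):

```python
def bootstrap_action(name, script_path, args=None):
    """Build one EMR bootstrap-action dict in the shape RunJobFlow expects:
    'ScriptBootstrapAction' must contain 'Path'; 'Args' is optional."""
    action = {"Name": name, "ScriptBootstrapAction": {"Path": script_path}}
    if args:
        action["ScriptBootstrapAction"]["Args"] = list(args)
    return action
```

Used in the question's JOB_FLOW_OVERRIDES, `bootstrap_action('Install libraries', 's3://my-bucket/bootstrap.sh')` reproduces the 'BootstrapActions' entry shown above.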

Security · advanced
Which Airflow S3 operator configuration follows AWS security best practices for credentials?

You want to use S3ListOperator in Airflow to list objects in a bucket. Which configuration best follows AWS security best practices?

Apache Airflow
from airflow.providers.amazon.aws.operators.s3 import S3ListOperator

list_s3 = S3ListOperator(
    task_id='list_s3',
    bucket='my-secure-bucket',
    aws_conn_id='aws_default'
)

list_s3.execute(context={})
A. Use aws_conn_id with full admin permissions for convenience.
B. Use aws_conn_id that points to an IAM role with least privilege for S3 access.
C. Hardcode AWS access key and secret key in the operator parameters.
D. Use no aws_conn_id and rely on default environment variables with admin rights.
💡 Hint

Think about the principle of least privilege and secure credential management.
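Least privilege here means the IAM identity behind the Airflow connection can list that one bucket and nothing more. A sketch of such a policy document, expressed as a Python dict (bucket name taken from the question; adjust the ARN for your account):

```python
# IAM policy sketch: grant only s3:ListBucket on the single bucket the
# operator reads, instead of admin rights or hardcoded keys.
LIST_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-secure-bucket"],
        }
    ],
}
```

Attaching a policy like this to the role referenced by `aws_conn_id` keeps credentials out of DAG code entirely.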

Best Practice · expert
What is the best approach to handle EMR cluster termination in Airflow to avoid extra costs?

You have an Airflow DAG that creates an EMR cluster, runs steps, and then should terminate the cluster. Which approach ensures the cluster is always terminated even if a step fails?

A. Use <code>EmrTerminateJobFlowOperator</code> in a separate task with <code>trigger_rule='all_done'</code> to always run after steps.
B. Terminate the cluster manually after the DAG finishes successfully.
C. Set <code>KeepJobFlowAliveWhenNoSteps</code> to True and terminate the cluster later.
D. Do not terminate the cluster, reusing it for future jobs to save startup time.
💡 Hint

Consider Airflow task dependencies and trigger rules for cleanup tasks.
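Trigger rules decide when a downstream task fires. The simulation below is not Airflow code, just an illustration of the two rule semantics that matter here: 'all_done' fires a cleanup task once every upstream task has finished, pass or fail, while the default 'all_success' skips it after any failure and leaves the cluster running:

```python
def cleanup_fires(upstream_states, trigger_rule="all_success"):
    """Simplified model of two Airflow trigger rules for a terminate task.

    'all_done'    -> run once every upstream task finished, pass or fail.
    'all_success' -> run only if every upstream task succeeded (default).
    """
    finished = all(s in ("success", "failed") for s in upstream_states)
    if trigger_rule == "all_done":
        return finished
    return finished and all(s == "success" for s in upstream_states)
```

This is why the terminate task is normally a separate operator with its trigger rule overridden: with the default rule, a single failed EMR step would leave the cluster alive and billing.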