You use S3CreateBucketOperator in Airflow to create a bucket named my-data-bucket. The bucket already exists in your AWS account.
What will be the result of running this operator?
```python
from airflow.providers.amazon.aws.operators.s3 import S3CreateBucketOperator

create_bucket = S3CreateBucketOperator(
    task_id='create_bucket',
    bucket_name='my-data-bucket',
    aws_conn_id='aws_default',
)

create_bucket.execute(context={})
```
Think about AWS S3 bucket naming rules and uniqueness across all AWS accounts.
S3 bucket names must be globally unique across all AWS accounts. Creating a bucket whose name is already taken by another account fails with a BucketAlreadyExists error; if your own account already owns the bucket, CreateBucket raises BucketAlreadyOwnedByYou in every region except us-east-1. The operator does not silently succeed or rename the bucket.
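Global uniqueness can only be discovered at request time, but the naming rules themselves can be checked locally before calling the API. A minimal sketch (the `is_valid_bucket_name` helper is hypothetical, not part of the Amazon provider):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Check S3 general-purpose bucket naming rules locally.

    Global uniqueness still has to be verified by the CreateBucket call itself.
    """
    # 3-63 characters long
    if not 3 <= len(name) <= 63:
        return False
    # lowercase letters, digits, dots, hyphens; must start and end alphanumeric
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name):
        return False
    # no consecutive dots
    if ".." in name:
        return False
    # must not be formatted like an IP address
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True
```

Even with a valid name, the CreateBucket call can still fail on uniqueness, so the error handling in the DAG is what matters.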
You want to run a SQL query on an Amazon Redshift cluster using Airflow's RedshiftSQLOperator. Which configuration will successfully execute the query?
```python
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

redshift_query = RedshiftSQLOperator(
    task_id='run_query',
    sql='SELECT COUNT(*) FROM users;',
    redshift_conn_id='redshift_default',
)

redshift_query.execute(context={})
```
Check the parameter names and types expected by RedshiftSQLOperator.
The sql parameter must be a string or a list of strings containing the SQL statements to run. The connection must be passed as redshift_conn_id, which points at a Redshift connection; passing it as aws_conn_id is incorrect here.
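The string-or-list contract for `sql` can be sketched as a small normalizer (a hypothetical helper illustrating the accepted types, not the provider's actual code):

```python
from typing import List, Union

def normalize_sql(sql: Union[str, List[str]]) -> List[str]:
    """Accept a single SQL string or a list of statements, return a list.

    Mirrors the string-or-list contract of RedshiftSQLOperator's `sql` param.
    """
    if isinstance(sql, str):
        return [sql]
    if isinstance(sql, list) and all(isinstance(s, str) for s in sql):
        return sql
    raise TypeError("sql must be a string or a list of strings")
```

Passing a list lets one task run several statements in order, e.g. `sql=['TRUNCATE staging;', 'INSERT INTO users SELECT * FROM staging;']`.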
You want to launch an EMR cluster using Airflow's EmrCreateJobFlowOperator and run a bootstrap action script stored in S3. Which configuration is correct?
```python
from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

JOB_FLOW_OVERRIDES = {
    'Name': 'TestCluster',
    'Instances': {
        'InstanceGroups': [
            {
                'Name': 'Master nodes',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 1,
            },
            {
                'Name': 'Core nodes',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'CORE',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 2,
            },
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
    'BootstrapActions': [
        {
            'Name': 'Install libraries',
            'ScriptBootstrapAction': {
                'Path': 's3://my-bucket/bootstrap.sh',
            },
        },
    ],
}

create_emr_cluster = EmrCreateJobFlowOperator(
    task_id='create_emr_cluster',
    job_flow_overrides=JOB_FLOW_OVERRIDES,
    aws_conn_id='aws_default',
)

create_emr_cluster.execute(context={})
```
Bootstrap actions require a specific dictionary structure with 'Path' to the script.
The correct way to specify bootstrap actions is a list of dictionaries with 'Name' and 'ScriptBootstrapAction' keys, where 'ScriptBootstrapAction' must include the 'Path' to the S3 script. Missing 'Path' or wrong format causes failure.
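The required shape can be checked before the cluster request is sent. A pre-flight sketch (the `validate_bootstrap_actions` helper is hypothetical, not part of the provider):

```python
def validate_bootstrap_actions(actions):
    """Verify EMR BootstrapActions have the shape run_job_flow expects.

    Each entry needs a 'Name' and a 'ScriptBootstrapAction' dict whose
    'Path' points at an S3 object. Hypothetical pre-flight helper.
    """
    if not isinstance(actions, list):
        raise ValueError("BootstrapActions must be a list of dicts")
    for action in actions:
        script = action.get("ScriptBootstrapAction")
        if "Name" not in action or not isinstance(script, dict):
            raise ValueError(f"malformed bootstrap action: {action!r}")
        if "Path" not in script or not script["Path"].startswith("s3://"):
            raise ValueError(f"ScriptBootstrapAction needs an s3:// Path: {action!r}")
    return True
```

Catching a malformed structure in the DAG file is cheaper than waiting for EMR to reject the job flow at runtime.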
You want to use S3ListOperator in Airflow to list objects in a bucket. Which configuration best follows AWS security best practices?
```python
from airflow.providers.amazon.aws.operators.s3 import S3ListOperator

list_s3 = S3ListOperator(
    task_id='list_s3',
    bucket='my-secure-bucket',
    aws_conn_id='aws_default',
)

list_s3.execute(context={})
```
Think about the principle of least privilege and secure credential management.
Best practice is to use an IAM role or user with only the permissions needed for the task. Hardcoding keys or using admin rights increases security risks.
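As an illustration of least privilege, the identity behind aws_default could carry a policy granting only the list permission on the one bucket the task reads. The policy content below is an assumption for this example; scope the resource ARN to your own bucket:

```python
import json

# Minimal policy granting only ListBucket on the single bucket the task
# reads. Bucket name and scope are assumptions; adapt to your account.
list_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-secure-bucket"],
        }
    ],
}

print(json.dumps(list_only_policy, indent=2))
```

A policy like this lets the task enumerate keys but not read, write, or delete objects, so a leaked credential does far less damage than an admin key would.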
You have an Airflow DAG that creates an EMR cluster, runs steps, and then should terminate the cluster. Which approach ensures the cluster is always terminated even if a step fails?
Consider Airflow task dependencies and trigger rules for cleanup tasks.
Using EmrTerminateJobFlowOperator with trigger_rule='all_done' ensures the cluster terminates regardless of previous task success or failure, preventing unexpected costs.
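The difference between the default all_success rule and all_done can be sketched with a tiny state check (a simplified model for illustration, not Airflow's scheduler code):

```python
def should_run(trigger_rule: str, upstream_states: list) -> bool:
    """Simplified model of two Airflow trigger rules for a cleanup task."""
    finished = ("success", "failed", "skipped")
    if trigger_rule == "all_success":
        # Default rule: one failed upstream task blocks the cleanup.
        return all(s == "success" for s in upstream_states)
    if trigger_rule == "all_done":
        # Runs once every upstream task finished, whatever the outcome.
        return all(s in finished for s in upstream_states)
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")
```

With the default rule, a failed EMR step would leave the terminate task unscheduled and the cluster running; under all_done the terminate task still fires, which is exactly the cost-safety property the explanation describes.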