Apache Airflow · DevOps · ~10 mins

AWS operators (S3, Redshift, EMR) in Apache Airflow - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to create an S3ToRedshiftOperator that copies data from S3 to Redshift.

Apache Airflow
copy_to_redshift = S3ToRedshiftOperator(
    task_id='copy_s3_to_redshift',
    redshift_conn_id='redshift_default',
    aws_conn_id='aws_default',
    schema='public',
    table='users',
    s3_bucket='my-bucket',
    s3_key=[1],
    copy_options=['csv']
)
Options:
A. 'data/users.csv'
B. 's3://my-bucket/data/users.csv'
C. 'users.csv'
D. '/data/users.csv'
Common Mistakes
Using full s3 URI instead of just the key path.
Using an absolute file path starting with '/'.
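Why the key path alone is the shape the operator expects: below is a minimal sketch (the helper `build_copy_source` is hypothetical, not the provider's actual internals) of how `s3_bucket` and `s3_key` are joined into the S3 URI that the Redshift COPY command reads from. Passing a full URI or a leading slash as the key would corrupt the joined path.

```python
# Hypothetical sketch of how s3_bucket and s3_key combine into the
# S3 URI used by the Redshift COPY command. Not the provider's real code.
def build_copy_source(s3_bucket: str, s3_key: str) -> str:
    """Join bucket and key into the full S3 URI Redshift copies from."""
    return f"s3://{s3_bucket}/{s3_key}"

# With a plain key path, the URI resolves cleanly:
print(build_copy_source("my-bucket", "data/users.csv"))
# → s3://my-bucket/data/users.csv

# A full URI as the key would double up the scheme and bucket:
print(build_copy_source("my-bucket", "s3://my-bucket/data/users.csv"))
# → s3://my-bucket/s3://my-bucket/data/users.csv
```

This is why the key should be relative to the bucket, with no `s3://` scheme and no leading `/`.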
Task 2: Fill in the blank (medium)

Complete the code to define an EmrCreateJobFlowOperator with the correct job flow name.

Apache Airflow
create_emr_cluster = EmrCreateJobFlowOperator(
    task_id='create_emr_cluster',
    job_flow_overrides={
        'Name': [1],
        'ReleaseLabel': 'emr-6.3.0',
        'Instances': {
            'InstanceGroups': [
                {
                    'Name': 'Master nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm5.xlarge',
                    'InstanceCount': 1
                }
            ],
            'KeepJobFlowAliveWhenNoSteps': True
        }
    }
)
Options:
A. 'EMR Cluster'
B. MyEMRCluster
C. 'emr_cluster'
D. 'MyEMRCluster'
Common Mistakes
Forgetting quotes around the cluster name.
Using a variable name without defining it.
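The key point behind both mistakes: `job_flow_overrides` is an ordinary Python dict that is passed through to the EMR API, so every value in it must be a real Python literal. A small sketch (the dict contents mirror the task above; nothing here is the operator's internal logic):

```python
# job_flow_overrides is a plain dict handed to the EMR API, so the
# cluster name must be a quoted Python string literal.
job_flow_overrides = {
    "Name": "MyEMRCluster",       # quoted string: valid
    "ReleaseLabel": "emr-6.3.0",
}

# An unquoted bare word (MyEMRCluster without quotes) would be treated
# as a variable name and raise NameError before the DAG even parses.
assert isinstance(job_flow_overrides["Name"], str)
```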
Task 3: Fill in the blank (hard)

Fix the error in the RedshiftToS3Operator by completing the parameter for unloading data.

Apache Airflow
unload_to_s3 = RedshiftToS3Operator(
    task_id='unload_redshift_to_s3',
    redshift_conn_id='redshift_default',
    aws_conn_id='aws_default',
    schema='public',
    table='sales',
    s3_bucket='my-bucket',
    s3_key='exports/sales/',
    unload_options=[[1]]
)
Options:
A. 'parallel=off'
B. 'parallel off'
C. 'PARALLEL OFF'
D. 'PARALLEL=OFF'
Common Mistakes
Using lowercase or equals sign in unload options.
Missing quotes around the option string.
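The reason the syntax matters: entries in `unload_options` are spliced verbatim into the Redshift UNLOAD statement, so they must match Redshift's own option grammar (uppercase keyword, space-separated, no `=`). A hedged sketch of that splicing, with a hypothetical `build_unload_sql` helper rather than the operator's real internals:

```python
# Hypothetical sketch: unload_options are appended verbatim to the
# UNLOAD statement, so each entry must be valid Redshift syntax.
def build_unload_sql(query: str, s3_path: str, unload_options: list) -> str:
    opts = " ".join(unload_options)
    return f"UNLOAD ('{query}') TO '{s3_path}' {opts}"

sql = build_unload_sql(
    "SELECT * FROM public.sales",
    "s3://my-bucket/exports/sales/",
    ["PARALLEL OFF"],  # Redshift syntax: uppercase keyword, no '='
)
print(sql)
# → UNLOAD ('SELECT * FROM public.sales') TO 's3://my-bucket/exports/sales/' PARALLEL OFF
```

Passing `'parallel=off'` would land in the SQL string unchanged and fail at execution time in Redshift, not in Airflow, which makes the mistake harder to spot.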
Task 4: Fill in the blank (hard)

Fill both blanks to configure an EmrAddStepsOperator that adds a Spark step to the cluster.

Apache Airflow
add_spark_step = EmrAddStepsOperator(
    task_id='add_spark_step',
    job_flow_id=[1],
    aws_conn_id='aws_default',
    steps=[{
        'Name': 'Spark application',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['spark-submit', [2]]
        }
    }]
)
Options:
A. 'j-2AXXXXXXGAPLF'
B. '--deploy-mode'
C. 's3://my-bucket/scripts/spark_app.py'
D. '--master'
Common Mistakes
Using the wrong format for job_flow_id.
Passing incorrect spark-submit arguments.
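Two facts drive the answers here: EMR cluster ids always take the form `j-` followed by alphanumerics, and the `Args` list is the literal command line that `command-runner.jar` executes on the cluster, so `spark-submit` must be followed by the application to run. A small sketch (the values reuse the task's options; the format check is illustrative, not the operator's validation):

```python
import re

# The step's Args list is the literal command line executed on the
# cluster by command-runner.jar.
job_flow_id = "j-2AXXXXXXGAPLF"  # EMR cluster id: 'j-' plus alphanumerics
args = ["spark-submit", "s3://my-bucket/scripts/spark_app.py"]

# Illustrative format check on the cluster id (not the operator's own code):
assert re.fullmatch(r"j-[A-Z0-9]+", job_flow_id)

print(" ".join(args))
# → spark-submit s3://my-bucket/scripts/spark_app.py
```

Flags like `'--deploy-mode'` or `'--master'` are valid spark-submit arguments, but each needs a value after it; on its own, neither names the application to run.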
Task 5: Fill in the blank (hard)

Fill all three blanks to define an S3CreateBucketOperator with region and ACL settings.

Apache Airflow
create_bucket = S3CreateBucketOperator(
    task_id='create_bucket',
    bucket_name=[1],
    region_name=[2],
    acl=[3],
    aws_conn_id='aws_default'
)
Options:
A. 'my-new-bucket-12345'
B. 'us-west-2'
C. 'private'
D. 'public-read'
Common Mistakes
Using invalid region names.
Using ACL values not supported by S3.
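S3 bucket names must be globally unique and follow strict naming rules: 3-63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit. A simplified pre-flight check (the regex below captures the core rules but omits edge cases like consecutive dots and IP-shaped names; it is not part of the operator):

```python
import re

# Simplified S3 bucket-name check: 3-63 chars, lowercase letters,
# digits, dots, hyphens; must start and end with a letter or digit.
# Omits some edge cases (consecutive dots, IP-address-like names).
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.match(name))

print(is_valid_bucket_name("my-new-bucket-12345"))  # lowercase, hyphens: valid
print(is_valid_bucket_name("My_Bucket"))            # uppercase and underscore: invalid
```

`'private'` and `'public-read'` are both canned S3 ACLs, so the ACL blank is a judgment call in the task; regions must be real AWS region strings like `'us-west-2'`, not free-form names.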