Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create an S3ToRedshiftOperator that copies data from S3 to Redshift.
Apache Airflow
copy_to_redshift = S3ToRedshiftOperator(
    task_id='copy_s3_to_redshift',
    redshift_conn_id='redshift_default',
    aws_conn_id='aws_default',
    schema='public',
    table='users',
    s3_bucket='my-bucket',
    s3_key=[1],
    copy_options=['csv']
)
Common Mistakes:
- Using the full s3:// URI instead of just the key path.
- Using an absolute file path starting with '/'.
Explanation: The s3_key parameter expects the key path inside the bucket, not the full s3:// URI.
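To make the rule concrete, here is a minimal sketch of the check described above; the key values are hypothetical examples, not the quiz's specific answer. A key path is relative to the bucket, so it must not start with 's3://' or '/':

```python
# Sketch of the rule above: s3_key is the key path inside the bucket.
# The key values below are hypothetical examples.
def is_plausible_s3_key(key: str) -> bool:
    """Reject full s3:// URIs and absolute paths starting with '/'."""
    return not key.startswith("s3://") and not key.startswith("/")

print(is_plausible_s3_key("data/users.csv"))                 # key path only -> True
print(is_plausible_s3_key("s3://my-bucket/data/users.csv"))  # full URI -> False
print(is_plausible_s3_key("/data/users.csv"))                # absolute path -> False
```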
2. Fill in the blank (medium)
Complete the code to define an EMRCreateJobFlowOperator with the correct job flow name.
Apache Airflow
create_emr_cluster = EMRCreateJobFlowOperator(
    task_id='create_emr_cluster',
    job_flow_overrides={
        'Name': [1],
        'ReleaseLabel': 'emr-6.3.0',
        'Instances': {
            'InstanceGroups': [
                {
                    'Name': 'Master nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm5.xlarge',
                    'InstanceCount': 1
                }
            ],
            'KeepJobFlowAliveWhenNoSteps': True
        }
    }
)
Common Mistakes:
- Forgetting quotes around the cluster name.
- Using a variable name without defining it.
Explanation: The 'Name' value must be a string literal, so it needs quotes.
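As a sketch of what a correctly quoted name looks like (the name 'demo-cluster' is hypothetical), the value is just a Python string in the overrides dict:

```python
# Hypothetical job_flow_overrides fragment: 'Name' must be a string literal.
job_flow_overrides = {
    'Name': 'demo-cluster',       # quoted string, not a bare identifier
    'ReleaseLabel': 'emr-6.3.0',
}

# Without the quotes, Python would treat demo-cluster as an undefined
# expression and fail at DAG-parse time.
print(isinstance(job_flow_overrides['Name'], str))  # True
```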
3. Fill in the blank (hard)
Fix the error in the RedshiftToS3Operator by completing the parameter for unloading data.
Apache Airflow
unload_to_s3 = RedshiftToS3Operator(
    task_id='unload_redshift_to_s3',
    redshift_conn_id='redshift_default',
    aws_conn_id='aws_default',
    schema='public',
    table='sales',
    s3_bucket='my-bucket',
    s3_key='exports/sales/',
    unload_options=[[1]]
)
Common Mistakes:
- Using lowercase letters or an equals sign in unload options.
- Missing quotes around the option string.
Explanation: Unload options are case-sensitive; they must be uppercase and use spaces, not an equals sign.
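The case and separator rules above can be sketched as a small check; the option strings are illustrative examples of the expected shape, not the quiz's specific answer:

```python
# Sketch of well-formed vs malformed unload option strings.
def is_well_formed_unload_option(opt: str) -> bool:
    """Uppercase keywords separated by spaces, no '=' sign."""
    return opt == opt.upper() and '=' not in opt

print(is_well_formed_unload_option("PARALLEL OFF"))  # uppercase, space-separated -> True
print(is_well_formed_unload_option("parallel off"))  # lowercase -> False
print(is_well_formed_unload_option("PARALLEL=OFF"))  # equals sign -> False
```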
4. Fill in the blank (hard)
Fill both blanks to configure an EMRAddStepsOperator that adds a Spark step to the cluster.
Apache Airflow
add_spark_step = EMRAddStepsOperator(
    task_id='add_spark_step',
    job_flow_id=[1],
    aws_conn_id='aws_default',
    steps=[{
        'Name': 'Spark application',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['spark-submit', [2]]
        }
    }]
)
Common Mistakes:
- Using the wrong format for job_flow_id.
- Passing incorrect spark-submit arguments.
Explanation: job_flow_id is the cluster ID string; the spark-submit argument is the S3 path of the script.
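As a sketch of the two value shapes, EMR cluster IDs typically look like 'j-XXXXXXXXXXXXX' and the application script is referenced by its s3:// path; both values below are hypothetical:

```python
import re

# Hypothetical values illustrating the expected formats for the two blanks.
CLUSTER_ID_RE = re.compile(r"^j-[A-Z0-9]+$")

job_flow_id = "j-2AXXXXXXGAPLF"                 # cluster ID string
script_path = "s3://my-bucket/scripts/app.py"   # script path in S3

print(bool(CLUSTER_ID_RE.match(job_flow_id)))   # True
print(script_path.startswith("s3://"))          # True

# The step's Args list pairs spark-submit with the script path.
spark_args = ['spark-submit', script_path]
```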
5. Fill in the blank (hard)
Fill all three blanks to define an S3CreateBucketOperator with region and ACL settings.
Apache Airflow
create_bucket = S3CreateBucketOperator(
    task_id='create_bucket',
    bucket_name=[1],
    region_name=[2],
    acl=[3],
    aws_conn_id='aws_default'
)
Common Mistakes:
- Using invalid region names.
- Using ACL values not supported by S3.
Explanation: The bucket name is a string, the region is a valid AWS region string, and the ACL is usually 'private' or 'public-read'.
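A sketch of plausible values for the three blanks; the bucket name, region, and ACL below are common examples, not the quiz's specific answers:

```python
# Hypothetical completed values for the three blanks.
bucket_kwargs = {
    'bucket_name': 'my-new-bucket',
    'region_name': 'us-east-1',   # must be a real AWS region string
    'acl': 'private',             # a canned ACL supported by S3
}

# A few of the canned ACLs S3 accepts (not an exhaustive list).
VALID_ACLS = {'private', 'public-read', 'public-read-write', 'authenticated-read'}

print(bucket_kwargs['acl'] in VALID_ACLS)                       # True
print(all(isinstance(v, str) for v in bucket_kwargs.values()))  # True
```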