Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create an S3ToRedshiftOperator that copies data from S3 to Redshift.
Apache Airflow
copy_to_redshift = S3ToRedshiftOperator(
    task_id='copy_s3_to_redshift',
    redshift_conn_id='redshift_default',
    aws_conn_id='aws_default',
    schema='public',
    table='users',
    s3_bucket='my-bucket',
    s3_key=[1],
    copy_options=['csv']
)
Common Mistakes:
- Using the full s3:// URI instead of just the key path.
- Using an absolute file path starting with '/'.
Explanation: The s3_key parameter expects the key path inside the bucket, not the full s3:// URI.
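To make the rule concrete, here is a minimal sketch of the check described above; the key values are hypothetical examples, not the quiz's specific answer. A key path is relative to the bucket, so it must not start with 's3://' or '/':

```python
# Sketch of the rule above: s3_key is the key path inside the bucket.
# The key values below are hypothetical examples.
def is_plausible_s3_key(key: str) -> bool:
    """Reject full s3:// URIs and absolute paths starting with '/'."""
    return not key.startswith("s3://") and not key.startswith("/")

print(is_plausible_s3_key("data/users.csv"))                 # key path only -> True
print(is_plausible_s3_key("s3://my-bucket/data/users.csv"))  # full URI -> False
print(is_plausible_s3_key("/data/users.csv"))                # absolute path -> False
```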
2. Fill in the blank (medium)
Complete the code to define an EMRCreateJobFlowOperator with the correct job flow name.
Apache Airflow
create_emr_cluster = EMRCreateJobFlowOperator(
    task_id='create_emr_cluster',
    job_flow_overrides={
        'Name': [1],
        'ReleaseLabel': 'emr-6.3.0',
        'Instances': {
            'InstanceGroups': [
                {
                    'Name': 'Master nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm5.xlarge',
                    'InstanceCount': 1
                }
            ],
            'KeepJobFlowAliveWhenNoSteps': True
        }
    }
)
Common Mistakes:
- Forgetting quotes around the cluster name.
- Using a variable name without defining it.
Explanation: The 'Name' value must be a string literal, so it needs quotes.
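As a sketch of what a correctly quoted name looks like (the name 'demo-cluster' is hypothetical), the value is just a Python string in the overrides dict:

```python
# Hypothetical job_flow_overrides fragment: 'Name' must be a string literal.
job_flow_overrides = {
    'Name': 'demo-cluster',       # quoted string, not a bare identifier
    'ReleaseLabel': 'emr-6.3.0',
}

# Without the quotes, Python would treat demo-cluster as an undefined
# expression and fail at DAG-parse time.
print(isinstance(job_flow_overrides['Name'], str))  # True
```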
3. Fill in the blank (hard)
Fix the error in the RedshiftToS3Operator by completing the parameter for unloading data.
Apache Airflow
unload_to_s3 = RedshiftToS3Operator(
    task_id='unload_redshift_to_s3',
    redshift_conn_id='redshift_default',
    aws_conn_id='aws_default',
    schema='public',
    table='sales',
    s3_bucket='my-bucket',
    s3_key='exports/sales/',
    unload_options=[[1]]
)
Common Mistakes:
- Using lowercase letters or an equals sign in unload options.
- Missing quotes around the option string.
Explanation: Unload options are case-sensitive; they must be uppercase and use spaces, not an equals sign.
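The case and separator rules above can be sketched as a small check; the option strings are illustrative examples of the expected shape, not the quiz's specific answer:

```python
# Sketch of well-formed vs malformed unload option strings.
def is_well_formed_unload_option(opt: str) -> bool:
    """Uppercase keywords separated by spaces, no '=' sign."""
    return opt == opt.upper() and '=' not in opt

print(is_well_formed_unload_option("PARALLEL OFF"))  # uppercase, space-separated -> True
print(is_well_formed_unload_option("parallel off"))  # lowercase -> False
print(is_well_formed_unload_option("PARALLEL=OFF"))  # equals sign -> False
```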
4. Fill in the blank (hard)
Fill both blanks to configure an EMRAddStepsOperator that adds a Spark step to the cluster.
Apache Airflow
add_spark_step = EMRAddStepsOperator(
    task_id='add_spark_step',
    job_flow_id=[1],
    aws_conn_id='aws_default',
    steps=[{
        'Name': 'Spark application',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['spark-submit', [2]]
        }
    }]
)
Common Mistakes:
- Using the wrong format for job_flow_id.
- Passing incorrect spark-submit arguments.
Explanation: job_flow_id is the cluster ID string; the spark-submit argument is the S3 path of the script.
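As a sketch of the two value shapes, EMR cluster IDs typically look like 'j-XXXXXXXXXXXXX' and the application script is referenced by its s3:// path; both values below are hypothetical:

```python
import re

# Hypothetical values illustrating the expected formats for the two blanks.
CLUSTER_ID_RE = re.compile(r"^j-[A-Z0-9]+$")

job_flow_id = "j-2AXXXXXXGAPLF"                 # cluster ID string
script_path = "s3://my-bucket/scripts/app.py"   # script path in S3

print(bool(CLUSTER_ID_RE.match(job_flow_id)))   # True
print(script_path.startswith("s3://"))          # True

# The step's Args list pairs spark-submit with the script path.
spark_args = ['spark-submit', script_path]
```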
5. Fill in the blank (hard)
Fill all three blanks to define an S3CreateBucketOperator with region and ACL settings.
Apache Airflow
create_bucket = S3CreateBucketOperator(
    task_id='create_bucket',
    bucket_name=[1],
    region_name=[2],
    acl=[3],
    aws_conn_id='aws_default'
)
Common Mistakes:
- Using invalid region names.
- Using ACL values not supported by S3.
Explanation: The bucket name is a string, the region is a valid AWS region string, and the ACL is usually 'private' or 'public-read'.
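A sketch of plausible values for the three blanks; the bucket name, region, and ACL below are common examples, not the quiz's specific answers:

```python
# Hypothetical completed values for the three blanks.
bucket_kwargs = {
    'bucket_name': 'my-new-bucket',
    'region_name': 'us-east-1',   # must be a real AWS region string
    'acl': 'private',             # a canned ACL supported by S3
}

# A few of the canned ACLs S3 accepts (not an exhaustive list).
VALID_ACLS = {'private', 'public-read', 'public-read-write', 'authenticated-read'}

print(bucket_kwargs['acl'] in VALID_ACLS)                       # True
print(all(isinstance(v, str) for v in bucket_kwargs.values()))  # True
```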