
AWS EMR Setup for Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: fill in the blank (easy)

Complete the code to specify the EMR cluster's release label.

emr_cluster = client.run_job_flow(ReleaseLabel='[1]')
A. emr-5.30.0
B. emr-6.7.0
C. emr-3.11.0
D. emr-4.0.0
Common Mistakes
Using an outdated EMR release label that lacks Spark support.
Misspelling the release label string.
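With the blank filled in, the full request might look like the sketch below. The cluster name, region, and IAM role names are illustrative assumptions, not part of the task; emr-6.7.0 is a current-generation release label whose application bundle includes Spark.

```python
# Sketch of the completed run_job_flow request. Cluster name, region,
# and role names are illustrative assumptions.
def build_cluster_request():
    return {
        "Name": "spark-practice-cluster",      # hypothetical name
        "ReleaseLabel": "emr-6.7.0",           # a release series that bundles Spark
        "Applications": [{"Name": "Spark"}],   # install Spark on the cluster
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "InstanceCount": 3,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }

# To actually launch (requires AWS credentials):
# import boto3
# client = boto3.client("emr", region_name="us-east-1")
# emr_cluster = client.run_job_flow(**build_cluster_request())
```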
Task 2: fill in the blank (medium)

Complete the code to set the instance type for the master node in the EMR cluster.

Instances={'MasterInstanceType': '[1]', 'InstanceCount': 3}
A. m5.xlarge
B. t2.micro
C. c4.large
D. r3.2xlarge
Common Mistakes
Choosing an instance type too small for the master node.
Confusing master and core node instance types.
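A small sketch of the first common mistake as a guard: the master node coordinates the whole cluster, so burstable micro instances are a poor fit. The undersized-types set and the helper function are illustrative assumptions, not part of the EMR API.

```python
# Illustrative guard against picking a too-small master instance type.
# The UNDERSIZED set is an assumption for this sketch.
UNDERSIZED = {"t2.micro", "t2.small", "t3.micro"}

def build_instances(master_type="m5.xlarge", count=3):
    if master_type in UNDERSIZED:
        raise ValueError(f"{master_type} is likely too small for a master node")
    return {"MasterInstanceType": master_type, "InstanceCount": count}
```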
Task 3: fill in the blank (hard)

Fix the error in the bootstrap action configuration to install a custom package.

BootstrapActions=[{'Name': 'InstallPackages', 'ScriptBootstrapAction': {'Path': '[1]'}}]
A. http://mybucket/scripts/install.sh
B. gs://mybucket/scripts/install.sh
C. /local/scripts/install.sh
D. s3://mybucket/scripts/install.sh
Common Mistakes
Using local file paths instead of S3 URLs.
Using incorrect cloud storage prefixes like gs:// (Google Cloud Storage).
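EMR fetches bootstrap scripts from S3, so both mistakes above can be caught with a simple prefix check before the request is sent. This helper is an illustrative sketch; the bucket and script names come from the task.

```python
# Sketch: reject non-S3 bootstrap paths (local paths, gs:// URLs)
# before building the BootstrapActions list.
def bootstrap_actions(script_path):
    if not script_path.startswith("s3://"):
        raise ValueError(f"bootstrap script must be an S3 URL, got {script_path!r}")
    return [{
        "Name": "InstallPackages",
        "ScriptBootstrapAction": {"Path": script_path},
    }]

actions = bootstrap_actions("s3://mybucket/scripts/install.sh")
```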
Task 4: fill in the blank (hard)

Fill both blanks to configure the EMR cluster to use Spot instances for core nodes with a bid price.

Instances={'CoreInstanceType': '[1]', 'CoreInstanceMarket': '[2]', 'CoreBidPrice': '0.10', 'InstanceCount': 5}
A. m5.xlarge
B. ON_DEMAND
C. SPOT
D. c5.large
Common Mistakes
Setting CoreInstanceMarket to 'ON_DEMAND' when Spot is intended.
Choosing an instance type not supported for Spot.
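The task flattens the keys for the quiz; in the actual boto3 run_job_flow API, per-role Spot purchasing is typically expressed through an InstanceGroups list. A sketch with the task's values (group names are illustrative):

```python
# Sketch: core nodes on the Spot market with a bid price, master kept
# on-demand. Group names are illustrative assumptions.
def build_instance_groups():
    return {
        "InstanceGroups": [
            {
                "Name": "Master",
                "InstanceRole": "MASTER",
                "Market": "ON_DEMAND",       # master stays on-demand
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            },
            {
                "Name": "Core",
                "InstanceRole": "CORE",
                "Market": "SPOT",            # Spot market for core nodes
                "BidPrice": "0.10",          # max hourly bid, in USD
                "InstanceType": "m5.xlarge",
                "InstanceCount": 5,
            },
        ]
    }
```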
Task 5: fill in the blank (hard)

Fill all three blanks to define an EMR step that runs a Spark application stored in S3.

Steps=[{'Name': 'SparkApp', 'ActionOnFailure': '[1]', 'HadoopJarStep': {'Jar': '[2]', 'Args': ['spark-submit', '[3]']}}]
A. CONTINUE
B. command-runner.jar
C. s3://mybucket/apps/my_spark_app.py
D. TERMINATE_CLUSTER
Common Mistakes
Using TERMINATE_CLUSTER, which shuts down the entire cluster when the step fails.
Incorrect Jar name or missing Spark app path.
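With all three blanks filled, the step definition might look like the sketch below. command-runner.jar is EMR's standard wrapper for invoking commands such as spark-submit; the --deploy-mode flag is an illustrative addition, not part of the task.

```python
# Sketch of the completed Steps entry. CONTINUE keeps the cluster
# alive if this step fails; the app path comes from the task.
def spark_step(app_path="s3://mybucket/apps/my_spark_app.py"):
    return [{
        "Name": "SparkApp",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", app_path],
        },
    }]
```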