
Hadoop in cloud (EMR, Dataproc, HDInsight) - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to launch an EMR cluster using AWS CLI.

aws emr create-cluster --name MyCluster --release-label emr-6.3.0 --applications Name=Hadoop [1]
Options:
A. --region us-west-2
B. --instance-count 3
C. --instance-type m5.xlarge
D. --log-uri s3://my-logs/
Common Mistakes
Using --instance-type instead of --instance-count
Forgetting to specify the number of instances
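A minimal dry-run sketch of the completed command for Task 1. It only assembles and prints the command, so it runs without AWS credentials. The instance type (borrowed from option C) and `--use-default-roles` are assumptions added here so the command is closer to something launchable; they are not part of the task.

```shell
#!/bin/sh
# Dry run: build the EMR command as a string and print it instead of calling AWS.
CLUSTER_NAME="MyCluster"     # from the task
RELEASE_LABEL="emr-6.3.0"    # from the task
INSTANCE_TYPE="m5.xlarge"    # assumption: borrowed from option C
INSTANCE_COUNT=3             # the value that fills blank [1]

CMD="aws emr create-cluster --name $CLUSTER_NAME --release-label $RELEASE_LABEL"
CMD="$CMD --applications Name=Hadoop"
CMD="$CMD --instance-type $INSTANCE_TYPE --instance-count $INSTANCE_COUNT"
CMD="$CMD --use-default-roles"   # assumption: the default EMR roles already exist

echo "$CMD"
# To launch for real (needs credentials and incurs cost): eval "$CMD"
```

Keeping the count in its own variable makes it harder to confuse with the instance type, which is the mistake the task warns about.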
2. Fill in the blank (medium)

Complete the code to submit a Hadoop job on Google Dataproc.

gcloud dataproc jobs submit hadoop --cluster my-cluster --region us-central1 --jars myjob.jar [1]
Options:
A. --zone us-central1-a
B. --num-workers 5
C. --class com.example.MyJob
D. --project my-project
Common Mistakes
Using --num-workers, which sets the cluster size and does not apply to job submission
Omitting the main class option
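A dry-run sketch of a Dataproc Hadoop job submission with an explicit main class. In gcloud, `--jar` and `--class` are alternative ways to name the driver, so this sketch passes the jar through `--jars` and names the driver with `--class`. The cluster, region, jar, and class names come from the task and are placeholders.

```shell
#!/bin/sh
# Dry run: build the job submission command and print it; nothing is sent to GCP.
CLUSTER="my-cluster"               # from the task
REGION="us-central1"               # from the task
JAR="myjob.jar"                    # from the task (placeholder file name)
MAIN_CLASS="com.example.MyJob"     # the value that fills blank [1]

CMD="gcloud dataproc jobs submit hadoop --cluster $CLUSTER --region $REGION"
CMD="$CMD --jars $JAR"             # jar(s) added to the job classpath
CMD="$CMD --class $MAIN_CLASS"     # driver class looked up inside those jars

echo "$CMD"
# Arguments for the job itself would follow a bare "--" at the end of the command.
```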
3. Fill in the blank (hard)

Complete the HDInsight command to create a Hadoop cluster with 4 worker nodes.

az hdinsight create --name mycluster --resource-group mygroup --type Hadoop --location eastus [1]
Options:
A. --workernode-count 4
B. --workernode-size Standard_D3_v2
C. --headnode-size Standard_D3_v2
D. --cluster-tier Standard
Common Mistakes
Using --workernode-size, which sets the VM size, not the node count
Forgetting to specify worker node count
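A dry-run sketch of the HDInsight command with the worker count filled in. In the Azure CLI the worker count flag is spelled `--workernode-count`; it is worth confirming with `az hdinsight create --help` for your CLI version. A real cluster also needs further settings such as login credentials and storage, which are left out here on purpose.

```shell
#!/bin/sh
# Dry run: build the HDInsight command and print it; nothing is sent to Azure.
NAME="mycluster"            # from the task
RESOURCE_GROUP="mygroup"    # from the task
LOCATION="eastus"           # from the task
WORKER_COUNT=4              # the value the task asks for

# Guard against a non-positive count before building the command.
if [ "$WORKER_COUNT" -lt 1 ]; then
  echo "worker count must be at least 1" >&2
  exit 1
fi

CMD="az hdinsight create --name $NAME --resource-group $RESOURCE_GROUP"
CMD="$CMD --type hadoop --location $LOCATION"
CMD="$CMD --workernode-count $WORKER_COUNT"

echo "$CMD"
```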
4. Fill in the blanks (hard)

Fill both blanks to configure a Dataproc cluster with 3 master nodes and 5 worker nodes.

gcloud dataproc clusters create my-cluster --region us-central1 --num-masters=[1] --num-workers=[2]
Options:
A. 3
B. 5
C. 2
D. 4
Common Mistakes
Swapping master and worker counts
Using incorrect numbers for nodes
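A dry-run sketch of the cluster creation command from Task 4. Dataproc accepts 1 master (standard mode) or 3 masters (high availability), so the sketch checks the master count before building the command. The cluster name and region are the placeholders from the task.

```shell
#!/bin/sh
# Dry run: validate the node counts, then print the command without running it.
CLUSTER="my-cluster"     # from the task
REGION="us-central1"     # from the task
NUM_MASTERS=3            # fills blank [1]
NUM_WORKERS=5            # fills blank [2]

# Dataproc supports 1 master (standard) or 3 masters (high availability).
case "$NUM_MASTERS" in
  1|3) ;;
  *) echo "num-masters must be 1 or 3" >&2; exit 1 ;;
esac

CMD="gcloud dataproc clusters create $CLUSTER --region $REGION"
CMD="$CMD --num-masters=$NUM_MASTERS --num-workers=$NUM_WORKERS"

echo "$CMD"
```

Naming the two counts separately makes a swapped master and worker count easy to spot before the command is ever run.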
5. Fill in the blanks (hard)

Fill all three blanks to create an EMR cluster with Hadoop, specify instance type, and enable debugging.

aws emr create-cluster --name TestCluster --release-label emr-6.3.0 --applications Name=[1] --instance-type [2] --enable-debugging [3]
Options:
A. Hadoop
B. m5.xlarge
C. --log-uri s3://my-emr-logs/
D. Spark
Common Mistakes
Using Spark instead of Hadoop for the application
Omitting the log URI for debugging
Using the wrong instance type
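A dry-run sketch of the Task 5 command. `--enable-debugging` needs a log location, so the sketch refuses to build the command unless the log URI looks like an S3 path. The bucket name is the placeholder from the options, and `--use-default-roles` is an assumption added here, not part of the task.

```shell
#!/bin/sh
# Dry run: check the log URI, then print the EMR command without running it.
APPLICATION="Hadoop"               # fills blank [1]
INSTANCE_TYPE="m5.xlarge"          # fills blank [2]
LOG_URI="s3://my-emr-logs/"        # fills blank [3] (placeholder bucket)

# Debugging depends on logs being written to S3, so require an s3:// URI.
case "$LOG_URI" in
  s3://*) ;;
  *) echo "log URI must start with s3://" >&2; exit 1 ;;
esac

CMD="aws emr create-cluster --name TestCluster --release-label emr-6.3.0"
CMD="$CMD --applications Name=$APPLICATION --instance-type $INSTANCE_TYPE"
CMD="$CMD --enable-debugging --log-uri $LOG_URI"
CMD="$CMD --use-default-roles"     # assumption: the default EMR roles already exist

echo "$CMD"
```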