GCP · Cloud · ~10 mins

Dataproc for Spark/Hadoop in GCP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to create a Dataproc cluster with the default image version.

GCP
gcloud dataproc clusters create my-cluster --region=us-central1 --image-version=[1]
A. 2.0-debian10
B. 1.5-debian10
C. 1.3-ubuntu18
D. 3.0-centos7
Common Mistakes
Using an outdated or unsupported image version.
Omitting the --image-version flag; the cluster then uses the latest default image, but pinning a specific version is safer and more reproducible.
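For reference, a complete cluster-create command with a pinned image version might look like the following sketch (the cluster name, region, and image version are illustrative):

```shell
# Create a Dataproc cluster pinned to a specific image version;
# pinning avoids surprises when the default image advances.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --image-version=2.0-debian10
```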
Task 2: Fill in the blank (medium)

Complete the code to submit a Spark job to the Dataproc cluster.

GCP
gcloud dataproc jobs submit spark --cluster=my-cluster --region=us-central1 --class=[1] --jars=gs://my-bucket/my-job.jar -- my-args
A. org.apache.hadoop.mapreduce.Job
B. org.apache.spark.examples.SparkPi
C. com.google.cloud.dataproc.Main
D. org.apache.hadoop.examples.WordCount
Common Mistakes
Using a Hadoop class instead of a Spark class.
Using a non-existent or incorrect class name.
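A full Spark submission, assuming the SparkPi example class is packaged in the jar (the bucket, jar name, and trailing argument are illustrative):

```shell
# Submit a Spark job to an existing cluster; everything after the
# bare "--" separator is passed as arguments to the main class.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=org.apache.spark.examples.SparkPi \
  --jars=gs://my-bucket/my-job.jar \
  -- 1000
```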
Task 3: Fill in the blank (hard)

Fix the error in the command to create a Dataproc cluster with 3 worker nodes.

GCP
gcloud dataproc clusters create my-cluster --region=us-central1 --num-workers=[1]
A. 3-workers
B. worker-count=3
C. --num-workers=3
D. 3
Common Mistakes
Including the flag name in the value.
Using incorrect syntax like '3-workers'.
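With the blank filled correctly, --num-workers receives a bare integer, not a repeated flag or a suffixed value; a sketch with illustrative names:

```shell
# --num-workers takes a plain integer count of primary workers.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --num-workers=3
```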
Task 4: Fill in the blank (hard)

Fill both blanks to enable autoscaling on a Dataproc cluster by specifying the autoscaling policy and the initial worker count.

GCP
gcloud dataproc clusters create my-cluster --region=us-central1 --autoscaling-policy=[1] --num-workers=[2]
A. my-autoscale-policy
B. default-policy
C. 3
D. 5
Common Mistakes
Confusing the autoscaling policy name with the number of workers.
Using default-policy when a custom policy is required.
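Assuming a custom autoscaling policy named my-autoscale-policy already exists in the same region (the policy name and worker count here are illustrative), attaching it at cluster creation might look like:

```shell
# Attach an existing autoscaling policy; --num-workers sets the
# initial primary worker count before autoscaling adjusts it.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --autoscaling-policy=my-autoscale-policy \
  --num-workers=3
```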
Task 5: Fill in the blank (hard)

Fill all three blanks to submit a Hadoop MapReduce job with a main class and specify the input and output paths.

GCP
gcloud dataproc jobs submit hadoop --cluster=my-cluster --region=us-central1 --class=[1] --jars=gs://my-bucket/my-hadoop-job.jar -- [2] [3]
A. org.apache.hadoop.examples.WordCount
B. gs://my-bucket/input-data
C. gs://my-bucket/output-data
D. org.apache.spark.examples.SparkPi
Common Mistakes
Using a Spark class for a Hadoop job.
Swapping input and output paths.
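A complete Hadoop submission with the blanks filled (bucket and path names are illustrative) might look like the sketch below. Note that Hadoop MapReduce refuses to overwrite an existing output directory, so the output path must not already exist:

```shell
# Submit a Hadoop MapReduce job; the two positional arguments after
# the "--" separator are the input and output GCS paths, in that order.
gcloud dataproc jobs submit hadoop \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=org.apache.hadoop.examples.WordCount \
  --jars=gs://my-bucket/my-hadoop-job.jar \
  -- gs://my-bucket/input-data gs://my-bucket/output-data
```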