Complete the code to create a Dataproc cluster with the default image version.
gcloud dataproc clusters create my-cluster --region=us-central1 --image-version=[1]
A commonly used answer for [1] is 2.0-debian10. Note that the default image version changes as new Dataproc images are released, so consult the Dataproc versioning documentation for the current default; omitting --image-version selects it automatically.
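A filled-in version of the command might look like this (the cluster name, region, and image version shown are illustrative, not required values):

```shell
# Create a Dataproc cluster pinned to a specific image version.
# Omitting --image-version would select the current default image instead.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --image-version=2.0-debian10
```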
Complete the code to submit a Spark job to the Dataproc cluster.
gcloud dataproc jobs submit spark --cluster=my-cluster --region=us-central1 --class=[1] --jars=gs://my-bucket/my-job.jar -- my-args
The SparkPi example class org.apache.spark.examples.SparkPi is commonly used to test Spark jobs on Dataproc.
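Putting the pieces together, a completed submission might look like the following sketch. The jar path is a placeholder, and the trailing argument (SparkPi's number of partitions) is an illustrative choice:

```shell
# Submit a Spark job to the cluster.
# Everything after the bare -- is passed as arguments to the main class.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=org.apache.spark.examples.SparkPi \
  --jars=gs://my-bucket/my-job.jar \
  -- 1000
```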
Fix the error in the command to create a Dataproc cluster with 3 worker nodes.
gcloud dataproc clusters create my-cluster --region=us-central1 --num-workers=[1]
The correct flag for specifying worker nodes is --num-workers, followed directly by the count (here, 3) with no extra syntax.
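The corrected command, filled in with the three-worker requirement from the prompt, would read:

```shell
# Create a cluster with exactly 3 primary worker nodes.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --num-workers=3
```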
Fill both blanks to configure a Dataproc cluster with autoscaling enabled and specify the autoscaling policy.
gcloud dataproc clusters create my-cluster --region=us-central1 --autoscaling-policy=[1] --num-workers=[2]
For [1], use your autoscaling policy ID (e.g., my-autoscale-policy); attaching --autoscaling-policy is what enables autoscaling, as gcloud has no separate --enable-autoscaling flag. For [2], set the initial number of primary workers (e.g., 3).
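A completed command might look like the sketch below. The policy name and worker count are the example values from the prompt; in current gcloud releases, attaching the policy is sufficient to turn autoscaling on:

```shell
# Attach an autoscaling policy to the cluster; this enables autoscaling.
# The policy must already exist in the same region and project.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --autoscaling-policy=my-autoscale-policy \
  --num-workers=3
```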
Fill all three blanks to submit a Hadoop MapReduce job with a main class and specify the input and output paths.
gcloud dataproc jobs submit hadoop --cluster=my-cluster --region=us-central1 --class=[1] --jars=gs://my-bucket/my-hadoop-job.jar -- [2] [3]
For [1], use the job's main class (here, WordCount); [2] and [3] are the input and output paths, passed as job arguments after the -- separator and specified as GCS URIs.
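A filled-in submission might look like this sketch; the bucket paths are placeholder GCS URIs, not required values:

```shell
# Submit a Hadoop MapReduce job.
# Arguments after -- are forwarded to the main class: input path, then output path.
gcloud dataproc jobs submit hadoop \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=WordCount \
  --jars=gs://my-bucket/my-hadoop-job.jar \
  -- gs://my-bucket/input gs://my-bucket/output
```

Note that, as with most Hadoop jobs, the output directory must not already exist or the job will fail.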