0
0
Hadoopdata~20 mins

Hadoop in cloud (EMR, Dataproc, HDInsight) - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Cloud Hadoop Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
2:00remaining
Understanding Hadoop Cluster Setup on Cloud

Which cloud service among EMR, Dataproc, and HDInsight automatically handles Hadoop cluster provisioning and scaling with minimal manual configuration?

AGoogle Dataproc
BAmazon EMR
CAzure HDInsight
DAll require manual cluster setup and scaling
Attempts:
2 left
💡 Hint

Think about which service is designed for quick cluster startup and auto-scaling with minimal user input.

Predict Output
intermediate
2:00remaining
Output of Hadoop Job Status Command on EMR

What is the output of the following AWS CLI command when checking the status of a running Hadoop job on EMR?

aws emr describe-step --cluster-id j-2AXXXXXXGAPLF --step-id s-3XXXXXXXXX
Hadoop
aws emr describe-step --cluster-id j-2AXXXXXXGAPLF --step-id s-3XXXXXXXXX
A{"Step":{"Status":{"State":"RUNNING"}}}
B{"Step":{"Status":{"State":"FAILED"}}}
C{"Error":"Cluster or Step ID not found"}
D{"Step":{"Status":{"State":"COMPLETED"}}}
Attempts:
2 left
💡 Hint

Consider the command is run while the job is actively processing.

data_output
advanced
2:00remaining
Comparing Storage Options in Cloud Hadoop Services

Given a Hadoop job running on EMR, Dataproc, and HDInsight, which storage system is natively integrated and used by each service for storing input and output data?

AEMR uses HDFS, Dataproc uses HDFS, HDInsight uses HDFS only
BEMR uses Azure Blob Storage, Dataproc uses Amazon S3, HDInsight uses Google Cloud Storage
CEMR uses Amazon S3, Dataproc uses Google Cloud Storage, HDInsight uses Azure Blob Storage
DAll use local disk storage only
Attempts:
2 left
💡 Hint

Think about the native cloud storage services each cloud provider offers.

🔧 Debug
advanced
2:00remaining
Identifying Cause of Hadoop Job Failure on HDInsight

A Hadoop job submitted to Azure HDInsight fails immediately with the error message: "java.lang.ClassNotFoundException: org.apache.hadoop.fs.azure.NativeAzureFileSystem". What is the most likely cause?

ANetwork connectivity issues between nodes
BIncorrect Hadoop version installed on the cluster
CInsufficient cluster memory for the job
DMissing Azure Blob Storage connector JAR in the classpath
Attempts:
2 left
💡 Hint

ClassNotFoundException usually means a required library is missing.

🚀 Application
expert
3:00remaining
Optimizing Hadoop Job Cost on Cloud Services

You want to minimize cost while running a large Hadoop job on cloud services. Which strategy is best to reduce cost without sacrificing job completion?

AUse only on-demand instances with maximum cluster size for speed
BUse spot/preemptible instances with checkpointing and automatic retries
CRun the job on a single large instance to avoid cluster overhead
DDisable autoscaling and keep cluster running 24/7
Attempts:
2 left
💡 Hint

Consider how cloud providers offer cheaper compute options that can be interrupted.