In Hadoop MapReduce, what is the primary purpose of input splits?
Think about how Hadoop processes large datasets efficiently.
Input splits break the input data into smaller pieces so that each mapper can process a chunk independently, enabling parallelism.
Why is data locality important in Hadoop's processing model?
Consider how moving computation close to data affects performance.
Data locality moves computation to the nodes that already hold the data, avoiding costly network transfers of large datasets and improving speed and efficiency.
Given a 640 MB file stored in HDFS with a block size of 128 MB, how many input splits will Hadoop create by default?
Divide the total file size by the block size.
640 MB / 128 MB = 5 splits, each corresponding to one block.
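The arithmetic generalizes: by default Hadoop creates roughly one split per HDFS block, so the count is the file size divided by the block size, rounded up when the file does not divide evenly. A minimal sketch in plain Java (no Hadoop dependencies; the method name is illustrative, and real FileInputFormat also applies a ~10% "slop" factor that can fold a small final remainder into the previous split):

```java
public class SplitCount {
    // Default split count: one split per block, using ceiling division
    // so a partial final block still gets its own split.
    // Simplification: ignores FileInputFormat's 1.1x slop factor.
    static long countSplits(long fileSizeMb, long blockSizeMb) {
        return (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    public static void main(String[] args) {
        System.out.println(countSplits(640, 128));  // the question's case: 5 splits
        System.out.println(countSplits(1000, 128)); // 7 full blocks + 1 partial: 8 splits
    }
}
```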
Consider this Hadoop MapReduce job snippet using a custom InputFormat that combines small files into larger splits (in practice a concrete subclass such as CombineTextInputFormat, since CombineFileInputFormat itself is abstract):
job.setInputFormatClass(CombineTextInputFormat.class);
If the input directory contains 10 files of 10 MB each and the block size is 128 MB, how many input splits will be created?
CombineFileInputFormat packs small files together into splits, bounded by the maximum split size (here taken to be the 128 MB block size).
The ten files total 100 MB, which fits within the 128 MB limit, so CombineFileInputFormat creates a single split.
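The packing logic can be sketched as a greedy bin-fill: files are added to the current split until adding another would exceed the maximum split size. This is a simplification in plain Java (the real CombineFileInputFormat also groups files by node and rack for locality; the method name is illustrative):

```java
public class CombinePacking {
    // Greedy estimate of how many combined splits a set of small files yields
    // for a given maximum split size. Simplification: ignores node/rack pools.
    static int estimateSplits(long[] fileSizesMb, long maxSplitMb) {
        int splits = 0;
        long current = 0; // size of the split being filled
        for (long size : fileSizesMb) {
            if (current + size > maxSplitMb) { // current split is full; start a new one
                splits++;
                current = 0;
            }
            current += size;
        }
        return current > 0 ? splits + 1 : splits; // count the last partial split
    }

    public static void main(String[] args) {
        long[] files = new long[10];
        java.util.Arrays.fill(files, 10);               // ten 10 MB files
        System.out.println(estimateSplits(files, 128)); // 100 MB < 128 MB: 1 split
    }
}
```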
You have a Hadoop cluster with 5 nodes. A large dataset is stored unevenly: 80% on node 1, and the rest spread across nodes 2-5. You want to maximize data locality for a MapReduce job. Which strategy will best improve data locality?
Think about how data distribution affects task assignment.
Rebalancing the data evenly across the cluster (for example with the HDFS balancer) lets mappers on all five nodes read their blocks locally, improving both data locality and parallelism.
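The effect of skew can be illustrated with a toy model, under the simplifying assumptions that map work is shared evenly across nodes and a task is "local" when its block resides on the node running it (the class, method, and numbers are illustrative, not a Hadoop API):

```java
import java.util.Arrays;

public class LocalityToy {
    // Fraction of map tasks that can run data-local, assuming each node
    // processes an equal share of the total blocks. Tasks beyond a node's
    // locally stored blocks must read remotely.
    static double localFraction(int[] blocksPerNode) {
        int total = Arrays.stream(blocksPerNode).sum();
        int share = total / blocksPerNode.length; // even work share per node
        int local = 0;
        for (int b : blocksPerNode) {
            local += Math.min(b, share); // a node can only be local for blocks it holds
        }
        return (double) local / total;
    }

    public static void main(String[] args) {
        // Skewed: 80% of 100 blocks on node 1, rest spread over nodes 2-5.
        System.out.println(localFraction(new int[]{80, 5, 5, 5, 5}));
        // Balanced: 20 blocks per node.
        System.out.println(localFraction(new int[]{20, 20, 20, 20, 20}));
    }
}
```

In this toy model the skewed layout caps locality well below 100%, while the balanced layout lets every task read locally, matching the reasoning above.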