
Cluster planning and sizing in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Hadoop Cluster Master: get all five challenges correct to earn this badge!
🧠 Conceptual (intermediate)

Key factor in determining Hadoop cluster size

Which of the following is the most important factor when deciding the size of a Hadoop cluster?

A. The total volume of data to be processed
B. The number of users accessing the cluster
C. The brand of hardware used
D. The color of the server racks

💡 Hint: Think about what directly impacts storage and processing needs.

Data Output (intermediate)

Calculate total storage needed for a Hadoop cluster

You have 10 TB of raw data, and Hadoop uses a replication factor of 3 by default. What is the total storage needed in the cluster?

A. 10 TB
B. 20 TB
C. 3 TB
D. 30 TB

💡 Hint: Remember that Hadoop stores multiple copies of data for fault tolerance.
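If you want to check your arithmetic, the replication rule can be sketched in a few lines of Python (the helper name is made up for this illustration):

```python
def total_storage_tb(raw_data_tb, replication_factor=3):
    """Raw cluster capacity needed when HDFS keeps `replication_factor` copies."""
    return raw_data_tb * replication_factor

# 10 TB of raw data with the default replication factor of 3
print(total_storage_tb(10))  # prints 30
```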

Predict Output (advanced)

Estimate number of nodes needed based on CPU cores

A Hadoop job requires 200 CPU cores and each node has 16 cores. How many nodes are needed? Consider only whole nodes.

A. 13 nodes
B. 12 nodes
C. 14 nodes
D. 15 nodes

💡 Hint: Divide the total cores needed by the cores per node and round up.
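The round-up step in the hint is a ceiling division, which can be sketched as follows (the function name is chosen for this example only):

```python
import math

def nodes_needed(total_cores, cores_per_node):
    """Whole nodes required to supply the requested cores, rounding up."""
    return math.ceil(total_cores / cores_per_node)

print(nodes_needed(200, 16))  # prints 13, since 12 nodes supply only 192 cores
```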

Visualization (advanced)

Visualize cluster storage distribution

You have a Hadoop cluster with 5 nodes, each with 4 TB of storage. The replication factor is 3. How much usable storage is available in the cluster?

A. 20 TB usable storage
B. 60 TB usable storage
C. 6.67 TB usable storage
D. 15 TB usable storage

💡 Hint: Calculate the total raw storage, then divide by the replication factor.
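The hint's two steps (total raw capacity, then divide by replication) can be written out as a small sketch, with illustrative names:

```python
def usable_storage_tb(nodes, storage_per_node_tb, replication=3):
    """Usable HDFS capacity: total raw capacity divided by the replication factor."""
    raw_capacity = nodes * storage_per_node_tb
    return raw_capacity / replication

print(round(usable_storage_tb(5, 4), 2))  # prints 6.67
```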

🚀 Application (expert)

Optimize cluster size for mixed workloads

You manage a Hadoop cluster running both batch and real-time jobs. Batch jobs need high storage; real-time jobs need low latency and high CPU. Which cluster sizing strategy best balances these needs?

A. Use small nodes with minimal storage and CPU, and run all workloads on the same nodes
B. Use large nodes with high storage and many CPU cores, and separate batch and real-time workloads onto different nodes
C. Use nodes with only high storage, ignoring CPU needs
D. Use nodes with only high CPU, ignoring storage needs

💡 Hint: Think about workload isolation and resource specialization.