Rack awareness in HDFS in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why is rack awareness important in HDFS?

Rack awareness helps HDFS decide where to place data blocks. What is the main benefit of using rack awareness?

A. It protects against data loss when a whole rack fails by spreading replicas across racks, while keeping cross-rack network traffic low.
B. It speeds up data processing by storing all replicas on the same rack for faster access.
C. It simplifies the file system by ignoring network topology during data placement.
D. It increases storage capacity by duplicating data on the same node multiple times.
💡 Hint

Think about what happens if a whole rack goes down.
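
The hint can be checked with a small simulation. This is only a sketch: the two-rack cluster and the helper names `rack_of` and `survives_rack_failure` are hypothetical and are not part of HDFS.

```python
# Hypothetical toy cluster: rack name -> nodes on that rack.
RACKS = {
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
}

def rack_of(node):
    """Return the rack that hosts the given node."""
    for rack, members in RACKS.items():
        if node in members:
            return rack
    raise KeyError(node)

def survives_rack_failure(replicas, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(rack_of(n) != failed_rack for n in replicas)

same_rack_only = ["node1", "node2"]              # every replica on rack1
spread_over_racks = ["node1", "node2", "node3"]  # one replica on rack2

print(survives_rack_failure(same_rack_only, "rack1"))     # False
print(survives_rack_failure(spread_over_racks, "rack1"))  # True
```

A block whose replicas all sit on one rack becomes unavailable when that rack goes down; a single off-rack replica is enough to keep it readable.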

🧠 Conceptual
intermediate
How does HDFS use rack awareness during block placement?

When HDFS writes a block, how does rack awareness influence where replicas are stored?

A. All replicas are stored on the same node to maximize speed.
B. One replica is stored on the local node, another on a different node in the same rack, and the third on a node in a different rack.
C. Replicas are randomly placed without considering racks.
D. Two replicas are stored on the same rack and one on a different data center.
💡 Hint

Think about balancing fault tolerance and network traffic.
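
Any placement policy depends on the NameNode knowing which rack each DataNode belongs to. In Hadoop this mapping is commonly supplied by an administrator-written topology script, configured through the `net.topology.script.file.name` property. The script receives host names or IP addresses as arguments and prints one rack path per argument. The sketch below shows what such a script might look like in Python; the IP addresses, rack paths, and the `resolve` helper are assumptions made up for illustration.

```python
import sys

# Hypothetical mapping from DataNode IP address to rack path.
HOST_TO_RACK = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

# Hosts missing from the table fall back to one shared rack.
DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [HOST_TO_RACK.get(h, DEFAULT_RACK) for h in hosts]

if __name__ == "__main__":
    # Print one rack path per command-line argument.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Keeping the lookup in a plain dictionary makes the mapping easy to audit; a real deployment would more likely read it from a file maintained alongside the cluster inventory.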

Data Output
advanced
Output of rack-aware block placement simulation

Given the following simplified Python simulation of HDFS rack-aware block placement, what is the output?

import random

nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

# Place one replica on node1
replicas = ['node1']

# Place second replica on different node in same rack
same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
replicas.append(random.choice(same_rack_nodes))

# Place third replica on a node in a different rack
other_rack = 'rack2'
replicas.append(random.choice(nodes[other_rack]))

print(sorted(replicas))
A. ['node2', 'node3', 'node4']
B. ['node1', 'node2', 'node1']
C. ['node1', 'node2', 'node3'] or ['node1', 'node2', 'node4']
D. ['node1', 'node3', 'node4']
💡 Hint

Remember: the second replica is on the same rack but a different node, and the third is on a different rack.
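
Because the simulation uses `random.choice`, a single run shows only one outcome. One way to see every possible result is to enumerate the choices instead of sampling them. The sketch below reuses the `nodes` dictionary from the question; the variable names `second_choices`, `third_choices`, and `possible_outputs` are introduced here for illustration.

```python
from itertools import product

nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

# The first replica is fixed on node1, as in the question.
first = 'node1'

# Candidates for the second replica: other nodes on the same rack.
second_choices = [n for n in nodes['rack1'] if n != first]

# Candidates for the third replica: any node on the other rack.
third_choices = nodes['rack2']

# Every combination the random simulation could produce, sorted like its output.
possible_outputs = sorted(
    sorted([first, second, third])
    for second, third in product(second_choices, third_choices)
)

for output in possible_outputs:
    print(output)
# ['node1', 'node2', 'node3']
# ['node1', 'node2', 'node4']
```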

🔧 Debug
advanced
Identify the error in rack-aware replica placement code

What error will this Python code raise when simulating rack-aware replica placement?

nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

replicas = []

# Place one replica on node1
replicas.append('node1')

# Place second replica on different node in same rack
same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
replicas.append(same_rack_nodes[1])

# Place third replica on a node in a different rack
other_rack = 'rack2'
replicas.append(nodes[other_rack][0])

print(replicas)
A. No error, code runs and prints the replica list.
B. KeyError because 'rack3' does not exist in nodes dictionary.
C. TypeError because replicas list cannot append strings.
D. IndexError because same_rack_nodes has only 1 element but index 1 is accessed.
💡 Hint

Check the length of same_rack_nodes before accessing index 1.
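
A defensive variant of the same step checks the candidate list before indexing into it. This is a minimal sketch, assuming the `nodes` dictionary from the question; the helper `pick_same_rack_peer` is hypothetical.

```python
nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

def pick_same_rack_peer(rack, exclude):
    """Return another node on `rack`, or None if `exclude` is the only one."""
    candidates = [n for n in nodes[rack] if n != exclude]
    # Guard against an empty list instead of indexing blindly.
    return candidates[0] if candidates else None

same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
print(len(same_rack_nodes))                   # 1, so the only valid index is 0
print(pick_same_rack_peer('rack1', 'node1'))  # node2
```

Returning `None` leaves it to the caller to decide what to do when a rack has no second node, for example falling back to a node on another rack.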

🚀 Application
expert
Analyzing network traffic reduction using rack awareness

Suppose an HDFS cluster has 3 racks with 4 nodes each. Without rack awareness, replicas are placed randomly. With rack awareness, replicas are placed according to the policy used in this challenge (one on the local node, one on another node in the same rack, one on a different rack). Which option best describes the expected impact on cross-rack network traffic while the replicas are being written?

A. Rack awareness reduces cross-rack traffic by ensuring at least two replicas are on the same rack, minimizing data transfer across racks.
B. Rack awareness has no effect on network traffic since data is always read from the closest node regardless of rack.
C. Rack awareness increases cross-rack traffic because replicas are spread across all racks evenly.
D. Rack awareness causes all replicas to be on the same rack, eliminating cross-rack traffic completely.
💡 Hint

Think about how placing replicas on the same rack affects network usage.
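
One concrete way to compare placements is to count how many times a block crosses a rack boundary as it is copied from replica to replica. The sketch below assumes the 3-rack, 4-node cluster from the question and the simplified policy described there, with data forwarded through the replicas in order. The node names and the `cross_rack_hops` helper are hypothetical, and `spread_out` is just one possible random placement rather than an average over all of them.

```python
# Topology from the question: 3 racks with 4 nodes each (node names are made up).
RACK_OF = {
    f"r{r}n{n}": f"rack{r}"
    for r in range(1, 4)
    for n in range(1, 5)
}

def cross_rack_hops(writer, replicas):
    """Count hops between consecutive pipeline stages that change rack."""
    path = [writer] + list(replicas)
    return sum(
        1 for src, dst in zip(path, path[1:])
        if RACK_OF[src] != RACK_OF[dst]
    )

writer = "r1n1"
rack_aware = ["r1n1", "r1n2", "r2n1"]  # local node, same rack, other rack
spread_out = ["r2n1", "r3n1", "r1n2"]  # one possible random placement

print(cross_rack_hops(writer, rack_aware))  # 1
print(cross_rack_hops(writer, spread_out))  # 3
```

With two replicas kept on the writer's rack, only one copy has to travel between racks, while the off-rack replica still provides protection against a rack failure.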