Rack awareness helps HDFS decide where to place data blocks. What is the main benefit of using rack awareness?
Think about what happens if a whole rack goes down.
Rack awareness spreads replicas across more than one rack so that data survives the loss of a whole rack, while keeping some replicas close together to limit cross-rack network traffic.
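The rack-failure argument can be checked with a small sketch. The `topology` mapping and the helper names `rack_of` and `survives_rack_failure` are hypothetical, made up for illustration; they are not part of any HDFS API.

```python
# Hypothetical cluster layout: rack name -> nodes on that rack.
topology = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}


def rack_of(node, topology):
    """Return the rack that hosts the given node."""
    for rack, members in topology.items():
        if node in members:
            return rack
    raise KeyError(f"unknown node: {node}")


def survives_rack_failure(replicas, topology):
    """True if losing any single rack still leaves at least one replica."""
    racks_used = {rack_of(node, topology) for node in replicas}
    return len(racks_used) >= 2


single_rack = ["node1", "node2"]      # both replicas on rack1
spread = ["node1", "node2", "node3"]  # replicas on rack1 and rack2

print(survives_rack_failure(single_rack, topology))  # False
print(survives_rack_failure(spread, topology))       # True
```

If rack1 goes down, the first placement loses every copy of the block, while the second still has the copy on node3.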
When HDFS writes a block, how does rack awareness influence where replicas are stored?
Think about balancing fault tolerance and network traffic.
HDFS places one replica locally, one on the same rack, and one on a different rack to balance reliability and network efficiency.
Given the following simplified Python simulation of HDFS rack-aware block placement, what is the output?
import random
nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}
# Place one replica on node1
replicas = ['node1']
# Place second replica on different node in same rack
same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
replicas.append(random.choice(same_rack_nodes))
# Place third replica on a node in a different rack
other_rack = 'rack2'
replicas.append(random.choice(nodes[other_rack]))
print(sorted(replicas))
Remember that the second replica is on the same rack but a different node, and the third is on a different rack.
The code places replicas on node1, on the only other node in rack1 (node2), and on one randomly chosen node in rack2 (node3 or node4). Because the list is sorted before printing, the output is either ['node1', 'node2', 'node3'] or ['node1', 'node2', 'node4'].
What error will this Python code raise when simulating rack-aware replica placement?
nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}
replicas = []
# Place one replica on node1
replicas.append('node1')
# Place second replica on different node in same rack
same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
replicas.append(same_rack_nodes[1])
# Place third replica on a node in a different rack
other_rack = 'rack2'
replicas.append(nodes[other_rack][0])
print(replicas)
Check the length of same_rack_nodes before accessing index 1.
same_rack_nodes contains only one node ('node2'), so its only valid index is 0, and accessing index 1 raises IndexError.
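The failure above comes from indexing past the end of a one-element list. One way to avoid it, sketched here with a hypothetical helper `pick_same_rack_peer`, is to check that a same-rack peer exists before selecting it.

```python
nodes = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}


def pick_same_rack_peer(rack_nodes, local_node):
    """Return another node on the same rack, or None if the rack has no other node."""
    peers = [n for n in rack_nodes if n != local_node]
    return peers[0] if peers else None


print(pick_same_rack_peer(nodes["rack1"], "node1"))  # node2
print(pick_same_rack_peer(["node1"], "node1"))       # None
```

Returning None instead of raising is a design choice for this sketch; the caller must then decide what to do when a rack has no second node.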
Suppose an HDFS cluster has 3 racks with 4 nodes each. Without rack awareness, replicas are placed randomly. With rack awareness, replicas are placed as per the standard policy (one local, one same rack, one different rack). Which option best describes the expected impact on cross-rack network traffic during data reads?
Think about how placing replicas on the same rack affects network usage.
By placing one replica locally and another on the same rack, rack awareness reduces the need to fetch data across racks, lowering cross-rack network traffic.
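The effect on reads can be estimated by enumeration for the 3-rack, 4-node cluster in the question. This is a rough sketch under two assumptions that are not in the original text: the reader runs on the node that wrote the block, and "random" placement means every set of three distinct nodes is equally likely. The node names (`r1n1` and so on) are made up.

```python
from itertools import combinations

# Assumed layout from the question: 3 racks with 4 nodes each.
racks = {f"rack{r}": [f"r{r}n{i}" for i in range(1, 5)] for r in range(1, 4)}
rack_of = {node: rack for rack, members in racks.items() for node in members}
all_nodes = sorted(rack_of)

reader = "r1n1"  # assumption: the reader is the node that wrote the block


def needs_cross_rack_read(replicas, reader):
    """True if no replica shares a rack with the reader."""
    return all(rack_of[n] != rack_of[reader] for n in replicas)


# Random placement: every set of 3 distinct nodes is equally likely.
random_sets = list(combinations(all_nodes, 3))
random_cross = sum(needs_cross_rack_read(s, reader) for s in random_sets)
random_fraction = random_cross / len(random_sets)

# Rack-aware placement as described above:
# the reader's node, a same-rack peer, and one node on another rack.
local_rack = rack_of[reader]
aware_sets = [
    (reader, peer, remote)
    for peer in racks[local_rack] if peer != reader
    for remote in all_nodes if rack_of[remote] != local_rack
]
aware_cross = sum(needs_cross_rack_read(s, reader) for s in aware_sets)
aware_fraction = aware_cross / len(aware_sets)

print(f"random placement: {random_fraction:.3f}")      # 0.255
print(f"rack-aware placement: {aware_fraction:.3f}")   # 0.000
```

Under these assumptions, about a quarter of random placements (56 of 220) force the reader to go to another rack, while the rack-aware placement never does. The numbers would differ for a reader on an arbitrary node.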