Rack awareness lets Hadoop place data intelligently across different racks, keeping it both safe against rack failures and fast to access.
Rack awareness in HDFS
class RackAwarePlacementPolicy {
    // Map of each DataNode to its rack path (e.g. "/rack1")
    Map<DataNode, String> nodeToRackMap = new HashMap<>();

    // Look up the rack for a DataNode
    String getRack(DataNode node) {
        return nodeToRackMap.get(node);
    }

    // Choose DataNodes for block placement
    List<DataNode> chooseDataNodes(int numReplicas) {
        // Logic to pick nodes from different racks
        return new ArrayList<>(); // placeholder body for this sketch
    }
}
This is a simplified view of how rack awareness is implemented in Hadoop.
Hadoop uses a network topology script to map nodes to racks automatically.
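To make that mapping concrete, the sketch below mimics it in plain Java. The `net.topology.script.file.name` property (set in core-site.xml) and the `/default-rack` fallback are real Hadoop behavior; the `TopologyResolver` class and its `register`/`resolve` methods are hypothetical names for this illustration, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Hadoop's topology mapping (a sketch, not the real API).
// In a real cluster, the net.topology.script.file.name property points to an
// admin-supplied script that maps a DataNode's address to a rack path; nodes
// the script cannot resolve fall back to "/default-rack".
class TopologyResolver {
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> addressToRack = new HashMap<>();

    // Record a known address-to-rack mapping (stands in for the script's table)
    void register(String address, String rack) {
        addressToRack.put(address, rack);
    }

    // Resolve an address to its rack, falling back to the default rack
    String resolve(String address) {
        return addressToRack.getOrDefault(address, DEFAULT_RACK);
    }

    public static void main(String[] args) {
        TopologyResolver resolver = new TopologyResolver();
        resolver.register("10.0.1.5", "/rack1");
        System.out.println(resolver.resolve("10.0.1.5")); // known node
        System.out.println(resolver.resolve("10.0.9.9")); // unknown node, default rack
    }
}
```

The fallback matters: if the script misses a node, Hadoop still has a rack to reason about, rather than failing placement outright.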
RackAwarePlacementPolicy policy = new RackAwarePlacementPolicy();
policy.nodeToRackMap.put(node1, "/rack1");
policy.nodeToRackMap.put(node2, "/rack2");
List<DataNode> chosenNodes = policy.chooseDataNodes(2);
// Edge case: empty cluster
RackAwarePlacementPolicy emptyPolicy = new RackAwarePlacementPolicy();
List<DataNode> chosenNodes = emptyPolicy.chooseDataNodes(3); // returns an empty list
// Edge case: only one rack available
policy.nodeToRackMap.clear();
policy.nodeToRackMap.put(node1, "/rack1");
policy.nodeToRackMap.put(node2, "/rack1");
List<DataNode> chosenNodes = policy.chooseDataNodes(2); // both nodes come from the same rack
This program creates three data nodes on two racks. It then chooses two nodes for storing replicas, preferring different racks.
import java.util.*;

class DataNode {
    String name;

    DataNode(String name) {
        this.name = name;
    }

    public String toString() {
        return name;
    }
}

class RackAwarePlacementPolicy {
    // Map of each DataNode to its rack path
    Map<DataNode, String> nodeToRackMap = new HashMap<>();

    String getRack(DataNode node) {
        return nodeToRackMap.get(node);
    }

    List<DataNode> chooseDataNodes(int numReplicas) {
        List<DataNode> chosenNodes = new ArrayList<>();
        Set<String> racksUsed = new HashSet<>();

        // First pass: pick at most one node per rack
        for (DataNode node : nodeToRackMap.keySet()) {
            String rack = getRack(node);
            if (!racksUsed.contains(rack)) {
                chosenNodes.add(node);
                racksUsed.add(rack);
                if (chosenNodes.size() == numReplicas) {
                    break;
                }
            }
        }

        // Second pass: if there are fewer racks than replicas,
        // fill the remainder with nodes from racks already used
        if (chosenNodes.size() < numReplicas) {
            for (DataNode node : nodeToRackMap.keySet()) {
                if (!chosenNodes.contains(node)) {
                    chosenNodes.add(node);
                    if (chosenNodes.size() == numReplicas) {
                        break;
                    }
                }
            }
        }
        return chosenNodes;
    }
}

public class RackAwarenessDemo {
    public static void main(String[] args) {
        DataNode node1 = new DataNode("Node1");
        DataNode node2 = new DataNode("Node2");
        DataNode node3 = new DataNode("Node3");

        RackAwarePlacementPolicy policy = new RackAwarePlacementPolicy();
        policy.nodeToRackMap.put(node1, "/rack1");
        policy.nodeToRackMap.put(node2, "/rack2");
        policy.nodeToRackMap.put(node3, "/rack1");

        System.out.println("Before choosing nodes:");
        for (var entry : policy.nodeToRackMap.entrySet()) {
            System.out.println(entry.getKey() + " on " + entry.getValue());
        }

        List<DataNode> chosenNodes = policy.chooseDataNodes(2);
        System.out.println("\nChosen DataNodes for replicas:");
        for (DataNode node : chosenNodes) {
            System.out.println(node + " on " + policy.getRack(node));
        }
    }
}
Time complexity: O(n · r), where n is the number of data nodes and r the number of replicas (the fill pass does a list lookup per node). For a small fixed replication factor this is effectively O(n).
Space complexity: O(n) for the node-to-rack mapping.
Common mistake: not handling the case where fewer racks than replicas exist, which can leave all replicas on the same rack without any warning.
Compared to random placement, rack awareness improves both fault tolerance and network efficiency.
Rack awareness places data copies on different racks to protect against rack failures.
It improves data availability and network usage in Hadoop clusters.
When racks are limited, replicas may share racks but still try to spread out.