NameNode and DataNode work together to store and manage big data safely and efficiently.
NameNode and DataNode roles in Hadoop
NameNode: Manages the metadata and the file system namespace. DataNode: Stores the actual data blocks and serves read/write requests.
NameNode keeps track of where data is stored but does not store the data itself.
DataNodes store the real data and report back to the NameNode regularly.
NameNode: Keeps a list of all files and their block locations. DataNode: Stores blocks of files on local disks.
NameNode: Handles client requests for file operations. DataNode: Sends heartbeat signals to NameNode to confirm it is alive.
The following simple example shows conceptually how the NameNode stores metadata about file blocks while the DataNodes store the actual data blocks. The client asks the NameNode for block locations and then reads the data directly from the DataNodes.
# This is a conceptual example in Python to simulate NameNode and DataNode roles

class NameNode:
    def __init__(self):
        self.metadata = {}

    def add_file(self, filename, blocks):
        self.metadata[filename] = blocks

    def get_file_blocks(self, filename):
        return self.metadata.get(filename, [])


class DataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}

    def store_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks.get(block_id, None)


# Setup
name_node = NameNode()
data_node1 = DataNode('DN1')
data_node2 = DataNode('DN2')

# Store blocks on DataNodes
data_node1.store_block('block1', 'Data of block 1')
data_node2.store_block('block2', 'Data of block 2')

# NameNode keeps metadata
name_node.add_file('file1.txt', ['block1', 'block2'])

# Client asks NameNode where blocks are
blocks = name_node.get_file_blocks('file1.txt')
print(f"Blocks for file1.txt: {blocks}")

# Client reads data from DataNodes
for block in blocks:
    if block == 'block1':
        print(f"Reading {block} from DN1: {data_node1.read_block(block)}")
    elif block == 'block2':
        print(f"Reading {block} from DN2: {data_node2.read_block(block)}")
The NameNode is a single point of failure in a basic Hadoop setup: if it goes down, the whole cluster loses access to the file system metadata. That is why high-availability (HA) configurations with a standby NameNode are important in production.
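The failover idea can be sketched in a few lines. This is a hypothetical simulation only: the class and method names below are illustrative, and real HDFS HA coordinates the active and standby NameNodes through JournalNodes and ZooKeeper rather than a shared in-process object.

```python
# Hypothetical sketch: an active and a standby NameNode share the same
# metadata, so the standby can take over without losing file information.
# Real HDFS HA uses JournalNodes and ZooKeeper; this only shows the idea.

class HANameNodePair:
    def __init__(self):
        self.metadata = {}          # shared namespace: filename -> blocks
        self.active = 'nn1'
        self.standby = 'nn2'

    def add_file(self, filename, blocks):
        self.metadata[filename] = blocks

    def failover(self):
        # Promote the standby; it already sees the shared metadata,
        # so no file information is lost.
        self.active, self.standby = self.standby, self.active

pair = HANameNodePair()
pair.add_file('file1.txt', ['block1', 'block2'])
pair.failover()
print(pair.active)                  # → nn2
print(pair.metadata['file1.txt'])   # → ['block1', 'block2']
```

Because the metadata lives outside any single NameNode process, promoting the standby is just a role swap, which is the core idea behind HA setups.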
DataNodes send regular heartbeats to NameNode to show they are working.
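The heartbeat check can be sketched as follows. The `NameNodeMonitor` class and the 10-second timeout are illustrative assumptions, not real HDFS APIs or defaults; the real NameNode waits much longer before declaring a DataNode dead.

```python
import time

# Hypothetical sketch of heartbeat monitoring: the NameNode records the
# time of each DataNode's last heartbeat and considers a node dead if no
# heartbeat has arrived within the timeout. Names and the 10-second
# timeout are illustrative only.

HEARTBEAT_TIMEOUT = 10.0  # seconds (illustrative; real HDFS waits longer)

class NameNodeMonitor:
    def __init__(self):
        self.last_heartbeat = {}   # node_id -> timestamp of last heartbeat

    def receive_heartbeat(self, node_id, now=None):
        self.last_heartbeat[node_id] = time.time() if now is None else now

    def live_nodes(self, now=None):
        now = time.time() if now is None else now
        return [n for n, t in self.last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT]

monitor = NameNodeMonitor()
monitor.receive_heartbeat('DN1', now=100.0)
monitor.receive_heartbeat('DN2', now=95.0)
# At t=106, DN1 beat 6s ago (live); DN2 beat 11s ago (missed the timeout).
print(monitor.live_nodes(now=106.0))   # → ['DN1']
```

When a node drops out of the live list, the NameNode can schedule re-replication of the blocks that node held, which keeps the data available despite failures.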
Data is split into blocks and distributed to many DataNodes for reliability and speed.
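The splitting and distribution step can be sketched like this. The function names, the tiny 8-byte block size, and the round-robin placement policy are all illustrative assumptions for the sketch; real HDFS uses 128 MB blocks by default and a rack-aware placement policy.

```python
# Hypothetical sketch: split a file into fixed-size blocks and assign
# each block to multiple DataNodes (replication factor 2). The names
# and the round-robin placement are illustrative, not real HDFS APIs.

BLOCK_SIZE = 8          # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 2         # copies kept of each block

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut raw bytes into fixed-size chunks, like HDFS block splitting."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    placement = {}
    for i, _block in enumerate(blocks):
        targets = [datanodes[(i + r) % len(datanodes)]
                   for r in range(replication)]
        placement[f'block{i}'] = targets
    return placement

datanodes = ['DN1', 'DN2', 'DN3']
file_blocks = split_into_blocks(b'hello hadoop world!')
print(place_replicas(file_blocks, datanodes))
```

Keeping each block on more than one DataNode is what gives HDFS both reliability (a copy survives a node failure) and speed (clients can read different blocks from different nodes in parallel).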
NameNode manages metadata and controls the file system.
DataNodes store the actual data blocks and handle read/write operations.
They work together to store big data safely and efficiently in Hadoop.