HDFS read and write operations in Hadoop - Step-by-Step Execution

Concept Flow - HDFS read and write operations
Start Write Request
Client contacts NameNode
NameNode returns DataNode list
Client writes data to the first DataNode in the pipeline
DataNodes replicate blocks along the pipeline
Write Complete
Start Read Request
Client contacts NameNode
NameNode returns DataNode locations
Client reads data from DataNodes
Read Complete
This flow shows how data is written to and read from HDFS by the client, the NameNode, and the DataNodes.
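The flow above can be sketched as a small in-memory Python model. Everything in it is hypothetical and for illustration only: `ToyNameNode`, `ToyDataNode`, `write_file` and `read_file` are not Hadoop APIs, replica placement is a simple rotation rather than HDFS's rack-aware policy, and pipeline forwarding is collapsed into a loop. What the sketch does show is the division of labour: the NameNode keeps only metadata (which blocks make up a file and where their replicas live), while the bytes are written to and read from DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: it stores block bytes."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: it stores metadata only."""
    datanodes: list
    replication: int = 3
    namespace: dict = field(default_factory=dict)  # path -> [(block_id, [dn names])]
    next_block_id: int = 0

    def __post_init__(self):
        if self.replication > len(self.datanodes):
            raise ValueError("need at least `replication` DataNodes")

    def allocate_block(self, path):
        """Record a new block for `path` and pick the DataNodes for its pipeline."""
        block_id = self.next_block_id
        self.next_block_id += 1
        # Simple rotating placement (assumption; real HDFS is rack-aware).
        start = block_id % len(self.datanodes)
        pipeline = [self.datanodes[(start + i) % len(self.datanodes)]
                    for i in range(self.replication)]
        self.namespace.setdefault(path, []).append(
            (block_id, [dn.name for dn in pipeline]))
        return block_id, pipeline

    def get_block_locations(self, path):
        """Return block IDs and replica locations; never the file bytes."""
        return self.namespace[path]


def write_file(namenode, path, data, block_size):
    """Write path: ask the NameNode for a pipeline, then store each block on it."""
    for offset in range(0, len(data), block_size):
        block_id, pipeline = namenode.allocate_block(path)
        # Forwarding dn[0] -> dn[1] -> dn[2] is collapsed into a loop here.
        for dn in pipeline:
            dn.blocks[block_id] = data[offset:offset + block_size]


def read_file(namenode, path, datanodes_by_name):
    """Read path: get locations from the NameNode, bytes from the DataNodes."""
    out = bytearray()
    for block_id, locations in namenode.get_block_locations(path):
        out += datanodes_by_name[locations[0]].blocks[block_id]
    return bytes(out)


dns = [ToyDataNode(f"dn{i}") for i in range(4)]
nn = ToyNameNode(dns)
write_file(nn, "/data/file.txt", b"hello hdfs world", block_size=6)
print(nn.get_block_locations("/data/file.txt"))
print(read_file(nn, "/data/file.txt", {dn.name: dn for dn in dns}))  # b'hello hdfs world'
```

With four DataNodes, a 16-byte payload and a 6-byte block size, the file is cut into three blocks, each stored on three DataNodes, and `read_file` reassembles the original bytes while asking the NameNode only for locations.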
Execution Sample
1. Client requests to write a file
2. NameNode provides a DataNode list (the write pipeline)
3. Client streams data to the first DataNode in the pipeline
4. DataNodes replicate each block along the pipeline
5. Client requests to read a file
6. NameNode provides the DataNode locations of each block
7. Client reads data directly from the DataNodes
This sequence shows the main steps for writing and reading files in HDFS.
Execution Table
Step | Operation | Actor | Action Detail | Result/Output
1 | Write Request | Client | Client sends a write request to the NameNode | NameNode receives the request
2 | Block Allocation | NameNode | NameNode allocates blocks and returns a DataNode list | Client gets the DataNode list
3 | Data Streaming | Client | Client streams data to the first DataNode in the pipeline | First DataNode receives the data block
4 | Replication | DataNodes | Each DataNode forwards the block to the next DataNode in the pipeline | Block replicated across DataNodes
5 | Write Confirmation | DataNodes | Acknowledgements travel back along the pipeline to the Client | Client confirms the write is complete
6 | Read Request | Client | Client sends a read request to the NameNode | NameNode receives the request
7 | Block Location | NameNode | NameNode returns the DataNode locations of each block | Client gets the DataNode locations
8 | Data Reading | Client | Client reads the data blocks directly from the DataNodes | Data received by the Client
9 | Read Completion | Client | Client finishes reading all blocks | Read operation complete
💡 Outcome: all blocks are written and replicated successfully, and all blocks are read completely.
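Step 2 (Block Allocation) assumes the file is cut into fixed-size blocks. The arithmetic below is a sketch assuming the common Hadoop 2.x and later defaults of `dfs.blocksize` = 128 MB and `dfs.replication` = 3; `block_layout` is a hypothetical helper, not a Hadoop function. The last block holds only the remaining bytes rather than being padded to the full block size.

```python
MB = 1024 * 1024


def block_layout(file_size, block_size=128 * MB, replication=3):
    """Split a file size into HDFS-style block sizes (hypothetical helper).

    Returns (list of block sizes in bytes, total bytes stored across all replicas).
    Defaults mirror dfs.blocksize = 128 MB and dfs.replication = 3 (assumed).
    """
    if file_size == 0:
        return [], 0
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # last block is only as large as the leftover data
    return sizes, file_size * replication


sizes, stored = block_layout(300 * MB)
print([s // MB for s in sizes], stored // MB)  # [128, 128, 44] 900
```

So under these assumptions a 300 MB file needs three block allocations in step 2 and occupies 900 MB of raw DataNode storage once every block has three replicas.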
Variable Tracker
Variable | Start | After Step 2 | After Step 4 | After Step 7 | Final
Client Request | None | Write request sent | Streaming data ongoing | Read request sent | Read complete
NameNode Response | None | DataNode list sent | N/A | DataNode locations sent | N/A
DataNode State | Empty | Ready to receive | Data blocks replicated | Serving read requests | Idle
Key Moments - 3 Insights
Why does the client contact the NameNode before writing or reading data?
The NameNode holds the file system metadata, including which DataNodes store (or will store) each data block. The client therefore asks it where to send blocks before writing and where to find them before reading, as shown in steps 2 and 7 of the Execution Table.
How does data replication happen during write operations?
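The client streams each block to the first DataNode only. That DataNode stores the data and forwards it to the second DataNode, which forwards it to the third, and so on along the pipeline, as detailed in steps 3 and 4. Acknowledgements then travel back along the same pipeline to the client (step 5).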
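The forwarding and acknowledgement order in the pipeline can be modelled with a short recursive Python function. It is a simplified sketch under stated assumptions: `send`, `stores` and `acks` are hypothetical names, a whole block moves at once (real HDFS streams each block as a series of small packets and overlaps forwarding with the local write), and failures are not handled.

```python
def send(block, pipeline, stores, acks):
    """Deliver `block` to pipeline[0], which stores it and forwards it downstream.

    pipeline: DataNode names in pipeline order (the client sends to pipeline[0]).
    stores:   hypothetical stand-in for each DataNode's disk (name -> list of blocks).
    acks:     records the order in which DataNodes send their acknowledgement.
    """
    if not pipeline:
        return
    head, downstream = pipeline[0], pipeline[1:]
    stores[head].append(block)             # store the block locally
    send(block, downstream, stores, acks)  # forward it to the next DataNode
    acks.append(head)                      # ack upstream only after downstream acked


stores = {"dn1": [], "dn2": [], "dn3": []}
acks = []
send("blk_1", ["dn1", "dn2", "dn3"], stores, acks)
print(stores)  # {'dn1': ['blk_1'], 'dn2': ['blk_1'], 'dn3': ['blk_1']}
print(acks)    # ['dn3', 'dn2', 'dn1']
```

Because each DataNode in this model acknowledges only after the node downstream of it has acknowledged, the acknowledgement from `dn1` reaching the client implies that every replica in the pipeline was stored.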
Why does the client read data directly from DataNodes instead of the NameNode?
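The NameNode manages only metadata and block locations; the actual data is stored on the DataNodes, so the client reads it directly from them, as shown in step 8. This also keeps file data from flowing through the NameNode, which would otherwise become a bottleneck.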
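The read side can be sketched the same way. In this hypothetical Python model, `block_locations` plays the role of the NameNode's reply (block IDs and replica locations, no file bytes) and `fetch` stands in for reading one block directly from a DataNode. HDFS clients do fall back to another replica when a DataNode cannot be read, but `read_blocks`, `fetch` and `DataNodeDown` are illustrative names, not Hadoop APIs.

```python
class DataNodeDown(Exception):
    """Raised by the hypothetical fetch function when a DataNode is unreachable."""


def read_blocks(block_locations, fetch):
    """Reassemble a file from (block_id, [datanode, ...]) pairs.

    block_locations: what the NameNode returns (metadata only, no file bytes).
    fetch(datanode, block_id) -> bytes: hypothetical direct read from a DataNode.
    """
    data = bytearray()
    for block_id, replicas in block_locations:
        for dn in replicas:          # try replicas in the order given
            try:
                data += fetch(dn, block_id)
                break
            except DataNodeDown:
                continue             # fall back to the next replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    return bytes(data)


# Toy cluster state (assumption): dn2 is down, but every block has another replica.
disks = {
    "dn1": {0: b"hello "},
    "dn2": {0: b"hello ", 1: b"world"},
    "dn3": {1: b"world"},
}
down = {"dn2"}


def fetch(dn, block_id):
    if dn in down:
        raise DataNodeDown(dn)
    return disks[dn][block_id]


locations = [(0, ["dn2", "dn1"]), (1, ["dn2", "dn3"])]
print(read_blocks(locations, fetch))  # b'hello world'
```

The NameNode never touches the bytes here: it only supplies `locations`, and every byte of the result comes from a DataNode.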
Visual Quiz - 3 Questions
Test your understanding
In the Execution Table, at which step does the client receive the list of DataNodes for writing?
A. Step 3
B. Step 6
C. Step 2
D. Step 7
💡 Hint
Check the NameNode actions in rows 2 and 7 of the Execution Table.
During which step do DataNodes replicate the data blocks?
A. Step 3
B. Step 4
C. Step 5
D. Step 8
💡 Hint
Look at the Replication operation performed by the DataNodes in the Execution Table.
If the client read data directly from the NameNode, what would change in the execution flow?
A. NameNode would send data blocks in step 7
B. DataNodes would not replicate data in step 4
C. Client would skip contacting DataNodes in step 8
D. Client would not send a read request in step 6
💡 Hint
Consider the roles of the NameNode and the DataNodes in steps 7 and 8 of the Execution Table.
Concept Snapshot
HDFS Write:
- Client asks NameNode for DataNode list
- Client streams data to DataNodes
- DataNodes replicate blocks

HDFS Read:
- Client asks NameNode for DataNode locations
- Client reads data from DataNodes

NameNode manages metadata; DataNodes store actual data blocks.
Full Transcript
This visual execution trace shows how HDFS handles read and write operations. When writing, the client first contacts the NameNode to get the list of DataNodes that will store the data blocks. The client then streams data to the first DataNode in the pipeline, which forwards it to the next DataNode until every replica is stored. Once the acknowledgements have travelled back along the pipeline, the write is complete. For reading, the client asks the NameNode for the locations of the data blocks and then reads the data directly from the DataNodes. The NameNode manages only metadata and block locations, while the DataNodes store and serve the actual data. Keeping file data off the NameNode prevents it from becoming a throughput bottleneck, and storing replicas on several DataNodes keeps the data readable when a single DataNode fails.