Block storage and replication in Hadoop - Time Complexity
When storing data in Hadoop, files are split into blocks and copied across machines.
We want to know how the time to store and replicate data grows as the file size grows.
Analyze the time complexity of the following Hadoop block storage and replication process.
```
# Assume the file has already been split into blocks
for each block in file_blocks:
    store block on DataNode1
    replicate block to DataNode2
    replicate block to DataNode3
```
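The pseudocode above can be sketched as a runnable simulation. This is a minimal counting model, not real HDFS client code: `store_and_replicate` is a hypothetical helper that simply tallies one operation per store or copy, where a real client would be doing network and disk I/O.

```python
def store_and_replicate(file_blocks, replication_factor=3):
    """Count store/replicate operations for a list of blocks."""
    operations = 0
    for block in file_blocks:
        operations += 1                       # store on the first DataNode
        operations += replication_factor - 1  # copy to the remaining DataNodes
    return operations

blocks = [f"block-{i}" for i in range(10)]    # a file split into 10 blocks
print(store_and_replicate(blocks))            # 30 operations: 10 blocks x 3
```

Each block contributes a fixed number of operations, which is why the total work tracks the block count.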
This loop takes a file that has already been split into blocks, stores each block on one DataNode, then replicates it to two others (three copies total, HDFS's default replication factor).
Look at what repeats as the file size changes.
- Primary operation: Loop over each block to store and replicate it.
- How many times: Once for every block in the file.
As the file size grows, the number of blocks grows, so the operations grow too.
| Input Size (blocks) | Approx. Operations |
|---|---|
| 10 | 30 (10 blocks x 3 stores/replications) |
| 100 | 300 |
| 1000 | 3000 |
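The table rows follow from a single formula: total operations = blocks x replication factor. A short sketch (the function name is illustrative, not from any Hadoop API):

```python
# Each block triggers 1 store + (replication_factor - 1) copies,
# i.e. replication_factor operations in total.
def total_operations(num_blocks, replication_factor=3):
    return num_blocks * replication_factor

for num_blocks in (10, 100, 1000):
    print(num_blocks, "blocks ->", total_operations(num_blocks), "operations")
```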
Pattern observation: Operations increase directly with the number of blocks.
Time Complexity: O(n)
This means the time to store and replicate grows linearly: doubling the number of blocks roughly doubles the time.
[X] Wrong: "Replication happens all at once, so time stays the same no matter file size."
[OK] Correct: Each block must be copied separately, so more blocks mean more work and more time.
Understanding how data replication time grows helps you explain system performance clearly and confidently.
"What if the replication factor changed from 3 to 5? How would the time complexity change?"