SHA-1 hashing concept in Git - Time & Space Complexity
We want to understand how the time to create a SHA-1 hash changes as the size of the data grows.
How does hashing more data affect the work git does?
Analyze the time complexity of this git hashing command.
```shell
git hash-object <file>
# Internally, git prepends a small header ("blob <size>\0") to the file content
# Then computes the SHA-1 hash of the header plus content
# Finally, it stores the object in git's object database
```
This command creates a SHA-1 hash for a file's content to track it in git.
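The hash git produces is not the SHA-1 of the raw file bytes alone: git first prepends a `blob <size>\0` header and hashes the combined bytes. A minimal sketch of that computation in Python, using the standard-library `hashlib` module (git's own implementation is in C; this is just for illustration):

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    # Git hashes a header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Matches the well-known git blob id for the content "hello\n":
print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Running `printf 'hello\n' | git hash-object --stdin` produces the same id, which confirms the header-plus-content scheme.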
Look for repeated work done during hashing.
- Primary operation: Reading each byte of the file once to process it for hashing.
- How many times: Once per byte, so an n-byte file requires about n operations.
As the file size grows, the hashing work grows too.
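You can see the "once per byte" pattern in how streaming hash APIs work: the file is fed to the hash in chunks, and each byte passes through exactly once. A sketch using Python's `hashlib` (the function name `sha1_of_stream` is ours, chosen for illustration):

```python
import hashlib

def sha1_of_stream(chunks) -> tuple[str, int]:
    # Feed data to the hash incrementally; every byte is consumed exactly
    # once, so total work is proportional to the total number of bytes.
    h = hashlib.sha1()
    processed = 0
    for chunk in chunks:
        h.update(chunk)
        processed += len(chunk)
    return h.hexdigest(), processed

digest, n = sha1_of_stream([b"he", b"llo"])
print(n)  # 5 bytes processed, regardless of how they were chunked
```

Chunking changes how the bytes arrive, not how many of them must be processed.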
| Input Size (n) | Approx. Operations |
|---|---|
| 10 bytes | About 10 operations |
| 100 bytes | About 100 operations |
| 1000 bytes | About 1000 operations |
Pattern observation: The work grows directly with the file size.
Time Complexity: O(n)
This means the time to hash grows linearly with the size of the data: double the input, roughly double the time.
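One detail worth separating from the time cost: while hashing time grows with input size, the SHA-1 digest itself is always a fixed 160 bits (40 hex characters). A quick illustrative check in Python:

```python
import hashlib

for n in (10, 100, 1000):
    digest = hashlib.sha1(b"x" * n).hexdigest()
    # Time to hash grows with n, but the digest length never changes.
    print(n, len(digest))  # always 40 hex characters
```

So the space for each object id is O(1) even though the time to compute it is O(n).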
[X] Wrong: "Hashing always takes the same time no matter the file size."
[OK] Correct: Hashing must process every byte of the input, so larger files take proportionally longer.
Understanding how hashing time grows helps you explain performance in version control and data integrity tasks.
"What if git used a different hash algorithm that processes data in fixed-size chunks? How would that affect the time complexity?"