How Git stores objects - Performance & Efficiency
We want to understand how the time to store data in Git changes as the amount of data grows.
Specifically, how does Git handle saving many objects efficiently?
Analyze the time complexity of storing objects in Git using the following commands.
git hash-object -w <file>
git cat-file -p <hash>
git ls-files
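These commands show a basic round trip: `git hash-object -w` writes a file's content into the object database and prints its hash, `git cat-file -p` reads the object back by that hash, and `git ls-files` lists the files tracked in the index.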
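The write and read steps can be sketched in Python to make the per-object work visible. This is a simplified model, not Git's implementation: it assumes the default SHA-1 object format, handles only blobs, and the function names (`git_blob_hash`, `write_loose_object`, `read_loose_object`) are hypothetical. It writes into a temporary directory rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID of a blob: the hash of 'blob <size>\\0' plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def write_loose_object(objects_dir: Path, content: bytes) -> str:
    """Store one blob as a loose object at <objects_dir>/<first 2 hex>/<remaining 38 hex>."""
    oid = git_blob_hash(content)
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    header = b"blob " + str(len(content)).encode() + b"\0"
    # The stored file is the zlib-compressed header plus content.
    path.write_bytes(zlib.compress(header + content))
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> bytes:
    """Read a blob back by its ID; the path is computed directly from the ID."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    _header, _, body = raw.partition(b"\0")
    return body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        oid = write_loose_object(Path(tmp), b"hello\n")
        print(oid)
        print(read_loose_object(Path(tmp), oid))
```

Each call hashes and compresses one object's content, so the cost of a single write depends on that object's size rather than on how many objects are already stored.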
Look for repeated steps that affect time as data grows.
- Primary operation: Writing and reading objects by their hash.
- How many times: Once per object stored or retrieved.
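Each object written this way is stored as a loose object: its own compressed file under .git/objects, named by its hash. Writing or reading one object costs about the same no matter how many objects already exist, so the total work grows with the number of objects handled.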
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 writes and reads |
| 100 | 100 writes and reads |
| 1000 | 1000 writes and reads |
Pattern observation: The time grows linearly with the number of objects stored or accessed.
Time Complexity: O(n)
This means storing or reading n objects of similar size takes time proportional to n. The cost of each individual object also depends on its size, because Git hashes and compresses the full content.
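The linear pattern can be checked with a small counting sketch. It assumes an in-memory dictionary stands in for the object database and uses SHA-1 blob hashing; `store_all` is a hypothetical helper, not a Git API.

```python
import hashlib


def store_all(contents):
    """Store each item under its blob hash and count the write operations."""
    store = {}
    writes = 0
    for data in contents:
        header = b"blob %d\0" % len(data)
        oid = hashlib.sha1(header + data).hexdigest()
        store[oid] = data  # one write per object
        writes += 1
    return store, writes


if __name__ == "__main__":
    for n in (10, 100, 1000):
        contents = [f"file number {i}\n".encode() for i in range(n)]
        store, writes = store_all(contents)
        print(n, writes, len(store))
```

The write count equals n for each input size, matching the table. Because objects are keyed by content hash, two items with identical content would map to the same key and be stored only once.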
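[X] Wrong: "Git stores all objects in one big file, so accessing one object is slow for many objects."
[OK] Correct: A newly written (loose) object is its own file whose path is derived from its hash, so Git can open it directly without scanning other objects. Git may later combine objects into packfiles, but each packfile comes with an index sorted by hash, so finding one object still does not require reading all of them.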
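To illustrate the difference between scanning and hash-addressed lookup, here is a minimal comparison. Both functions are hypothetical models: `find_by_scan` stands for an unindexed sequence of records, and `find_by_hash` stands for any store where the location follows from the object ID (a dictionary here, a file path for loose objects).

```python
import hashlib


def make_oid(data: bytes) -> str:
    """Compute a SHA-1 blob ID for the given content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def find_by_scan(records, oid):
    """Unindexed model: compare IDs one by one until a match is found."""
    steps = 0
    for record_oid, data in records:
        steps += 1
        if record_oid == oid:
            return data, steps
    return None, steps


def find_by_hash(table, oid):
    """Hash-addressed model: the ID leads straight to the object."""
    return table.get(oid), 1


if __name__ == "__main__":
    blobs = [f"object {i}\n".encode() for i in range(1000)]
    records = [(make_oid(b), b) for b in blobs]
    table = dict(records)
    target = make_oid(blobs[-1])
    print(find_by_scan(records, target)[1])  # steps grow with the number of records
    print(find_by_hash(table, target)[1])    # a single lookup step
```

In the scan model, the worst-case lookup touches every record, while the hash-addressed lookup takes the same single step regardless of how many objects are stored.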
Understanding how Git stores objects helps you explain efficient data storage and retrieval, a useful skill in many software projects.
"What if Git kept all objects in a single large file with no index, instead of separate files named by hash? How would the time complexity of finding one object change?"