
The .git directory structure - Deep Dive

Overview - The .git directory structure
What is it?
The .git directory is a hidden folder inside a Git project that stores all the information Git needs to track changes and manage versions. It contains data about commits, branches, configuration, and more. This folder makes your project a Git repository, enabling version control features. Without it, Git cannot track or save your project's history.
Why it matters
Without the .git directory, Git would have no way to remember your project's history or changes. This means you couldn't undo mistakes, collaborate safely, or keep track of who changed what and when. The .git directory is like the brain of Git, storing all the knowledge about your project’s evolution. Losing it means losing the local history, including anything that was never pushed to another copy of the repository.
Where it fits
Before learning about the .git directory, you should understand basic Git commands like git init, git add, and git commit. After this, you can explore advanced Git topics like branching, merging, and rebasing, which rely on the data stored inside the .git directory.
Mental Model
Core Idea
The .git directory is the hidden control center where Git stores all the data and metadata needed to manage your project's history and state.
Think of it like...
The .git directory is like a secret filing cabinet in your room that holds every draft, note, and change you ever made to a project, so you can always go back and see or restore any version.
┌─────────────────────────────┐
│          .git/              │
├─────────────┬───────────────┤
│ config      │ Repository    │
│             │ settings      │
├─────────────┼───────────────┤
│ HEAD        │ Pointer to    │
│             │ current branch│
├─────────────┼───────────────┤
│ objects/    │ Stores all    │
│             │ file data &   │
│             │ commits       │
├─────────────┼───────────────┤
│ refs/       │ Branch and    │
│             │ tag pointers  │
├─────────────┼───────────────┤
│ logs/       │ History of    │
│             │ changes to    │
│             │ refs          │
└─────────────┴───────────────┘
Build-Up - 7 Steps
1. Foundation: What is the .git directory
Concept: Introducing the .git directory as the core folder that makes a project a Git repository.
When you run 'git init' in a folder, Git creates a folder named '.git' inside it. This folder contains everything Git needs to track your project. On Unix-like systems it is hidden from normal file listings because its name starts with a dot; 'ls -a' shows it.
Result
The folder becomes a Git repository, ready to track changes.
Understanding that the .git directory is the heart of Git helps you realize that your project’s version control depends entirely on this hidden folder.
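To make the idea concrete, here is a minimal Python sketch of how a tool could decide whether a folder belongs to a repository: it walks upward until it finds a '.git' folder. The function name find_git_dir is made up for this illustration, and the sketch assumes the common case where '.git' is a directory (in linked worktrees and submodules it can be a small file instead, which this sketch ignores).

```python
from pathlib import Path
from typing import Optional


def find_git_dir(start: Path) -> Optional[Path]:
    """Return the nearest '.git' directory at or above `start`, or None."""
    current = start.resolve()
    # Check the starting folder first, then each parent up to the root.
    for folder in [current, *current.parents]:
        candidate = folder / ".git"
        if candidate.is_dir():
            return candidate
    return None


# Prints the path of the enclosing repository's .git folder, or None.
print(find_git_dir(Path.cwd()))
```

This is why Git commands work from any subfolder of a project: the repository is wherever the nearest '.git' is found.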
2. Foundation: Key files inside .git directory
Concept: Learn about the main files and folders inside .git and their roles.
Inside .git, you find files like 'config' (repository settings) and 'HEAD' (pointer to the current branch), and folders like 'objects' (all stored file contents, directory listings, and commits), 'refs' (branch and tag pointers), and 'logs' (a record of every update to those pointers). Each part has a specific job in keeping track of your project’s state.
Result
You can identify where Git stores configuration, data, and pointers.
Knowing the purpose of these files helps you understand how Git organizes and retrieves your project’s history.
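As a quick way to explore, the following Python sketch checks which of the entries named above exist in a given .git folder. The helper name describe_git_dir and the ENTRIES table are made up for this illustration. Note that 'logs' may be missing in a brand-new repository, since it only appears once a reference has been updated.

```python
from pathlib import Path
from typing import Dict

# Entries discussed in this step, with the role each one plays.
ENTRIES = {
    "config": "repository settings",
    "HEAD": "pointer to the current branch",
    "objects": "stored file contents, trees, and commits",
    "refs": "branch and tag pointers",
    "logs": "history of changes to refs",
}


def describe_git_dir(git_dir: Path) -> Dict[str, bool]:
    """Map each well-known entry name to whether it exists in `git_dir`."""
    return {name: (git_dir / name).exists() for name in ENTRIES}


# Report on the .git folder in the current directory, if there is one.
for name, present in describe_git_dir(Path(".git")).items():
    print(f"{name:8} {'found' if present else 'missing':8} {ENTRIES[name]}")
```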
3. Intermediate: Understanding the objects folder
Before reading on: do you think the objects folder stores files as they are or in a special format? Commit to your answer.
Concept: The objects folder stores all data in a compressed and hashed format to ensure integrity and efficiency.
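Git stores every file version, directory listing, and commit as an object inside the 'objects' folder. Each object is compressed with zlib and named by the hash of its content (SHA-1 by default); the first two hex characters of the hash become a subfolder name and the rest becomes the file name. Because the name is derived from the content, identical content is stored only once, and corruption is detectable because the content would no longer match its name.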
Result
All project data is safely stored and can be retrieved by Git using hashes.
Understanding that Git uses hashes and compression explains why Git is fast and reliable even with large histories.
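The naming scheme can be reproduced in a few lines of Python. This sketch assumes a repository using the default SHA-1 format and covers only blobs (file contents); the function names blob_id and loose_object_path are made up for this illustration. Git hashes a short header ("blob", the size, and a NUL byte) followed by the file content.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the object name Git gives to a file with this content."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Location of a loose object relative to the .git folder."""
    return f"objects/{object_id[:2]}/{object_id[2:]}"


oid = blob_id(b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The first value matches what 'git hash-object' reports for a file containing the line "hello world", which is a handy way to check the sketch against a real Git installation.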
4. Intermediate: Role of HEAD and refs folders
Before reading on: does HEAD point to a commit or a branch? Commit to your answer.
Concept: HEAD is a pointer to the current branch, and refs store pointers to branches and tags.
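The 'HEAD' file tells Git which branch you are working on: it normally contains a line such as 'ref: refs/heads/master'. The 'refs' folder contains pointers to branches (under 'refs/heads') and tags (under 'refs/tags'); each is a small file holding the hash of a commit. Git may also collect these pointers into a single '.git/packed-refs' file. This system lets Git know your current position and available branches.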
Result
Git knows your current working branch and can switch between branches easily.
Knowing how HEAD and refs work together clarifies how Git tracks your current work and manages multiple lines of development.
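The lookup from HEAD to a commit can be sketched in Python as follows. This is an illustration only, assuming the traditional file-based reference storage; the function name resolve_head is made up, and real Git handles more cases (such as symbolic refs that point to other symbolic refs).

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (ref_name, commit_id).

    ref_name is None when HEAD is detached; commit_id is None when the
    branch exists in name only (no commits yet).
    """
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return None, head  # detached HEAD: the file holds a commit hash

    ref_name = head[len("ref: "):]
    ref_file = git_dir / ref_name
    if ref_file.is_file():
        return ref_name, ref_file.read_text().strip()

    # The ref may live in packed-refs instead of its own file.
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # skip the header comment and peeled-tag lines
            commit_id, _, name = line.partition(" ")
            if name == ref_name:
                return ref_name, commit_id
    return ref_name, None
```

Calling resolve_head(Path(".git")) inside a repository should agree with what 'git symbolic-ref HEAD' and 'git rev-parse HEAD' print.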
5. Intermediate: Purpose of logs folder
Concept: The logs folder keeps a history of changes to references like branches.
Inside '.git/logs', Git records every update to branches and HEAD. This log helps Git recover lost commits and provides a safety net for undoing changes.
Result
You can recover from mistakes using 'git reflog' because Git records every change to HEAD and branch pointers.
Understanding the logs folder reveals how Git can restore lost work even after complex operations.
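Each file under '.git/logs' is plain text with one line per update: the old hash, the new hash, who made the change, a timestamp with time zone, then a tab and a message. A minimal Python sketch of a parser is shown below; the ReflogEntry class, the parse_reflog_line function, and the sample line (including its hashes and identity) are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ReflogEntry:
    old_id: str
    new_id: str
    identity: str
    timestamp: int
    timezone: str
    message: str


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one reflog line into its fields."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old_id, new_id, rest = meta.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the end.
    identity, timestamp, timezone = rest.rsplit(" ", 2)
    return ReflogEntry(old_id, new_id, identity, int(timestamp), timezone, message)


# Hypothetical sample line in the same shape Git writes.
sample = (
    "0" * 40 + " " + "1" * 40
    + " Ada Example <ada@example.com> 1700000000 +0000"
    + "\tcommit (initial): first commit\n"
)
entry = parse_reflog_line(sample)
print(entry.message)  # commit (initial): first commit
```

The old hash on each line is what makes recovery possible: it names the commit a pointer referred to before the update.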
6. Advanced: How Git stores commits internally
Before reading on: do you think a commit stores the whole project or just changes? Commit to your answer.
Concept: Git stores commits as snapshots of the entire project state, not just differences.
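Each commit object points to a tree object representing the project's top-level folder at that moment. Trees point to blobs (file contents) and to other trees (subfolders). Files that did not change between commits keep the same hash, so a new snapshot reuses the existing objects instead of storing them again. This snapshot model means Git can quickly restore any commit without replaying changes.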
Result
Git efficiently manages project history as snapshots, enabling fast operations.
Knowing Git’s snapshot model explains why branching and switching are so fast compared to other version control systems.
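The snapshot model can be illustrated with a small Python sketch that builds blob, tree, and commit objects in memory. It is a simplification under stated assumptions: SHA-1 object names, a flat folder of regular files (mode 100644), and a made-up author identity and timestamp. The function names object_id, tree_body, and commit_body are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: header, NUL byte, then the body."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def tree_body(files: Dict[str, bytes]) -> bytes:
    """Encode a flat folder as a tree: mode, name, NUL, raw 20-byte blob hash."""
    body = b""
    for name in sorted(files):
        blob = object_id("blob", files[name])
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob)
    return body


def commit_body(tree_id: str, parent_id: Optional[str], message: str) -> bytes:
    """Encode a commit that points at one tree and at most one parent."""
    who = "Ada Example <ada@example.com> 1700000000 +0000"  # made-up identity
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


v1 = {"a.txt": b"one\n", "b.txt": b"two\n"}
v2 = {**v1, "b.txt": b"changed\n"}  # only b.txt differs in the second snapshot

tree1 = object_id("tree", tree_body(v1))
tree2 = object_id("tree", tree_body(v2))
commit1 = object_id("commit", commit_body(tree1, None, "first"))
commit2 = object_id("commit", commit_body(tree2, commit1, "second"))

print(object_id("blob", v1["a.txt"]) == object_id("blob", v2["a.txt"]))  # True
print(tree1 != tree2)  # True
```

Both commits describe the full set of files, yet the unchanged a.txt maps to the same blob in each, so a real repository would store it only once.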
7. Expert: Packed objects and garbage collection
Before reading on: do you think Git stores all objects separately forever? Commit to your answer.
Concept: Git packs many objects into compressed packfiles to save space and speed up access, and removes unreachable objects via garbage collection.
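Over time, Git combines loose objects into packfiles inside '.git/objects/pack', where similar objects are stored as deltas against each other. This reduces disk space and speeds up operations. Git also runs garbage collection ('git gc') to delete objects that are no longer reachable from any branch, tag, or reflog entry, once they are older than a grace period (two weeks by default).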
Result
Git repositories stay efficient and clean even with long histories.
Understanding packing and garbage collection reveals how Git scales to huge projects without slowing down.
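To see how much of a repository is still loose and how much has been packed, a rough Python equivalent of what 'git count-objects -v' reports could look like this. The function name count_objects is made up for this illustration, and the sketch only counts files; it does not look inside packfiles.

```python
from pathlib import Path
from typing import Tuple

HEX_DIGITS = set("0123456789abcdef")


def count_objects(git_dir: Path) -> Tuple[int, int]:
    """Return (number of loose objects, number of packfiles)."""
    objects = git_dir / "objects"
    loose = 0
    for folder in objects.iterdir():
        # Loose objects live in folders named after the first two hex digits.
        if folder.is_dir() and len(folder.name) == 2 and set(folder.name) <= HEX_DIGITS:
            loose += sum(1 for entry in folder.iterdir() if entry.is_file())
    pack_dir = objects / "pack"
    packs = len(list(pack_dir.glob("*.pack"))) if pack_dir.is_dir() else 0
    return loose, packs
```

Running count_objects(Path(".git")) before and after 'git gc' should show the loose count dropping as objects move into a packfile.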
Under the Hood
The .git directory contains a structured database where Git stores all project data as objects identified by hashes (SHA-1 by default). Commits point to trees, which point to blobs (file contents) and to other trees. Branches and tags are references pointing to commits. HEAD normally points to the current branch. Git uses this structure to quickly find any version of any file. It compresses and packs objects to optimize storage and uses logs to track changes to references for recovery.
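The "database" described here behaves like a key-value store where the key is the hash of the value. The Python sketch below writes and reads loose objects in that style, assuming SHA-1 names and zlib compression; the function names write_loose_object and read_loose_object are made up, and the code works on a throwaway temporary folder rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(git_dir: Path, kind: str, body: bytes) -> str:
    """Store an object under its own hash and return that hash."""
    raw = f"{kind} {len(body)}\0".encode("ascii") + body
    oid = hashlib.sha1(raw).hexdigest()
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def read_loose_object(git_dir: Path, oid: str) -> Tuple[str, bytes]:
    """Load an object by hash and return (kind, body)."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("object size does not match its header")
    return kind, body


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / ".git"
    oid = write_loose_object(store, "blob", b"hello world\n")
    print(oid)                            # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(read_loose_object(store, oid))  # ('blob', b'hello world\n')
```

Writing the same content twice lands on the same path, which is how identical files across commits end up stored once.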
Why designed this way?
Git was designed by Linus Torvalds to be fast, reliable, and distributed. Using a content-addressable storage with hashes ensures data integrity and easy sharing. Storing snapshots instead of diffs simplifies branching and merging. Packing objects and keeping logs enable efficient storage and recovery. Alternatives like centralized version control lacked these benefits, so Git’s design was revolutionary.
┌───────────────┐
│ .git/         │
├───────────────┤
│ config        │
│ HEAD ──────┐  │
│ refs/      │  │
│  ├─ heads/ ▼  │
│  │   └─ master├───┐
│  └─ tags/     │   │
│ objects/      │   │
│  ├─ (loose)   │   │
│  └─ pack/     │   │
│ logs/         │   │
└───────────────┘   │
                    ▼
               ┌─────────┐
               │ commit  │
               ├─────────┤
               │ tree    │
               └─────────┘
                    │
                    ▼
               ┌─────────┐
               │ blobs   │
               └─────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does deleting the .git directory delete your project files? Commit yes or no.
Common Belief: Deleting the .git directory deletes all my project files too.
Reality: Deleting .git only removes Git’s tracking data; your project files remain untouched but untracked.
Why it matters: Deleting .git by mistake destroys the local version history, including commits, branches, and stashes that were never pushed elsewhere, even though the working files remain.
Quick: Does the .git directory contain your working files in normal form? Commit yes or no.
Common Belief: The .git directory stores my project files exactly as I see them.
Reality: The .git directory stores compressed, hashed objects, not normal files; your working files are outside it.
Why it matters: Misunderstanding this can lead to confusion about where changes are saved and how Git manages data internally.
Quick: Does HEAD always point directly to a commit? Commit yes or no.
Common Belief: HEAD always points directly to a commit object.
Reality: HEAD usually points to a branch reference, which then points to a commit; sometimes it points directly to a commit in detached HEAD state.
Why it matters: Knowing this prevents confusion when switching branches or in detached HEAD situations, avoiding lost commits.
Quick: Are all objects in .git stored separately forever? Commit yes or no.
Common Belief: Git stores every object as a separate file forever.
Reality: Git packs many objects into packfiles to save space and improve speed, removing loose objects over time.
Why it matters: Ignoring packing can lead to misunderstanding repository size and performance behavior.
Expert Zone
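1. The 'index' file inside .git is the binary file that holds the staging area, together with cached file metadata (such as sizes and timestamps) that speeds up status and commit operations.
2. Git’s use of content hashes not only ensures data integrity but also supports distributed collaboration: the same content gets the same identifier in every clone, so repositories can exchange objects without a central server assigning IDs.
3. The reflog stored in logs allows recovery from nearly any mistake, such as a bad reset or a deleted branch, as long as the entries have not expired (90 days by default) and the objects have not been garbage collected.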
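For the curious, the binary index file from the first point starts with a fixed 12-byte header: the signature "DIRC", a version number, and the number of entries, each stored as a big-endian value. A minimal Python sketch that reads just this header follows; the function name parse_index_header is made up, and the entries after the header are not parsed here.

```python
import struct
from typing import Tuple


def parse_index_header(data: bytes) -> Tuple[int, int]:
    """Return (version, entry_count) from the start of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# A hand-built header: version 2 with 3 staged entries.
header = struct.pack(">4sII", b"DIRC", 2, 3)
print(parse_index_header(header))  # (2, 3)
```

In a real repository, passing the bytes of '.git/index' should give an entry count that matches the number of lines printed by 'git ls-files --stage'.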
When NOT to use
Directly manipulating files inside .git is risky and discouraged; use Git commands instead. For very large repositories, consider Git LFS or alternative version control systems designed for big binary files.
Production Patterns
In production, teams often back up the .git directory to preserve history, use hooks inside .git/hooks for automation, and inspect .git/objects and refs to debug complex issues or recover lost commits.
Connections
Database indexing
Both use pointers and hashes to quickly locate data.
Understanding how Git uses hashes and references is similar to how databases use indexes to find records efficiently.
File system journaling
Git’s logs folder acts like a journal recording changes to references.
Knowing journaling in file systems helps understand how Git tracks changes to branches and recovers from errors.
Human memory and filing systems
Git’s .git directory organizes project history like a filing cabinet organizes documents.
Recognizing this connection helps appreciate the importance of structured storage for easy retrieval and recovery.
Common Pitfalls
#1 Deleting the .git directory to clean up project space.
Wrong approach: rm -rf .git
Correct approach: Use 'git clean' or remove unwanted files, but keep .git to preserve history.
Root cause: Misunderstanding that .git is just hidden files, not realizing it stores all version control data.
#2 Manually editing files inside .git to fix issues.
Wrong approach: Editing .git/HEAD or .git/refs/heads/master with a text editor.
Correct approach: Use Git commands like 'git reset' or 'git checkout' to safely change references.
Root cause: Lack of knowledge about Git’s internal structure and safe command usage.
#3 Ignoring .git/objects size growth leading to slow performance.
Wrong approach: Never running 'git gc' or 'git repack' on large repositories.
Correct approach: Run 'git gc' when a repository has grown large or slow. Git also triggers 'git gc --auto' on its own after some commands, but a manual run packs objects immediately.
Root cause: Not understanding Git’s object storage and maintenance needs.
Key Takeaways
The .git directory is the hidden core of every Git repository, storing all data and metadata needed for version control.
Git organizes project history using objects, references, and logs inside .git to enable fast, reliable operations.
Understanding the structure of .git helps you troubleshoot, recover lost work, and appreciate Git’s design.
Avoid hand-editing Git's internal files such as HEAD, refs, and objects; use Git commands to interact safely with the repository.
Git’s internal mechanisms like packing and reflog ensure efficient storage and powerful recovery options.