
How Does Git Work Internally?

Blob, Tree, Commit — The Three Git Objects

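It's a common myth that Git stores diffs.

Conceptually, every commit records a full snapshot of every tracked file. Because identical content always produces the same hash, a file that didn't change points to an object that already exists instead of being stored again. (Packfiles do apply delta compression on disk, but that is a storage optimization, not the data model.)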
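Here is a minimal sketch of why full snapshots stay cheap. It assumes a plain Python dict stands in for `.git/objects/`. Every snapshot lists all files, but objects are keyed by a hash of their content, so an unchanged file resolves to an object that already exists. `ObjectStore` and `snapshot` are hypothetical names used only for illustration. They are not part of Git.

```python
from __future__ import annotations

import hashlib


def blob_id(content: bytes) -> str:
    # Git hashes a "blob <size>\0" header followed by the raw content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical stand-in for .git/objects: maps object ID -> content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Writing content that is already stored adds nothing new.
        self.objects.setdefault(oid, content)
        return oid


def snapshot(store: ObjectStore, files: dict[str, bytes]) -> dict[str, str]:
    # A snapshot records every file in full, as path -> blob ID.
    return {path: store.put(data) for path, data in files.items()}


store = ObjectStore()
v1 = snapshot(store, {"README": b"hello\n", "main.py": b"print(1)\n"})
v2 = snapshot(store, {"README": b"hello\n", "main.py": b"print(2)\n"})

print(v1["README"] == v2["README"])  # True: unchanged file, same blob
print(len(store.objects))            # 3 objects for 4 file entries
```

Only the edited `main.py` produced a new object. The second snapshot still lists README in full, but it does so by reference to the existing blob.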

The three objects

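Blob: File content only. Its ID is the hash (SHA-1 by default) of a short header, `blob <size>` plus a NUL byte, followed by the content. The filename plays no part, so identical content is always one blob, no matter how many paths or commits contain it.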
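The blob ID can be reproduced with the standard library alone. This assumes the default SHA-1 object format; repositories created with `--object-format=sha256` use SHA-256 instead. The function name `hash_object` is our own, chosen to mirror `git hash-object`.

```python
import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    # Every Git object is hashed as "<type> <size>\0" + payload.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Same bytes as `echo 'hello world' | git hash-object --stdin`
print(hash_object("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# An empty file always has the same well-known blob ID.
print(hash_object("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Nothing about the path is hashed, so two files with identical bytes share one blob. As a loose object, Git writes it zlib-compressed to a path built from the ID: the first two hex digits name the directory, as in `.git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad`.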

Tree: A directory listing. Each entry maps a name and file mode to a blob hash (for a file) or a tree hash (for a subdirectory).
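A sketch of how a tree object is serialized and hashed, assuming SHA-1 IDs and covering only regular files, executables, and subdirectories (symlink and submodule modes are left out). `tree_payload` is a hypothetical helper. The sort rule, where subdirectories compare as if their name ended in `/`, reflects how Git orders tree entries.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    # Same "<type> <size>\0" + payload scheme as blobs.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def tree_payload(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries into Git's binary tree format.

    Modes used here: "100644" file, "100755" executable, "40000" subdirectory.
    """

    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders subdirectories as if their name ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # "<mode> <name>\0" followed by the raw 20-byte hash (not hex text).
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


HELLO_BLOB = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # blob for b"hello world\n"

# The empty tree has a well-known ID.
empty_tree = hash_object("tree", tree_payload([]))
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# A subdirectory is just another tree, referenced from its parent by hash.
docs = hash_object("tree", tree_payload([("100644", "hello.txt", HELLO_BLOB)]))
root = hash_object("tree", tree_payload([
    ("100644", "hello.txt", HELLO_BLOB),
    ("40000", "docs", docs),
]))
```

Because `root` embeds the hash of `docs`, changing any file deep in the hierarchy produces a new hash for every tree on the path up to the root, while untouched subtrees keep their IDs.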

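Commit: The top-level tree hash, zero or more parent commit hashes (none for the first commit, two or more for a merge), author and committer lines with timestamps, and the message. That's essentially all. Annotated tags are a fourth object type, but these three are enough to represent history.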
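A sketch of the text inside an unsigned commit object, assuming SHA-1 IDs. The name `commit_payload` and the identity string are hypothetical. The layout should match what `git cat-file -p` prints for such a commit, but signed commits carry an extra header that is not shown here.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def commit_payload(tree: str, parents: list[str], author: str,
                   committer: str, message: str) -> bytes:
    """Build the text of an unsigned commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message).encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
sig = "Ada Lovelace <ada@example.com> 1700000000 +0000"  # hypothetical identity

first = commit_payload(EMPTY_TREE, [], sig, sig, "Initial commit\n")
first_id = hash_object("commit", first)
second = commit_payload(EMPTY_TREE, [first_id], sig, sig, "Second commit\n")
second_id = hash_object("commit", second)

print(second.decode())
```

Both commits point at the same tree because no files changed. They differ only in the parent line and the message, and that is enough to give them different IDs.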

Branches are just pointers

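.git/refs/heads/main is typically a small text file holding one commit hash (40 hex characters plus a newline). Creating a branch means writing one such file, which is why Git branches are cheap. Git may also consolidate refs into .git/packed-refs, but the idea is the same.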

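HEAD is another pointer, stored in .git/HEAD. Normally it is a symbolic ref naming the current branch (for example `ref: refs/heads/main`). In a "detached HEAD" state, it holds a commit hash directly.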
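The pointer layout can be mimicked in a temporary directory instead of a real repository. This sketch assumes loose refs only and ignores .git/packed-refs. `create_branch`, `resolve_head`, and the all-`a` hash are hypothetical; against a real repository you would normally ask Git itself with `git rev-parse HEAD` or `git symbolic-ref HEAD`.

```python
from __future__ import annotations

import os
import tempfile


def create_branch(git_dir: str, name: str, commit_id: str) -> None:
    # A branch is one small file: the commit hash plus a newline.
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(commit_id + "\n")


def resolve_head(git_dir: str) -> str:
    """Return the commit HEAD points at (loose refs only; packed-refs ignored)."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        # Usual case: HEAD names a branch, e.g. "ref: refs/heads/main".
        with open(os.path.join(git_dir, head[len("ref: "):])) as f:
            return f.read().strip()
    # Detached HEAD: the file holds a commit hash directly.
    return head


FAKE_COMMIT = "a" * 40  # placeholder hash, not a real commit

with tempfile.TemporaryDirectory() as git_dir:
    create_branch(git_dir, "main", FAKE_COMMIT)
    size = os.path.getsize(os.path.join(git_dir, "refs", "heads", "main"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")
    on_branch = resolve_head(git_dir)

    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write(FAKE_COMMIT + "\n")  # detach HEAD
    detached = resolve_head(git_dir)

print(size)                                              # 41
print(on_branch == FAKE_COMMIT, detached == FAKE_COMMIT)  # True True
```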

How It Works

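1. `git add` -> writes a blob for the file's content into .git/objects/ and records it in the index (the staging area, .git/index)

2. `git commit` -> builds tree objects from the index, then writes a commit = top-level tree + parent (the current HEAD commit) + author/committer + message

3. The current branch pointer (`.git/refs/heads/<branch>`) is updated to the new commit hash

4. `git log` starts at HEAD and walks commit -> parent -> parent... backwards through the chain

5. `git cat-file -p <hash>` prints any object's contents directly (`git cat-file -t <hash>` shows its type)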
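The steps above can be traced end to end with a toy object database in a temporary directory. This is a simplified sketch that assumes SHA-1 IDs and loose objects only (real repositories also keep objects in packfiles, which this ignores). `write_object`, `read_object`, and `log` are hypothetical helpers. Also, `log` follows only first parents, whereas `git log` shows every reachable commit.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
import zlib


def write_object(git_dir: str, obj_type: str, data: bytes) -> str:
    """Store data as a zlib-compressed loose object and return its ID."""
    raw = f"{obj_type} {len(data)}".encode() + b"\0" + data
    oid = hashlib.sha1(raw).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # identical content is written only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(raw))
    return oid


def read_object(git_dir: str, oid: str) -> tuple[str, bytes]:
    """Rough analogue of `git cat-file`: return (type, payload)."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, data = raw.partition(b"\0")
    return header.split(b" ")[0].decode(), data


def log(git_dir: str, oid: str | None) -> list[str]:
    """Walk commit -> parent backwards, following first parents only."""
    messages = []
    while oid:
        _, data = read_object(git_dir, oid)
        headers, _, message = data.decode().partition("\n\n")
        messages.append(message.strip())
        parents = [line[7:] for line in headers.splitlines() if line.startswith("parent ")]
        oid = parents[0] if parents else None
    return messages


sig = "Ada <ada@example.com> 1700000000 +0000"  # hypothetical identity

with tempfile.TemporaryDirectory() as git_dir:
    ref_path = os.path.join(git_dir, "refs", "heads", "main")
    os.makedirs(os.path.dirname(ref_path))
    parent = None
    for i in range(3):
        # Step 1 (git add): store the file content as a blob.
        blob = write_object(git_dir, "blob", f"version {i}\n".encode())
        # Step 2 (git commit): a one-entry tree, then the commit itself.
        tree = write_object(git_dir, "tree", b"100644 notes.txt\0" + bytes.fromhex(blob))
        body = f"tree {tree}\n" + (f"parent {parent}\n" if parent else "")
        body += f"author {sig}\ncommitter {sig}\n\ncommit {i}\n"
        parent = write_object(git_dir, "commit", body.encode())
        # Step 3: move the branch pointer to the new commit.
        with open(ref_path, "w") as f:
            f.write(parent + "\n")

    # Step 4 (git log): start from the branch tip and walk backwards.
    with open(ref_path) as f:
        history = log(git_dir, f.read().strip())
    # Step 5 (git cat-file -p): read any object back by its hash.
    last_blob = read_object(git_dir, blob)

print(history)    # ['commit 2', 'commit 1', 'commit 0']
print(last_blob)  # ('blob', b'version 2\n')
```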

Use Cases

- Understanding why Git commands behave the way they do
- Recovering lost commits (reflog) and checking a repository for corruption (fsck)
- Writing custom tools (libgit2 and its bindings, such as Rugged for Ruby)