MyGit is a simplified reimplementation of Git’s core internals written in C.
The purpose of this project is to understand how Git works internally by building a basic content-addressable version control system from scratch.
It supports:
- init: initialize a repository
- add: stage files
- commit: create commits
- status: show the working directory state
- revert: restore files to the last commit
- log: display the commit history
- rm: remove files from the working directory and the staging area
Git identifies stored data by the SHA-1 hash of its content rather than by filename.
When a file is added, its content is wrapped in a blob object:
blob \0<file_content>
The SHA-1 hash of this data becomes the object’s identity.
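As a rough illustration of the blob layout above, the sketch below builds the byte buffer that would then be hashed. The function name `build_blob` is hypothetical and not taken from the MyGit sources. It follows the layout exactly as written here; note that upstream Git additionally writes the content length in decimal between the space and the NUL byte.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wraps raw file bytes in the "blob \0<file_content>"
 * layout described above. Returns a malloc'd buffer (caller frees) and
 * stores its length in *out_len; returns NULL on allocation failure. */
unsigned char *build_blob(const unsigned char *content, size_t content_len,
                          size_t *out_len)
{
    static const char header[] = "blob "; /* array includes the trailing NUL */
    size_t header_len = sizeof(header);   /* 5 characters + '\0' = 6 bytes */

    unsigned char *buf = malloc(header_len + content_len);
    if (buf == NULL)
        return NULL;

    memcpy(buf, header, header_len);
    if (content_len > 0)
        memcpy(buf + header_len, content, content_len);

    *out_len = header_len + content_len;
    return buf;
}
```

The returned buffer is what a SHA-1 routine (for example from OpenSSL, which this project links against) would be run over to obtain the object's identity.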
Objects are stored inside:
.git/objects/<first_2_chars>/<remaining_38_chars>
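The path layout above can be sketched as a small helper that splits a 40-character hex hash into a 2-character directory and a 38-character file name. The name `object_path` is an assumption made for illustration, not a function from the MyGit sources.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps a 40-character hex hash to
 * ".git/objects/<first_2_chars>/<remaining_38_chars>".
 * Returns 0 on success, -1 if the hash length is wrong or out is too small. */
int object_path(const char *hex, char *out, size_t out_size)
{
    if (strlen(hex) != 40)
        return -1;

    /* "%.2s" prints only the first two characters of hex. */
    int n = snprintf(out, out_size, ".git/objects/%.2s/%s", hex, hex + 2);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Splitting on the first two characters keeps any single directory from accumulating every object in the repository.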
If two files have identical content, they share the same hash.
SHA-1 produces a 160-bit (20-byte) digest, written as a 40-character hexadecimal string.
Any change in file content produces a new hash.
This ensures:
- Integrity
- Uniqueness
- Immutability
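The 40-character length follows from printing each of the 20 digest bytes as two hex digits. A minimal sketch of that conversion is shown below; `digest_to_hex` and `SHA1_DIGEST_LEN` are illustrative names, and the digest itself is assumed to come from a SHA-1 routine such as the one OpenSSL provides.

```c
#include <stdio.h>
#include <stddef.h>

#define SHA1_DIGEST_LEN 20  /* SHA-1 digests are 160 bits = 20 bytes */

/* Hypothetical helper: writes the digest as 40 lowercase hex characters
 * plus a terminating NUL into hex (which must hold 41 bytes). */
void digest_to_hex(const unsigned char digest[SHA1_DIGEST_LEN], char hex[41])
{
    for (size_t i = 0; i < SHA1_DIGEST_LEN; i++) {
        /* Each call writes two hex digits and a NUL that the next call
         * overwrites; the last NUL lands at hex[40]. */
        snprintf(hex + 2 * i, 3, "%02x", (unsigned)digest[i]);
    }
}
```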
All objects are compressed with zlib before being stored, which saves disk space.
Git compresses its own loose objects the same way.
After running:
./mygit init
Structure:
.git/
├── HEAD
├── objects/
├── refs/heads/master
└── index
- HEAD → points to the current branch
- objects/ → stores compressed objects
- refs/heads/master → stores the latest commit hash
- index → staging area
./mygit add file.txt
Steps:
- Read file content
- Create blob object
- Compute SHA-1
- Compress and store object
- Record hash in index
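The final step, recording the hash in the index, might look like the sketch below. The on-disk index format is not described here, so this uses a purely hypothetical in-memory table (`struct staging_index`, `index_stage`, and the size limits are all assumptions) to show the idea: re-adding a path replaces its stored hash, and a new path appends an entry.

```c
#include <string.h>

#define MAX_ENTRIES 64    /* arbitrary limit for this sketch */
#define MAX_PATH_LEN 256  /* arbitrary limit for this sketch */

struct index_entry {
    char hash[41];             /* 40 hex characters + NUL */
    char path[MAX_PATH_LEN];
};

struct staging_index {
    struct index_entry entries[MAX_ENTRIES];
    size_t count;
};

/* Hypothetical helper: records hash for path. An already staged path has
 * its hash replaced; a new path is appended.
 * Returns 0 on success, -1 on invalid input or a full table. */
int index_stage(struct staging_index *idx, const char *path, const char *hash)
{
    if (strlen(hash) != 40 || strlen(path) >= MAX_PATH_LEN)
        return -1;

    for (size_t i = 0; i < idx->count; i++) {
        if (strcmp(idx->entries[i].path, path) == 0) {
            memcpy(idx->entries[i].hash, hash, 41);
            return 0;
        }
    }

    if (idx->count == MAX_ENTRIES)
        return -1;

    memcpy(idx->entries[idx->count].hash, hash, 41);
    strcpy(idx->entries[idx->count].path, path);
    idx->count++;
    return 0;
}
```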
./mygit commit "message"
Steps:
- Read staged files from index
- Create tree object
- Create commit object
- Store commit
- Update branch reference
Commit format:
tree <tree_sha>
parent <parent_sha>
author Name timestamp
committer Name timestamp

commit message
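A commit body in this format can be assembled with `snprintf`, as in the sketch below. Several details are assumptions rather than confirmed MyGit behavior: the function name `format_commit`, the timestamp being Unix seconds, the parent line being omitted for the first commit, and a blank line separating the header fields from the message (as Git does).

```c
#include <stdio.h>

/* Hypothetical helper: writes the commit text into out.
 * parent_sha may be NULL for a commit with no parent.
 * Returns the number of characters written, or -1 if out is too small. */
int format_commit(char *out, size_t out_size,
                  const char *tree_sha, const char *parent_sha,
                  const char *name, long timestamp, const char *message)
{
    int n;

    if (parent_sha != NULL) {
        n = snprintf(out, out_size,
                     "tree %s\nparent %s\nauthor %s %ld\ncommitter %s %ld\n\n%s\n",
                     tree_sha, parent_sha, name, timestamp,
                     name, timestamp, message);
    } else {
        /* Assumed behavior: no parent line on the very first commit. */
        n = snprintf(out, out_size,
                     "tree %s\nauthor %s %ld\ncommitter %s %ld\n\n%s\n",
                     tree_sha, name, timestamp, name, timestamp, message);
    }

    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```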
./mygit status
Shows the current state of the working directory.
Steps:
- Reads the index (.git/index).
- For each staged file:
  - If the file is missing → marked as deleted.
  - If its current hash differs from the stored hash → marked as modified.
- Scans the current directory:
  - Files not present in the index → marked as untracked.
This helps identify changes before committing.
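The classification rules above reduce to two small checks, sketched here. All names (`classify_staged`, `is_untracked`, the `file_state` enum) are hypothetical, and the caller is assumed to have already hashed the working copy, passing NULL when the file no longer exists.

```c
#include <string.h>

enum file_state { STATE_UNCHANGED, STATE_MODIFIED, STATE_DELETED };

/* Hypothetical helper: compares the hash stored in the index with the hash
 * of the file on disk. current_hash == NULL means the file is missing. */
enum file_state classify_staged(const char *stored_hash, const char *current_hash)
{
    if (current_hash == NULL)
        return STATE_DELETED;
    if (strcmp(stored_hash, current_hash) != 0)
        return STATE_MODIFIED;
    return STATE_UNCHANGED;
}

/* Hypothetical helper: returns 1 if path is not among the n staged paths,
 * i.e. the file is untracked; returns 0 otherwise. */
int is_untracked(const char *path, const char *const staged[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(staged[i], path) == 0)
            return 0;
    }
    return 1;
}
```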
./mygit revert
Restores working directory files to the state of the last commit.
Steps:
- Read latest commit SHA
- Read associated tree object
- Restore blob contents
- Overwrite working directory files
This behaves similarly to:
git restore . or git reset --hard HEAD
It does not delete commit history.
It restores files to match the last committed snapshot.
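The overwrite step can be sketched as follows, assuming the blob has already been read, decompressed, and stripped of its header. The function name `restore_file` is illustrative and not taken from the MyGit sources.

```c
#include <stdio.h>

/* Hypothetical helper: replaces the file at path with the given bytes.
 * Returns 0 on success, -1 on any I/O error. */
int restore_file(const char *path, const unsigned char *data, size_t len)
{
    FILE *f = fopen(path, "wb");  /* "wb" truncates an existing file */
    if (f == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, f);
    int close_err = fclose(f);    /* fclose can report a deferred write error */

    return (written == len && close_err == 0) ? 0 : -1;
}
```

Because the file is truncated before writing, any uncommitted changes in it are lost, which is the same trade-off as git reset --hard HEAD.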
./mygit log
Displays the commit history starting from the latest commit (HEAD).
Steps:
- Reads the latest commit SHA from .git/refs/heads/master.
- Loads and decompresses the commit object.
- Displays:
  - Commit SHA
  - Author
  - Date
  - Commit message
- Reads the parent commit (if one exists).
- Repeats until no parent is found.
This recreates Git’s commit history traversal using parent-linked commits.
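The parent-following step can be sketched as a function that pulls the parent hash out of a decompressed commit body. The name `commit_parent` is hypothetical, and the sketch assumes the header fields are separated from the message by a blank line, as in Git, so that a message line starting with "parent " is never mistaken for a header.

```c
#include <string.h>

/* Hypothetical helper: looks for a "parent <sha>" header line.
 * On success copies the 40-character hash into out and returns 1.
 * Returns 0 when the commit has no (well-formed) parent line. */
int commit_parent(const char *commit_text, char out[41])
{
    const char *line = commit_text;

    /* A line that starts with '\n' is the blank line ending the headers. */
    while (line != NULL && *line != '\0' && *line != '\n') {
        if (strncmp(line, "parent ", 7) == 0) {
            const char *sha = line + 7;
            size_t len = strcspn(sha, "\n");
            if (len != 40)
                return 0;
            memcpy(out, sha, 40);
            out[40] = '\0';
            return 1;
        }
        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : NULL;
    }
    return 0;
}
```

A log loop would call this repeatedly, loading the object named by each returned hash, until the function returns 0.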
./mygit rm
Removes a file from both:
- Working directory
- Staging area (index)
Steps:
- Deletes the file from the current directory.
- Removes the file’s entry from .git/index.
- Updates the index file.
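Dropping an entry from the index can be sketched as an in-place filter over the staged paths. The index is modeled here as a plain array of path strings, which is an assumption made for illustration, as is the name `index_remove`; deleting the file itself could be done with the standard `remove()` function from `<stdio.h>`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes every entry equal to path from the array,
 * preserving the order of the remaining entries.
 * Returns the new number of entries. */
size_t index_remove(const char *paths[], size_t count, const char *path)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(paths[i], path) != 0)
            paths[kept++] = paths[i];  /* keep entries that do not match */
    }
    return kept;
}
```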
Requires:
- GCC
- OpenSSL
- zlib
Compile with:
gcc src/*.c -Iinclude -lssl -lcrypto -lz -o mygit
./mygit init
echo "hello" > file.txt
./mygit add file.txt
./mygit commit "first commit"
./mygit status
./mygit revert
- Systems programming in C
- File system operations
- Cryptographic hashing
- Data compression
- Git’s internal storage model
- Basic repository state management
MyGit is a simplified educational implementation of Git’s core storage engine.
It demonstrates how Git stores objects, creates commits, and manages repository history at a low level.