PyGit is a from-scratch implementation of core Git internals written in Python.
The objective of this project is to understand how Git works under the hood, including
content-addressable storage, object hashing, compression, staging, and tree construction.
This project does not use Git libraries or shell commands internally. All logic is implemented manually for learning and experimentation.
- Content-addressable object storage
- Git-compatible SHA-1 hashing
(<type> <size>\0<content>) - zlib-based object compression
- Blob objects for file contents
- Tree objects for directory representation
- Index (staging area) stored on disk
- Recursive directory traversal
- Command-line interface (CLI)
initadd(file or directory)commit(basic structure)
- File contents are read as raw bytes
- A Git-style header is prepended
blob <size>\0 - SHA-1 hash is computed over header + content
- Object data is compressed using zlib
- Object is stored in a content-addressable layout
- The index maps file paths to blob hashes
- Trees are constructed recursively from the index
- Python 3.9 or higher
- No third-party dependencies
python main.py initInternal repository directory created
Object storage and references initialized
python main.py add test1.txtFile content hashed
Blob object created and stored
Index updated
python main.py add .Recursively stages all files except internal repository data.
python main.py commit -m "First commit"Creates a tree object from the staging index and prepares commit metadata.
Commit implementation is minimal and educational.
The staging index is stored as JSON and maps file paths to blob hashes:
{
"test1.txt": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"
}Objects are stored using the same logical format as Git:
zlib(
"<type> <size>\\0<content>"
)Each object is addressed by its SHA-1 hash.
This project demonstrates understanding of:
Git internal object model
Content-addressable storage
Hash-based immutability
Filesystem-backed data storage
Recursive tree construction
CLI design using argparse
Binary data handling in Python
Full commit object support
Parent commit chaining
Branch management
Diff and status implementation
Compatibility validation with real Git
Raj Kumar