A Modern Version Control System Built on Graph Theory and Algorithmic Efficiency
GitForge is a high-performance version control system implemented in Python that leverages advanced algorithms for superior merge operations and conflict detection. Built from first principles, GitForge demonstrates how fundamental data structures and graph algorithms can power a robust, production-ready VCS.
- Key Innovations
- Algorithmic Advantages
- Architecture
- Installation
- Usage Guide
- Command Reference
- Technical Deep Dive
- Testing
- Project Structure
| Feature | GitForge Approach | Technical Advantage |
|---|---|---|
| Merge-Base Detection | Bidirectional BFS | O(b^(d/2)) vs O(b^d) - exponentially faster for deep histories |
| Conflict Detection | Typed conflict taxonomy | Precise conflict classification enables targeted resolution strategies |
| Merge Engine | True 3-way merge with diff3 |
Produces conflict markers with base context for informed resolution |
| Index Format | Rich JSON with state metadata | Supports conflict tracking, type classification, and multi-stage data |
| Object Storage | SHA-1 content-addressing with zlib | Efficient deduplication and compression |
GitForge implements the Lowest Common Ancestor (LCA) algorithm using Bidirectional Breadth-First Search, a technique that provides exponential speedup over traditional single-direction traversal.
In version control, finding the merge-base (the most recent common ancestor of two commits) is fundamental to all merge operations. Given the commit DAG (Directed Acyclic Graph), we need to find where two branches diverged.
Standard BFS from one node toward ancestors has time complexity O(b^d) where:
b= average branching factord= distance to common ancestor
┌─────┐
┌───────▶│ A │◀───────┐
│ └─────┘ │
┌───┴───┐ ┌───┴───┐
│ HEAD │ │ other │
└───┬───┘ └───┬───┘
│ │
Frontier 1 Frontier 2
expands UP expands UP
│ │
└───────▶ LCA ◀─────────┘
(Meeting Point)
Algorithm:
def get_merge_base(oid1, oid2):
# Bidirectional BFS - O(b^(d/2)) instead of O(b^d)
visited1, visited2 = {oid1}, {oid2}
frontier1, frontier2 = deque([oid1]), deque([oid2])
while frontier1 or frontier2:
# Expand from branch 1
if frontier1:
current = frontier1.popleft()
if current in visited2:
return current # Found LCA
for parent in get_commit(current).parents:
if parent not in visited1:
visited1.add(parent)
frontier1.append(parent)
# Expand from branch 2 (symmetric)
if frontier2:
current = frontier2.popleft()
if current in visited1:
return current # Found LCA
for parent in get_commit(current).parents:
if parent not in visited2:
visited2.add(parent)
frontier2.append(parent)
return None # No common ancestorComplexity Analysis:
| Approach | Time Complexity | For d=20, b=2 |
|---|---|---|
| Single-direction BFS | O(b^d) | 1,048,576 nodes |
| Bidirectional BFS | O(b^(d/2)) | 1,024 nodes |
This represents a 1000x improvement for typical repository depths.
GitForge implements true 3-way merge using the diff3 algorithm, which provides superior conflict resolution compared to simple 2-way diff.
BASE (common ancestor)
│
┌───────┴───────┐
│ │
HEAD OTHER
(current) (incoming)
A 3-way merge compares:
- BASE ↔ HEAD: What did the current branch change?
- BASE ↔ OTHER: What did the incoming branch change?
- HEAD ↔ OTHER: Do these changes conflict?
When conflicts occur, GitForge produces clear, contextual markers:
<<<<<<< HEAD
current branch content
||||||| BASE
original content (context for resolution)
=======
incoming branch content
>>>>>>> MERGE_HEAD
The inclusion of BASE content (middle section) provides crucial context that helps developers understand why a conflict occurred, not just what the conflicting content is.
GitForge implements a typed conflict taxonomy that precisely categorizes merge conflicts, enabling more intelligent resolution strategies.
| Type | Scenario | Description |
|---|---|---|
content_conflict |
Both modified same file | Classic merge conflict with differing changes |
add_add |
Both added same file | Same filename, different content |
current_delete_target_modify |
HEAD deleted, OTHER modified | File removed locally but changed remotely |
current_modify_target_delete |
HEAD modified, OTHER deleted | File changed locally but removed remotely |
{
"path/to/file.py": {
"state": "conflict",
"type": "content_conflict",
"oid": "abc123...",
"base": "def456...",
"head": "ghi789...",
"other": "jkl012..."
}
}This rich metadata enables:
- Precise status reporting
- Type-specific resolution workflows
- Full reconstruction of merge state
GitForge uses a shared commit application pattern for cherry-pick and rebase operations, eliminating code duplication and ensuring consistent behavior.
┌─────────────────┐
│ _apply_commit │
│ (shared core) │
└────────┬────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│cherry-pick│ │ rebase │ │_finish_ │
│ │ │ replay │ │ apply │
└───────────┘ └───────────┘ └───────────┘
Benefits:
- Single implementation for 3-way merge logic
- Consistent conflict handling across operations
- Author preservation for both operations
- Unified empty-commit detection
┌──────────────────────────────────────────────────────────────────┐
│ CLI Layer │
│ (cli.py) │
│ Argument parsing, command dispatch, user-facing output │
└────────────────────────────┬─────────────────────────────────────┘
│
┌────────────────────────────▼─────────────────────────────────────┐
│ Repository Layer │
│ (repository.py) │
│ Core VCS operations: commit, merge, rebase, cherry-pick │
│ Graph algorithms: BFS, LCA, commit iteration │
└───────────┬──────────────────────────────────┬───────────────────┘
│ │
┌───────────▼───────────┐ ┌───────────▼───────────────────┐
│ Diff Engine │ │ Objects Layer │
│ (diff_engine.py) │ │ (objects.py) │
│ │ │ │
│ • Tree comparison │ │ • Content-addressable store │
│ • 3-way merge │ │ • Reference management │
│ • diff3 integration │ │ • Index (staging area) │
│ • Conflict detection │ │ • Configuration │
└───────────────────────┘ └───────────────────────────────┘
│
┌──────────────────────────────────────────────▼───────────────────┐
│ Remote Layer │
│ (remotes.py) │
│ Object transfer, ref synchronization, push safety │
└──────────────────────────────────────────────────────────────────┘
- Python 3.8+
diffutility (standard on Unix/macOS/Linux)diff3utility (for merge operations)graphviz(optional, for commit visualization)
# Clone the repository
git clone https://github.com/gitforge/gitforge.git
cd gitforge
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Verify installation
gitforge --helppip install gitforge# Initialize a new repository
mkdir my-project && cd my-project
gitforge init
# Configure user identity
gitforge config user.name "Your Name"
gitforge config user.email "you@example.com"
# Create and stage files
echo "Hello, GitForge!" > README.md
gitforge add README.md
# Commit changes
gitforge commit -m "Initial commit"# Create a new branch
gitforge branch feature-x
# Switch to branch
gitforge checkout feature-x
# Make changes and commit
echo "New feature" > feature.txt
gitforge add feature.txt
gitforge commit -m "Add feature X"
# Switch back and merge
gitforge checkout master
gitforge merge feature-x# When merge conflicts occur
gitforge merge feature-branch
# Output: CONFLICT in file.txt
# Check status for conflict details
gitforge status
# Shows: both modified: file.txt (content_conflict)
# Edit conflicted files, resolve markers
# Stage resolved files
gitforge add file.txt
# Complete the merge
gitforge commit -m "Merge feature-branch"# Apply a specific commit to current branch
gitforge cherry-pick abc1234
# If conflicts occur:
# 1. Resolve conflicts
# 2. gitforge add <files>
# 3. gitforge cherry-pick --continue
# Or abort:
gitforge cherry-pick --abort# Rebase current branch onto master
gitforge rebase master
# If conflicts during replay:
# 1. Resolve conflicts
# 2. gitforge add <files>
# 3. gitforge rebase --continue
# Or abort entire rebase:
gitforge rebase --abort| Command | Description |
|---|---|
gitforge init |
Initialize new repository in current directory |
gitforge config <key> [value] |
Get/set configuration values |
| Command | Description |
|---|---|
gitforge add <files...> |
Stage files for commit |
gitforge commit -m <message> |
Create new commit with staged changes |
gitforge status |
Show working tree and staging status |
| Command | Description |
|---|---|
gitforge log [commit] |
Show commit history from specified commit |
gitforge show [commit] |
Show commit details and diff |
gitforge diff [--cached] [commit] |
Show changes between trees |
gitforge cat-file <oid> |
Output contents of object |
| Command | Description |
|---|---|
gitforge branch [name] [start] |
List or create branches |
gitforge checkout <ref> |
Switch HEAD to branch or commit |
gitforge tag <name> [oid] |
Create lightweight tag |
| Command | Description |
|---|---|
gitforge merge <commit> |
Merge commit into current branch |
gitforge merge --abort |
Abort in-progress merge |
gitforge merge-base <c1> <c2> |
Find common ancestor (LCA) |
gitforge cherry-pick <commit> |
Apply single commit |
gitforge rebase <upstream> |
Rebase onto upstream |
| Command | Description |
|---|---|
gitforge reset [--soft|--mixed|--hard] <commit> |
Reset HEAD to commit |
| Command | Description |
|---|---|
gitforge fetch <remote> |
Fetch objects and refs from remote |
gitforge push <remote> <branch> |
Push branch to remote |
| Command | Description |
|---|---|
gitforge hash-object <file> |
Compute object hash and store |
gitforge write-tree |
Write index to tree object |
gitforge read-tree <tree> |
Read tree into index |
gitforge k |
Visualize commit graph (requires graphviz) |
GitForge uses content-addressable storage with SHA-1 hashing:
.gitforge/objects/
├── ab/
│ └── cdef1234... (compressed blob/tree/commit)
├── 12/
│ └── 345678... (another object)
Objects are compressed with zlib and prefixed with type information:
<type>\x00<content>
References are stored as files with either:
- Direct reference: Contains SHA-1 of commit
- Symbolic reference: Contains
ref: <other-ref>
.gitforge/
├── HEAD # ref: refs/heads/master
├── MERGE_HEAD # SHA1 (during merge)
├── ORIG_HEAD # SHA1 (backup for abort)
├── CHERRY_PICK_HEAD # SHA1 (during cherry-pick)
└── refs/
├── heads/
│ ├── master # SHA1
│ └── feature # SHA1
└── tags/
└── v1.0 # SHA1
The index is stored as JSON with rich metadata:
{
"src/main.py": {
"state": "clear",
"oid": "abc123def456..."
},
"src/utils.py": {
"state": "conflict",
"type": "content_conflict",
"oid": "merged-with-markers...",
"base": "original-oid",
"head": "current-oid",
"other": "incoming-oid"
}
}tree <tree-sha1>
parent <parent-sha1>
parent <parent2-sha1> # For merge commits
author <name> <email> <timestamp> <tz>
committer <name> <email> <timestamp> <tz>
<commit message>
GitForge includes a comprehensive functional test suite covering all operations:
# Run all tests
python -m pytest tests/ -v
# Run specific test class
python -m pytest tests/functional/test_gitforge.py::TestMerge -v
# Run with coverage
python -m pytest tests/ --cov=gitforge --cov-report=html- Basic Operations: init, hash-object, cat-file, add, commit
- Tree Operations: write-tree, read-tree
- Branching: branch, checkout, tag
- History: log, show, diff
- Integration: merge, cherry-pick, rebase
- Remotes: fetch, push
- Conflict Handling: All conflict types with resolution
gitforge/
├── gitforge/
│ ├── __init__.py # Package marker
│ ├── cli.py # Command-line interface and argument parsing
│ ├── repository.py # Core VCS logic and graph algorithms
│ ├── objects.py # Object storage, refs, index management
│ ├── diff_engine.py # Tree comparison and 3-way merge
│ └── remotes.py # Remote operations (fetch/push)
├── tests/
│ └── functional/
│ └── test_gitforge.py # Comprehensive functional tests
├── pyproject.toml # Project configuration and dependencies
└── README.md # This file
| Module | Responsibility |
|---|---|
cli.py |
Argument parsing, command dispatch, output formatting |
repository.py |
Commit graph operations, merge logic, rebase/cherry-pick |
objects.py |
Object storage, reference management, index, config |
diff_engine.py |
Tree comparison, diff generation, 3-way merge |
remotes.py |
Object transfer, ref synchronization |
- Fork the repository
- Create a feature branch (
gitforge checkout -b feature/amazing-feature) - Commit your changes (
gitforge commit -m 'Add amazing feature') - Push to the branch (
gitforge push origin feature/amazing-feature) - Open a Pull Request
GitForge demonstrates that version control systems can be built on solid algorithmic foundations. The bidirectional BFS for LCA detection, true 3-way merge with diff3, and typed conflict classification represent deliberate design choices that prioritize both correctness and performance.
GitForge — Version Control, Reforged