A distributed cache implementation using the Raft consensus algorithm, featuring ML-based eviction, read lease optimization, and a comprehensive test suite.
This project implements a distributed key-value cache with:
- Raft Consensus: Ensures strong consistency across multiple nodes
- Read Lease Optimization: 105x speedup for read operations
- ML-Based Eviction: Intelligent cache eviction using access pattern prediction
- Snapshot Support: Efficient state persistence and recovery
- Multiple Read Consistency Levels: STRONG, LEASE, and EVENTUAL
- High Availability: Automatic leader election and follower recovery
✅ Raft Consensus Protocol
- Leader election with randomized timeouts
- Log replication with majority quorum
- Committed entry durability
- NO-OP entries for immediate leadership confirmation
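Leader election timing is the detail most worth illustrating. As a rough sketch (hypothetical names and bounds, not the actual RaftNode internals), a randomized election timeout looks like this:

import java.util.concurrent.ThreadLocalRandom;

// Sketch: each follower waits a random interval before starting an election.
// Randomization makes simultaneous timeouts (and thus split votes) unlikely.
final class ElectionTimer {
    // Hypothetical bounds; suitable values depend on network latency.
    private static final long MIN_TIMEOUT_MS = 150;
    private static final long MAX_TIMEOUT_MS = 300;

    // A fresh timeout is drawn after every received heartbeat.
    static long nextTimeoutMs() {
        return ThreadLocalRandom.current().nextLong(MIN_TIMEOUT_MS, MAX_TIMEOUT_MS + 1);
    }
}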
✅ Read Optimizations
- ReadIndex Protocol: Linearizable reads without log appends
- Read Lease: Time-based leases for 105x faster reads
- Eventual Consistency: Fastest reads for non-critical data
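The reason lease reads are so much faster: a STRONG read must confirm leadership over the network (ReadIndex), while a lease read only checks a local timestamp. A minimal sketch of that check, with hypothetical names rather than the project's actual ReadLease class:

import java.time.Clock;

// Sketch: a leader may answer reads locally while its lease is unexpired.
// The lease duration must stay below the election timeout (minus clock
// drift), so no rival leader can exist while the lease is held.
final class LeaseCheck {
    private final Clock clock;
    private volatile long leaseExpiryMillis; // renewed on each heartbeat round

    LeaseCheck(Clock clock) { this.clock = clock; }

    void renew(long heartbeatSentAtMillis, long leaseDurationMillis) {
        leaseExpiryMillis = heartbeatSentAtMillis + leaseDurationMillis;
    }

    // True if a read can skip the ReadIndex round-trip.
    boolean canServeLocalRead() {
        return clock.millis() < leaseExpiryMillis;
    }
}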
✅ Cache Eviction
- ML-Based: RandomForest predictions on access patterns
- LRU Fallback: Graceful degradation when ML unavailable
- Access Tracking: Detailed statistics for prediction
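The fallback behaviour can be sketched as follows (hypothetical shapes — the real contract lives in EvictionPolicy.java and its implementations):

import java.util.Optional;

// Sketch: prefer ML predictions, degrade to LRU when the ML service is
// unreachable or returns nothing, so writes never fail on eviction.
final class FallbackEviction {
    interface Policy { Optional<String> selectVictim(); }

    private final Policy mlPolicy;  // backed by the Python ML service
    private final Policy lruPolicy; // pure in-process LRU

    FallbackEviction(Policy mlPolicy, Policy lruPolicy) {
        this.mlPolicy = mlPolicy;
        this.lruPolicy = lruPolicy;
    }

    String selectVictim() {
        try {
            return mlPolicy.selectVictim()
                    .orElseGet(() -> lruPolicy.selectVictim().orElseThrow());
        } catch (RuntimeException mlUnavailable) {
            return lruPolicy.selectVictim().orElseThrow();
        }
    }
}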
✅ Persistence & Recovery
- Snapshots: Automatic snapshot creation every 1000 entries
- Log Persistence: Durable storage of Raft logs
- Crash Recovery: Followers catch up after downtime
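The "every 1000 entries" trigger amounts to a simple counter check after each applied entry. A sketch with hypothetical names (the real logic is in SnapshotManager.java):

// Sketch: after applying a committed entry, snapshot the state machine
// once enough entries have accumulated, then truncate the log up to the
// snapshot index so it cannot grow without bound.
final class SnapshotTrigger {
    private static final int SNAPSHOT_INTERVAL = 1000; // entries
    private long lastSnapshotIndex = 0;

    boolean shouldSnapshot(long lastAppliedIndex) {
        return lastAppliedIndex - lastSnapshotIndex >= SNAPSHOT_INTERVAL;
    }

    void onSnapshotTaken(long snapshotIndex) {
        lastSnapshotIndex = snapshotIndex;
    }
}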
✅ Testing
- Follower recovery test (100% pass rate)
- Concurrent writes test (100% pass rate)
- Comprehensive test suite with proper timing
┌─────────────────────────────────────────────────────────┐
│                 Java Raft Cache Cluster                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │    Node 1    │  │    Node 2    │  │    Node 3    │   │
│  │   (Leader)   │  │  (Follower)  │  │  (Follower)  │   │
│  │              │  │              │  │              │   │
│  │  HTTP: 8081  │  │  HTTP: 8082  │  │  HTTP: 8083  │   │
│  │  Raft: 9001  │  │  Raft: 9002  │  │  Raft: 9003  │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         └─────────────────┴─────────────────┘           │
│                  Raft Protocol (Netty)                  │
└────────────────────────┬────────────────────────────────┘
                         │ HTTP REST API
                         ↓
                  ┌──────────────┐
                  │  Python ML   │
                  │   Service    │
                  │  Port: 5001  │
                  └──────────────┘
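The nodes reach the ML service over plain HTTP, as the diagram shows. A minimal sketch of such a call using the JDK's built-in HTTP client — the /predict path and JSON payload shape here are assumptions, not the service's documented API:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: POST per-key access statistics to the ML service and return the
// raw JSON prediction. Endpoint path and payload are hypothetical.
final class MlServiceCall {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    static String predict(String accessStatsJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5001/predict")) // port from the diagram
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(accessStatsJson))
                .build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}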
- Java 17 or later
- Maven 3.6+
- Python 3.8+ (for ML service and node manager)
- Node.js 18+ (for demo UI)
- jq (for test scripts)
# Clone the repository
git clone https://github.com/jahnavikedia/raft-cache.git
cd raft-cache
# Build with Maven
mvn clean package
# Start the ML service (optional - cache works with LRU if ML unavailable)
cd ml-service
pip install -r requirements.txt
python app.py &
cd ..

# Start the node manager (manages cluster lifecycle)
python node-manager.py &
# Start the frontend demo UI
cd frontend
npm install
npm run dev &
cd ..
# Open http://localhost:5173 in your browser

# Start 3-node cluster (cleans data directory for fresh start)
./scripts/start-cluster.sh
# Stop the cluster
./scripts/stop-cluster.sh

# Test follower recovery
./test_follower_recovery.sh
# Test concurrent writes
./test_concurrent_writes.sh
# Test read lease optimization
./scripts/test_read_lease.sh

# Write a key-value pair
curl -X POST http://localhost:8081/cache/mykey \
-H 'Content-Type: application/json' \
-d '{"value":"myvalue","clientId":"client1","sequenceNumber":1}'

# Strong consistency (linearizable via ReadIndex)
curl "http://localhost:8081/cache/mykey?consistency=strong"
# Lease-based read (105x faster when lease valid)
curl "http://localhost:8081/cache/mykey?consistency=lease"
# Eventual consistency (fastest, may be stale)
curl "http://localhost:8081/cache/mykey?consistency=eventual"

# Delete a key
curl -X DELETE http://localhost:8081/cache/mykey \
-H 'Content-Type: application/json' \
-d '{"clientId":"client1","sequenceNumber":2}'

# Check node status
curl http://localhost:8081/status
# Returns: {"nodeId":"node1","role":"LEADER","currentTerm":5,...}

# Get cache statistics
curl http://localhost:8081/stats
# Returns cache size, eviction count, ML availability, etc.

The project includes a React-based demo UI for visualizing the Raft cluster:
- Cluster Status: Real-time view of all nodes (Leader/Follower status, Term, Log size)
- Raft Election: Visualize leader election by killing/starting nodes
- Log Replication: Watch log entries replicate across the cluster
- Cache Operations: Write, read, and delete cache entries
- Performance: Compare read latency across consistency levels (Strong vs Lease vs Eventual)
- ML Eviction: Demo ML-based cache eviction vs traditional LRU
raft-cache/
├── frontend/                          # React demo UI
│   ├── src/
│   │   ├── App.jsx                    # Main application
│   │   └── components/
│   │       ├── ClusterStatus.jsx      # Node status cards
│   │       ├── ElectionDemo.jsx       # Election visualization
│   │       ├── ReplicationDemo.jsx    # Log replication demo
│   │       ├── CacheView.jsx          # Cache operations
│   │       ├── LatencyComparison.jsx  # Performance comparison
│   │       └── EvictionDemo.jsx       # ML eviction demo
│   └── package.json
├── node-manager.py                    # Python cluster manager (Flask API)
├── src/main/java/com/distributed/cache/
│   ├── Main.java                      # Entry point
│   ├── raft/
│   │   ├── RaftNode.java              # Core Raft logic
│   │   ├── RaftState.java             # Leader/Follower/Candidate states
│   │   ├── LogEntry.java              # Log entry structure
│   │   ├── ReadIndexManager.java      # ReadIndex protocol
│   │   ├── ReadLease.java             # Lease-based reads
│   │   ├── Snapshot.java              # Snapshot data structure
│   │   ├── SnapshotManager.java       # Snapshot creation/restoration
│   │   └── api/
│   │       ├── CacheRESTServer.java   # HTTP API
│   │       └── LeaderProxy.java       # Request forwarding
│   ├── replication/
│   │   ├── LeaderReplicator.java      # Leader-side replication
│   │   ├── FollowerReplicator.java    # Follower-side log handling
│   │   └── RaftLog.java               # Log management
│   ├── store/
│   │   ├── KeyValueStore.java         # State machine
│   │   ├── AccessTracker.java         # Access pattern tracking
│   │   ├── AccessStats.java           # Per-key statistics
│   │   └── ReadConsistency.java       # Consistency level enum
│   ├── eviction/
│   │   ├── EvictionPolicy.java        # Eviction interface
│   │   ├── MLEvictionPolicy.java      # ML-based eviction
│   │   ├── LRUEvictionPolicy.java     # LRU fallback
│   │   ├── MLClient.java              # ML service client
│   │   └── MLPrediction.java          # Prediction result
│   ├── network/
│   │   └── NetworkBase.java           # Netty-based communication
│   └── storage/
│       └── LogPersistence.java        # Disk persistence
├── ml-service/
│   ├── app.py                         # Flask ML service
│   ├── requirements.txt               # Python dependencies
│   └── train_model.py                 # Model training (optional)
├── config/
│   ├── node1.yaml                     # Node 1 configuration
│   ├── node2.yaml                     # Node 2 configuration
│   └── node3.yaml                     # Node 3 configuration
├── scripts/
│   ├── start-cluster.sh               # Start cluster helper
│   ├── stop-cluster.sh                # Stop cluster helper
│   └── test_read_lease.sh             # Read lease benchmark
├── test_follower_recovery.sh          # Follower recovery test
├── test_concurrent_writes.sh          # Concurrent write test
└── pom.xml                            # Maven configuration
Core Raft (Days 1-6)
- Leader election with randomized timeouts
- Log replication with majority quorum
- State machine (key-value store)
- Log persistence and recovery
- Network communication (Netty)
- REST API for client operations
Advanced Features (Day 1-2)
- ReadIndex protocol for linearizable reads
- Read Lease optimization (105x speedup)
- Snapshot creation and restoration
- Access pattern tracking
- ML-based cache eviction
- Multiple consistency levels (STRONG/LEASE/EVENTUAL)
Testing & Quality (Day 3)
- Follower recovery test
- Concurrent writes test
- Bug fix: Snapshot/log index mismatch
- Comprehensive documentation
Read Latency (from test_read_lease.sh):
- STRONG: ~50-100ms (ReadIndex protocol with heartbeat)
- LEASE: ~0.5-1ms (lease-based, 105x faster)
- EVENTUAL: ~0.3-0.5ms (local read, fastest)
Write Throughput:
- 120 concurrent writes in ~8 seconds
- 15 writes/second with 4 concurrent clients
- 100% success rate, 0% data loss
Recovery Time:
- Leader election: ~5 seconds
- Follower catch-up: ~10 seconds for 50 entries
- Snapshot restoration: <1 second
Each node has a YAML configuration file:
# config/node1.yaml
nodeId: node1
raftPort: 9001
httpPort: 8081
dataDir: data/node1
peers:
  - nodeId: node2
    host: localhost
    port: 9002
  - nodeId: node3
    host: localhost
    port: 9003
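If loaded with a library such as SnakeYAML (an assumption — the project's actual loader isn't shown here), the file maps directly onto a pair of bean classes:

import java.util.List;

// Sketch: plain beans mirroring the YAML layout above.
public class NodeConfig {
    public String nodeId;
    public int raftPort;
    public int httpPort;
    public String dataDir;
    public List<Peer> peers;

    public static class Peer {
        public String nodeId;
        public String host;
        public int port;
    }
}

// Hypothetical usage:
//   NodeConfig cfg = new org.yaml.snakeyaml.Yaml()
//       .loadAs(new java.io.FileInputStream("config/node1.yaml"), NodeConfig.class);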
- Follower Recovery Test (test_follower_recovery.sh)
  - Writes 30 keys, waits for replication
  - Kills a follower, writes 20 more keys
  - Restarts the follower and verifies catch-up
  - Pass Criteria: ≥8/10 old keys, ≥7/10 new keys
- Concurrent Writes Test (test_concurrent_writes.sh)
  - Launches 4 concurrent clients
  - Each writes 30 unique keys (120 total)
  - Verifies data consistency and deduplication (see the session-tracking sketch after this list)
  - Pass Criteria: ≥90% success, ≥90% consistency
- Read Lease Benchmark (scripts/test_read_lease.sh)
  - Measures read latency across consistency levels
  - Validates the 105x speedup of lease-based reads
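Deduplication leans on the clientId/sequenceNumber pair that every write carries (see the curl example above). A minimal sketch of that session-tracking idea, with hypothetical names rather than the actual KeyValueStore code:

import java.util.HashMap;
import java.util.Map;

// Sketch: the state machine records the highest sequence number applied per
// client, so a retried (duplicate) write is applied at most once. No locking
// is needed when commands are applied by a single apply thread.
final class DedupFilter {
    private final Map<String, Long> lastApplied = new HashMap<>();

    // True if the command is new for this client and should be applied.
    boolean shouldApply(String clientId, long sequenceNumber) {
        Long prev = lastApplied.get(clientId);
        if (prev != null && sequenceNumber <= prev) {
            return false; // duplicate or stale retry
        }
        lastApplied.put(clientId, sequenceNumber);
        return true;
    }
}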
# Run all tests
./test_follower_recovery.sh
./test_concurrent_writes.sh
./scripts/test_read_lease.sh
# Expected output: All tests PASSED

- Snapshot/Log Reconciliation: Test script cleans the data directory on startup (not production-safe)
- No Log Rotation: Logs grow unbounded
- No Graceful Shutdown: Uses kill -9 in tests
- Hard-coded Timeouts: Should be configurable
- No Dynamic Membership: Cluster topology is static
- Proper snapshot/log reconciliation for restarts
- Log rotation and compaction
- Graceful shutdown mechanism
- Configuration-based timeouts
- Dynamic cluster membership (add/remove nodes)
- Monitoring and metrics export (Prometheus)
- Leader transfer for rolling upgrades
- DAY2_COMPLETION_SUMMARY.md: ReadIndex, Read Lease, ML Eviction, Snapshots
- DAY3_TEST_FINDINGS.md: Detailed test analysis and bug findings
- DAY3_SUMMARY.md: Executive summary of Day 3 work
- Java 17: Core language
- Maven: Build and dependency management
- Netty: Asynchronous network communication
- Javalin: Lightweight HTTP framework
- Gson: JSON serialization
- SLF4J + Logback: Logging
- React 18: UI framework
- Vite: Build tool
- Axios: HTTP client
- Lucide React: Icons
- Tailwind CSS: Styling
- Flask: REST API framework (ML service + node manager)
- scikit-learn: RandomForest for eviction prediction
- NumPy: Numerical computations
- Bash: Test scripts
- jq: JSON parsing in tests
- curl: HTTP client
This is an academic project demonstrating distributed systems concepts. For questions or suggestions, please open an issue.
- "In Search of an Understandable Consensus Algorithm" (Raft Paper) - Diego Ongaro & John Ousterhout
- "Rethink the Linearizability Constraints of Raft" - Performance optimizations for reads
- Raft Visualization: thesecretlivesofdata.com/raft