Distributed Raft Cache

A production-ready distributed cache implementation using the Raft consensus algorithm with ML-based eviction, read lease optimization, and comprehensive testing.

Project Overview

This project implements a distributed key-value cache with:

  • Raft Consensus: Ensures strong consistency across multiple nodes
  • Read Lease Optimization: 105x speedup for read operations
  • ML-Based Eviction: Intelligent cache eviction using access pattern prediction
  • Snapshot Support: Efficient state persistence and recovery
  • Multiple Read Consistency Levels: STRONG, LEASE, and EVENTUAL
  • High Availability: Automatic leader election and follower recovery

Features

Core Capabilities

Raft Consensus Protocol

  • Leader election with randomized timeouts (see the sketch after this list)
  • Log replication with majority quorum
  • Committed entry durability
  • NO-OP entries for immediate leadership confirmation
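
Randomized timeouts are what keep elections from deadlocking: if every follower waited the same fixed interval, they would all become candidates at once and split the vote. A minimal sketch of the idea in Java (the class and field names are illustrative, not the actual internals of RaftNode.java):

import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch: each follower picks a fresh random timeout in
// [base, 2*base) so two nodes rarely time out at the same moment.
final class ElectionTimer {
    private final long baseTimeoutMs;
    private volatile long deadlineNanos;

    ElectionTimer(long baseTimeoutMs) {
        this.baseTimeoutMs = baseTimeoutMs;
        reset();
    }

    // Called whenever a valid AppendEntries heartbeat arrives from the leader.
    void reset() {
        long jitterMs = ThreadLocalRandom.current().nextLong(baseTimeoutMs);
        deadlineNanos = System.nanoTime() + (baseTimeoutMs + jitterMs) * 1_000_000L;
    }

    // Polled by the follower loop; true means "become candidate and request votes".
    boolean expired() {
        return System.nanoTime() >= deadlineNanos;
    }
}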

Read Optimizations

  • ReadIndex Protocol: Linearizable reads without log appends
  • Read Lease: Time-based leases for 105x faster reads (see the sketch after this list)
  • Eventual Consistency: Fastest reads for non-critical data
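
The lease removes the network round-trip from the common read path: while the leader holds a time-bounded lease (renewed by heartbeat majorities), it can answer reads from local state; once the lease expires, reads fall back to the full ReadIndex protocol. A minimal sketch with hypothetical names, not the actual API of ReadLease.java:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: serve reads locally while the lease is valid,
// otherwise fall back to the ReadIndex heartbeat round-trip.
final class LeaseRead {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private volatile long leaseExpiresAtNanos; // renewed on each heartbeat quorum

    String readWithLease(String key) {
        if (System.nanoTime() < leaseExpiresAtNanos) {
            return store.get(key); // local read: no network hop
        }
        return readLinearizable(key); // lease expired: full ReadIndex path
    }

    private String readLinearizable(String key) {
        // ReadIndex protocol: confirm leadership with a heartbeat majority,
        // wait for the state machine to reach that commit index, then read
        // locally (omitted here; see ReadIndexManager.java).
        return store.get(key);
    }
}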

Cache Eviction

  • ML-Based: RandomForest predictions on access patterns
  • LRU Fallback: Graceful degradation when ML unavailable (see the sketch after this list)
  • Access Tracking: Detailed statistics for prediction
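
The fallback keeps eviction non-blocking: the policy asks the ML service for a victim, and any failure or timeout degrades to plain LRU instead of stalling the cache. A hedged sketch of the pattern (the interface and names are illustrative, not the signatures in MLEvictionPolicy.java):

// Illustrative sketch: try the ML service first, fall back to LRU when
// the service is down or slow. Names are hypothetical.
final class FallbackEviction {
    interface Policy { String chooseVictim(); }

    private final Policy ml;  // asks the Python service for the least-valuable key
    private final Policy lru; // pure in-process least-recently-used

    FallbackEviction(Policy ml, Policy lru) {
        this.ml = ml;
        this.lru = lru;
    }

    String evict() {
        try {
            String victim = ml.chooseVictim(); // HTTP call to the ML service
            if (victim != null) return victim;
        } catch (RuntimeException e) {
            // ML service unavailable or timed out: degrade gracefully.
        }
        return lru.chooseVictim(); // LRU fallback
    }
}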

Persistence & Recovery

  • Snapshots: Automatic snapshot creation every 1000 entries (see the sketch after this list)
  • Log Persistence: Durable storage of Raft logs
  • Crash Recovery: Followers catch up after downtime
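
Snapshotting every 1000 entries bounds recovery time: a restarting node loads the latest snapshot and replays only the log tail after it, rather than the whole history. A minimal sketch of the trigger (names and file layout are hypothetical, not those of SnapshotManager.java):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: after every SNAPSHOT_INTERVAL applied entries,
// persist the state machine and the last applied index; log entries up
// to that index then become redundant and can be compacted.
final class SnapshotTrigger {
    static final int SNAPSHOT_INTERVAL = 1000;
    private final Path dir;

    SnapshotTrigger(Path dir) { this.dir = dir; }

    void onEntryApplied(long appliedIndex, byte[] stateMachineBytes) throws IOException {
        if (appliedIndex % SNAPSHOT_INTERVAL == 0) {
            // Durable write first; only then is the log prefix safe to truncate.
            Files.write(dir.resolve("snapshot-" + appliedIndex + ".bin"), stateMachineBytes);
        }
    }
}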

Testing

  • Follower recovery test (100% pass rate)
  • Concurrent writes test (100% pass rate)
  • Comprehensive test suite with proper timing

Architecture

┌─────────────────────────────────────────────────────────┐
│              Java Raft Cache Cluster                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │   Node 1     │  │   Node 2     │  │   Node 3     │  │
│  │  (Leader)    │  │  (Follower)  │  │  (Follower)  │  │
│  │              │  │              │  │              │  │
│  │ HTTP: 8081   │  │ HTTP: 8082   │  │ HTTP: 8083   │  │
│  │ Raft: 9001   │  │ Raft: 9002   │  │ Raft: 9003   │  │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
│         │                 │                 │          │
│         └─────────────────┴─────────────────┘          │
│              Raft Protocol (Netty)                     │
└────────────────────────┬───────────────────────────────┘
                         │ HTTP REST API
                         ↓
                  ┌──────────────┐
                  │  Python ML   │
                  │  Service     │
                  │  Port: 5001  │
                  └──────────────┘
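
Clients may send requests to any node's HTTP port; a node that is not the leader forwards them to the current leader (see LeaderProxy.java in the project structure below).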

Quick Start

Prerequisites

  • Java 17 or later
  • Maven 3.6+
  • Python 3.8+ (for ML service and node manager)
  • Node.js 18+ (for demo UI)
  • jq (for test scripts)

Build the Project

# Clone the repository
git clone https://github.com/jahnavikedia/raft-cache.git
cd raft-cache

# Build with Maven
mvn clean package

# Start the ML service (optional - cache works with LRU if ML unavailable)
cd ml-service
pip install -r requirements.txt
python app.py &
cd ..

Start the Cluster with Demo UI

# Start the node manager (manages cluster lifecycle)
python node-manager.py &

# Start the frontend demo UI
cd frontend
npm install
npm run dev &
cd ..

# Open http://localhost:5173 in your browser

Alternative: Command Line Only

# Start 3-node cluster (cleans data directory for fresh start)
./scripts/start-cluster.sh

# Stop the cluster
./scripts/stop-cluster.sh

Run Tests

# Test follower recovery
./test_follower_recovery.sh

# Test concurrent writes
./test_concurrent_writes.sh

# Test read lease optimization
./scripts/test_read_lease.sh

API Usage

Write Data (PUT)

curl -X POST http://localhost:8081/cache/mykey \
  -H 'Content-Type: application/json' \
  -d '{"value":"myvalue","clientId":"client1","sequenceNumber":1}'

Read Data (GET)

# Strong consistency (linearizable via ReadIndex)
curl "http://localhost:8081/cache/mykey?consistency=strong"

# Lease-based read (105x faster when lease valid)
curl "http://localhost:8081/cache/mykey?consistency=lease"

# Eventual consistency (fastest, may be stale)
curl "http://localhost:8081/cache/mykey?consistency=eventual"

Delete Data

curl -X DELETE http://localhost:8081/cache/mykey \
  -H 'Content-Type: application/json' \
  -d '{"clientId":"client1","sequenceNumber":2}'

Check Node Status

curl http://localhost:8081/status
# Returns: {"nodeId":"node1","role":"LEADER","currentTerm":5,...}

Get Cache Statistics

curl http://localhost:8081/stats
# Returns cache size, eviction count, ML availability, etc.

Demo UI

The project includes a React-based demo UI for visualizing the Raft cluster:

  • Cluster Status: Real-time view of all nodes (Leader/Follower status, Term, Log size)
  • Raft Election: Visualize leader election by killing/starting nodes
  • Log Replication: Watch log entries replicate across the cluster
  • Cache Operations: Write, read, and delete cache entries
  • Performance: Compare read latency across consistency levels (Strong vs Lease vs Eventual)
  • ML Eviction: Demo ML-based cache eviction vs traditional LRU

Project Structure

raft-cache/
├── frontend/                              # React demo UI
│   ├── src/
│   │   ├── App.jsx                        # Main application
│   │   └── components/
│   │       ├── ClusterStatus.jsx          # Node status cards
│   │       ├── ElectionDemo.jsx           # Election visualization
│   │       ├── ReplicationDemo.jsx        # Log replication demo
│   │       ├── CacheView.jsx              # Cache operations
│   │       ├── LatencyComparison.jsx      # Performance comparison
│   │       └── EvictionDemo.jsx           # ML eviction demo
│   └── package.json
├── node-manager.py                        # Python cluster manager (Flask API)
├── src/main/java/com/distributed/cache/
│   ├── Main.java                          # Entry point
│   ├── raft/
│   │   ├── RaftNode.java                  # Core Raft logic
│   │   ├── RaftState.java                 # Leader/Follower/Candidate states
│   │   ├── LogEntry.java                  # Log entry structure
│   │   ├── ReadIndexManager.java          # ReadIndex protocol
│   │   ├── ReadLease.java                 # Lease-based reads
│   │   ├── Snapshot.java                  # Snapshot data structure
│   │   ├── SnapshotManager.java           # Snapshot creation/restoration
│   │   └── api/
│   │       ├── CacheRESTServer.java       # HTTP API
│   │       └── LeaderProxy.java           # Request forwarding
│   ├── replication/
│   │   ├── LeaderReplicator.java          # Leader-side replication
│   │   ├── FollowerReplicator.java        # Follower-side log handling
│   │   └── RaftLog.java                   # Log management
│   ├── store/
│   │   ├── KeyValueStore.java             # State machine
│   │   ├── AccessTracker.java             # Access pattern tracking
│   │   ├── AccessStats.java               # Per-key statistics
│   │   └── ReadConsistency.java           # Consistency level enum
│   ├── eviction/
│   │   ├── EvictionPolicy.java            # Eviction interface
│   │   ├── MLEvictionPolicy.java          # ML-based eviction
│   │   ├── LRUEvictionPolicy.java         # LRU fallback
│   │   ├── MLClient.java                  # ML service client
│   │   └── MLPrediction.java              # Prediction result
│   ├── network/
│   │   └── NetworkBase.java               # Netty-based communication
│   └── storage/
│       └── LogPersistence.java            # Disk persistence
├── ml-service/
│   ├── app.py                             # Flask ML service
│   ├── requirements.txt                   # Python dependencies
│   └── train_model.py                     # Model training (optional)
├── config/
│   ├── node1.yaml                         # Node 1 configuration
│   ├── node2.yaml                         # Node 2 configuration
│   └── node3.yaml                         # Node 3 configuration
├── scripts/
│   ├── start-cluster.sh                   # Start cluster helper
│   ├── stop-cluster.sh                    # Stop cluster helper
│   └── test_read_lease.sh                 # Read lease benchmark
├── test_follower_recovery.sh              # Follower recovery test
├── test_concurrent_writes.sh              # Concurrent write test
└── pom.xml                                # Maven configuration

Implementation Progress

✅ Completed Features

Core Raft (Days 1-6)

  • Leader election with randomized timeouts
  • Log replication with majority quorum
  • State machine (key-value store)
  • Log persistence and recovery
  • Network communication (Netty)
  • REST API for client operations

Advanced Features (Days 1-2)

  • ReadIndex protocol for linearizable reads
  • Read Lease optimization (105x speedup)
  • Snapshot creation and restoration
  • Access pattern tracking
  • ML-based cache eviction
  • Multiple consistency levels (STRONG/LEASE/EVENTUAL)

Testing & Quality (Day 3)

  • Follower recovery test
  • Concurrent writes test
  • Bug fix: Snapshot/log index mismatch
  • Comprehensive documentation

Performance Benchmarks

Read Latency (from test_read_lease.sh):

  • STRONG: ~50-100ms (ReadIndex protocol with heartbeat)
  • LEASE: ~0.5-1ms (lease-based, 105x faster)
  • EVENTUAL: ~0.3-0.5ms (local read, fastest)
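
The 105x figure is simply this ratio: a STRONG read in the 50-100ms range divided by a LEASE read of well under 1ms works out to roughly two orders of magnitude.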

Write Throughput:

  • 120 concurrent writes in ~8 seconds
  • 15 writes/second with 4 concurrent clients
  • 100% success rate, 0% data loss

Recovery Time:

  • Leader election: ~5 seconds
  • Follower catch-up: ~10 seconds for 50 entries
  • Snapshot restoration: <1 second

Configuration

Each node has a YAML configuration file:

# config/node1.yaml
nodeId: node1
raftPort: 9001
httpPort: 8081
dataDir: data/node1
peers:
  - nodeId: node2
    host: localhost
    port: 9002
  - nodeId: node3
    host: localhost
    port: 9003
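
A node could load this file with any YAML binder; below is a hedged sketch using SnakeYAML (the project's actual loader may differ), mapping the keys onto a plain config object:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import org.yaml.snakeyaml.Yaml;

// Illustrative sketch, not the project's actual loader: public fields
// mirror the YAML keys so SnakeYAML can bind them directly.
public class NodeConfig {
    public String nodeId;
    public int raftPort;
    public int httpPort;
    public String dataDir;
    public List<Peer> peers;

    public static class Peer {
        public String nodeId;
        public String host;
        public int port;
    }

    public static NodeConfig load(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return new Yaml().loadAs(in, NodeConfig.class);
        }
    }
}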

Testing

Test Suite

  1. Follower Recovery Test (test_follower_recovery.sh)

    • Writes 30 keys, waits for replication
    • Kills follower, writes 20 more keys
    • Restarts follower and verifies catch-up
    • Pass Criteria: ≥8/10 old keys, ≥7/10 new keys
  2. Concurrent Writes Test (test_concurrent_writes.sh)

    • Launches 4 concurrent clients
    • Each writes 30 unique keys (120 total)
    • Verifies data consistency and deduplication
    • Pass Criteria: ≥90% success, ≥90% consistency
  3. Read Lease Benchmark (scripts/test_read_lease.sh)

    • Measures read latency across consistency levels
    • Validates 105x speedup with lease-based reads

Running Tests

# Run all tests
./test_follower_recovery.sh
./test_concurrent_writes.sh
./scripts/test_read_lease.sh

# Expected output: All tests PASSED

Known Issues & Limitations

Current Limitations

  1. Snapshot/Log Reconciliation: Restarts currently rely on the start script wiping the data directory (not production-safe)
  2. No Log Rotation: Logs grow unbounded
  3. No Graceful Shutdown: Uses kill -9 in tests
  4. Hard-coded Timeouts: Should be configurable
  5. No Dynamic Membership: Cluster topology is static

Future Improvements

  • Proper snapshot/log reconciliation for restarts
  • Log rotation and compaction
  • Graceful shutdown mechanism
  • Configuration-based timeouts
  • Dynamic cluster membership (add/remove nodes)
  • Monitoring and metrics export (Prometheus)
  • Leader transfer for rolling upgrades

Technology Stack

Java Backend

  • Java 17: Core language
  • Maven: Build and dependency management
  • Netty: Asynchronous network communication
  • Javalin: Lightweight HTTP framework
  • Gson: JSON serialization
  • SLF4J + Logback: Logging

React Frontend

  • React 18: UI framework
  • Vite: Build tool
  • Axios: HTTP client
  • Lucide React: Icons
  • Tailwind CSS: Styling

Python Services

  • Flask: REST API framework (ML service + node manager)
  • scikit-learn: RandomForest for eviction prediction
  • NumPy: Numerical computations

Testing

  • Bash: Test scripts
  • jq: JSON parsing in tests
  • curl: HTTP client

Contributing

This is an academic project demonstrating distributed systems concepts. For questions or suggestions, please open an issue.

References

  • "In Search of an Understandable Consensus Algorithm" (Raft Paper) - Diego Ongaro & John Ousterhout
  • "Rethink the Linearizability Constraints of Raft" - Performance optimizations for reads
  • Raft Visualization: thesecretlivesofdata.com/raft
