A distributed, strongly consistent, and crash-resilient Key-Value store based on the Raft Consensus Algorithm (Ongaro & Ousterhout). Designed for high availability and linearizable semantics in distributed environments.
Distributed systems require a "source of truth" that remains consistent even under partial network failures or node crashes. RaftKV exists to solve the coordination problem by providing a replicated state machine. It is intended for metadata management, configuration storage, or as a foundational component for more complex distributed databases.
The system follows a modular, layered architecture to decouple consensus logic from storage and networking.
- Raft Core (`src/raft/`): Implements State Machine Replication (SMR). Handles leader election, log replication, and safety invariants.
- Networking (`src/network/`): High-performance gRPC layer using `tonic`. Handles inter-node RPCs (`RequestVote`, `AppendEntries`) and the client API (`Execute`).
- Storage Layer (`src/storage/`) (a minimal sketch of the storage abstraction follows this list):
  - Log Store: Persistent Write-Ahead Log (WAL) backed by `RocksDB` for efficient sequential writes and crash recovery.
  - Snapshot Store: File-based state checkpointing to prevent unbounded log growth.
- State Machine (`src/statemachine/`): In-memory KV map (`HashMap` behind an `RwLock`) to which committed commands are applied.
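To illustrate how the layering keeps consensus logic independent of the storage engine, here is a minimal sketch of a log-store abstraction with an in-memory implementation. The trait name, method signatures, and `LogEntry` shape are assumptions for illustration, not the repository's actual API.

```rust
/// Illustrative log entry; the real type likely carries more metadata.
#[derive(Clone, Debug)]
pub struct LogEntry {
    pub term: u64,
    pub index: u64,
    pub command: Vec<u8>, // serialized PUT/DELETE command
}

/// Storage abstraction the Raft core writes through. A RocksDB-backed
/// implementation would persist entries before acknowledging RPCs.
pub trait LogStore: Send + Sync {
    fn append(&mut self, entry: LogEntry);
    fn get(&self, index: u64) -> Option<LogEntry>;
    fn last_index(&self) -> u64;
    fn truncate_from(&mut self, index: u64);
}

/// In-memory implementation, useful for unit tests and simulations.
#[derive(Default)]
pub struct MemLogStore {
    entries: Vec<LogEntry>,
}

impl LogStore for MemLogStore {
    fn append(&mut self, entry: LogEntry) {
        self.entries.push(entry);
    }
    fn get(&self, index: u64) -> Option<LogEntry> {
        self.entries.iter().find(|e| e.index == index).cloned()
    }
    fn last_index(&self) -> u64 {
        self.entries.last().map(|e| e.index).unwrap_or(0)
    }
    fn truncate_from(&mut self, index: u64) {
        // Drop conflicting suffixes when a leader overwrites follower entries.
        self.entries.retain(|e| e.index < index);
    }
}

fn main() {
    let mut log = MemLogStore::default();
    log.append(LogEntry { term: 1, index: 1, command: b"PUT x 1".to_vec() });
    assert_eq!(log.last_index(), 1);
}
```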
- Mutation: Client sends `PUT`/`DELETE` to `KVService`.
- Replication: The leader appends the command to its local log and broadcasts `AppendEntries` to followers.
- Consensus: Once a majority (quorum) acknowledges, the leader advances its `commit_index` (see the commit-rule sketch after this list).
- Execution: Committed entries are applied to the `KVStore` state machine.
- Response: The result is returned to the client.
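The quorum step boils down to a simple rule: the leader may advance `commit_index` to the highest log index stored on a majority of nodes. A hedged, self-contained sketch of that calculation (function name and inputs are illustrative, not the project's code):

```rust
/// Given the leader's own last log index and each follower's acknowledged
/// match index, return the highest index replicated on a strict majority.
/// Note: a full Raft implementation additionally requires the entry at that
/// index to be from the leader's current term before committing it.
fn quorum_commit_index(leader_last_index: u64, follower_match_index: &[u64]) -> u64 {
    // Pool the leader's index with the followers' acknowledgements.
    let mut indices: Vec<u64> = follower_match_index.to_vec();
    indices.push(leader_last_index);
    indices.sort_unstable_by(|a, b| b.cmp(a)); // descending
    // The (N/2)-th entry (0-based) is stored on a majority of the N nodes.
    indices[indices.len() / 2]
}

fn main() {
    // 3-node cluster: leader at index 7, followers acknowledged 7 and 5.
    assert_eq!(quorum_commit_index(7, &[7, 5]), 7);
    // 5-node cluster: two followers have caught up to 9, two lag behind.
    assert_eq!(quorum_commit_index(9, &[9, 9, 4, 3]), 9);
    println!("commit rule checks passed");
}
```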
- Rust: Memory safety without GC, critical for predictable latency in consensus.
- Tokio: Asynchronous runtime for high-concurrency I/O.
- Tonic/Prost: gRPC implementation for efficient, type-safe inter-node communication.
- RocksDB: Industrial-grade persistent key-value store used for the Raft WAL.
- Bincode/Serde: Compact, low-overhead binary serialization for log entries and snapshots (round-trip example below).
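As a hedged example of how a log entry might be encoded with Serde + Bincode (1.x API) before being written to the WAL; the `LogEntry` fields here are illustrative, not the repository's exact type:

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct LogEntry {
    term: u64,
    index: u64,
    command: Vec<u8>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let entry = LogEntry { term: 2, index: 17, command: b"PUT k v".to_vec() };

    // Compact binary encoding suitable for storing as a RocksDB value.
    let bytes: Vec<u8> = bincode::serialize(&entry)?;

    // Decode on crash recovery; the round trip must be lossless.
    let decoded: LogEntry = bincode::deserialize(&bytes)?;
    assert_eq!(entry, decoded);
    Ok(())
}
```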
- Rust 1.70+
- Protobuf Compiler (`protoc`)
- LLVM/Clang (required by `rust-rocksdb`)
To simulate a 3-node cluster locally:

    chmod +x scripts/simulate_cluster.sh
    ./scripts/simulate_cluster.sh

The script initializes 3 nodes, triggers an election, simulates a leader crash, and verifies recovery.
- Distributed Configuration: Consistent settings across a fleet of microservices.
- Service Discovery: Reliable registry for dynamic service endpoints.
- Distributed Locking: Foundation for a distributed lock manager (DLM).
- Strong Persistence Foundation: Direct use of RocksDB for the WAL ensures that `currentTerm` and `votedFor` are flushed before RPC responses, a common pitfall in naive implementations (see the hard-state sketch after this list).
- Async-First Design: Leveraging `tokio::select!` for the main event loop cleanly handles heartbeats, election timeouts, and RPC messages without complex threading (event-loop sketch below).
- Clean Abstractions: The `LogStore` trait allows swapping storage engines (e.g., in-memory for testing, RocksDB for production).
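A minimal sketch of durably persisting the Raft hard state with a synchronous RocksDB write before replying to an RPC. The key names and function are assumptions for illustration; the project's actual schema may differ.

```rust
use rocksdb::{WriteBatch, WriteOptions, DB};

/// Persist Raft hard state (`currentTerm`, `votedFor`) with a synced write so
/// it survives a crash that happens immediately after we reply to the RPC.
fn persist_hard_state(db: &DB, current_term: u64, voted_for: Option<u64>) -> Result<(), rocksdb::Error> {
    let mut batch = WriteBatch::default();
    batch.put(b"raft/current_term", current_term.to_be_bytes());
    batch.put(
        b"raft/voted_for",
        voted_for.map(|id| id.to_be_bytes().to_vec()).unwrap_or_default(),
    );

    let mut opts = WriteOptions::default();
    opts.set_sync(true); // force the write to disk before returning
    db.write_opt(batch, &opts)
}

fn main() -> Result<(), rocksdb::Error> {
    let db = DB::open_default("/tmp/raftkv-hardstate-demo")?;
    // Grant a vote to node 2 in term 3, durably, before answering RequestVote.
    persist_hard_state(&db, 3, Some(2))
}
```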
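And a simplified sketch of the `tokio::select!`-driven event loop; the message enum, timeouts, and channel wiring are assumed names for illustration, not the repository's actual loop.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{interval, sleep_until, Instant};

#[allow(dead_code)]
#[derive(Debug)]
enum RaftMsg {
    RequestVote { term: u64 },
    AppendEntries { term: u64 },
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<RaftMsg>(64);
    let mut heartbeat = interval(Duration::from_millis(50));
    let election_timeout = Duration::from_millis(300);
    // The election deadline is pushed forward whenever we hear from a leader.
    let mut deadline = Instant::now() + election_timeout;

    // Feed one fake RPC so this self-contained example terminates quickly.
    tx.send(RaftMsg::AppendEntries { term: 1 }).await.unwrap();
    drop(tx);

    loop {
        tokio::select! {
            _ = heartbeat.tick() => {
                // Leader only: broadcast AppendEntries heartbeats to followers here.
            }
            _ = sleep_until(deadline) => {
                // Follower/candidate: no leader heard from in time; start an election.
                println!("election timeout fired");
                break;
            }
            msg = rx.recv() => match msg {
                Some(m) => {
                    println!("handling RPC: {m:?}");
                    deadline = Instant::now() + election_timeout; // reset the timer
                }
                None => break, // all senders dropped; shut the loop down
            },
        }
    }
}
```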
- Read Linearizability: The current `GET` implementation is a placeholder. Production use requires ReadIndex or lease-based reads to avoid returning stale data from followers.
- Membership Changes: The cluster configuration is static; adding or removing nodes requires a full restart (single-server membership change is not implemented).
- Message Processing: The network layer is currently a skeleton; the actual gRPC client calls to peers within the Raft loop still need to be completed to move beyond simulation.
- Linearizable Reads: Implement ReadIndex (Section 6.4 of the Raft dissertation) so clients always see the latest committed state (see the sketch after this list).
- Dynamic Membership: Implement `AddNode`/`RemoveNode` via joint consensus.
- Pipelining & Batching: Batch log entries in `AppendEntries` and pipeline RPCs to saturate network bandwidth.
- gRPC Health Probing: Integrated health checks for faster failure detection.
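For reference, a compact, synchronous sketch of the ReadIndex steps a leader would follow; the struct, stubbed leadership check, and apply logic are assumptions for illustration only, not a working implementation.

```rust
use std::collections::HashMap;

struct Node {
    commit_index: u64,
    last_applied: u64,
    store: HashMap<String, String>,
}

impl Node {
    /// Confirm leadership by exchanging one heartbeat round with a quorum.
    /// Stubbed here; a real implementation awaits AppendEntries responses.
    fn confirm_leadership(&self) -> bool {
        true // assumption: this node is still the leader
    }

    /// Apply committed-but-unapplied entries; stubbed for the sketch.
    fn apply_up_to(&mut self, index: u64) {
        self.last_applied = self.last_applied.max(index);
    }

    /// Linearizable read following the ReadIndex recipe.
    fn read_index_get(&mut self, key: &str) -> Option<String> {
        // 1. Record the current commit index as the read barrier.
        let read_index = self.commit_index;
        // 2. Confirm we are still the leader before serving the read.
        if !self.confirm_leadership() {
            return None; // in practice, redirect the client to the real leader
        }
        // 3. Wait until the state machine has applied at least up to read_index.
        if self.last_applied < read_index {
            self.apply_up_to(read_index);
        }
        // 4. Serve the read from local state; the result is now linearizable.
        self.store.get(key).cloned()
    }
}

fn main() {
    let mut node = Node { commit_index: 3, last_applied: 2, store: HashMap::new() };
    node.store.insert("config/replicas".into(), "3".into());
    println!("{:?}", node.read_index_get("config/replicas"));
}
```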