RaftKV: Distributed Consistent Key-Value Store

A distributed, strongly consistent, crash-resilient key-value store built on the Raft consensus algorithm (Ongaro & Ousterhout), designed for high availability and linearizable semantics.

Context & Motivation

Distributed systems require a "source of truth" that remains consistent even under partial network failures or node crashes. RaftKV addresses this coordination problem by providing a replicated state machine. It is intended for metadata management, configuration storage, or as a foundation for more complex distributed databases.

Architecture

The system follows a modular, layered architecture to decouple consensus logic from storage and networking.

  • Raft Core (src/raft/): Implements state machine replication (SMR), handling leader election, log replication, and safety invariants.
  • Networking (src/network/): High-performance gRPC layer using tonic. Handles inter-node RPCs (RequestVote, AppendEntries) and Client APIs (Execute).
  • Storage Layer (src/storage/):
    • Log Store: Persistent Write-Ahead Log (WAL) backed by RocksDB for efficient sequential writes and crash recovery.
    • Snapshot Store: File-based state checkpointing to prevent unbounded log growth.
  • State Machine (src/statemachine/): In-memory KV map (HashMap with RwLock) where committed commands are applied.
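
The storage and state-machine pieces described above might look roughly like the sketch below. The type and trait names (LogEntry, LogStore, KvStateMachine) are illustrative assumptions for this README, not the repository's actual definitions.

use std::collections::HashMap;
use std::sync::RwLock;

// A replicated log entry: the term and index it was appended at, plus the
// serialized client command.
#[derive(Clone, Debug)]
pub struct LogEntry {
    pub term: u64,
    pub index: u64,
    pub command: Vec<u8>,
}

// Storage abstraction so the Raft core can run against RocksDB in production
// and an in-memory implementation in tests.
pub trait LogStore: Send + Sync {
    fn append(&self, entries: &[LogEntry]) -> std::io::Result<()>;
    fn entry(&self, index: u64) -> Option<LogEntry>;
    fn last_index(&self) -> u64;
}

// The applied state: a plain HashMap guarded by an RwLock.
#[derive(Default)]
pub struct KvStateMachine {
    data: RwLock<HashMap<String, String>>,
}

impl KvStateMachine {
    pub fn apply_put(&self, key: String, value: String) {
        self.data.write().unwrap().insert(key, value);
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.data.read().unwrap().get(key).cloned()
    }
}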

Data Flow

  1. Mutation: Client sends PUT/DELETE to KVService.
  2. Replication: Leader appends command to local log and broadcasts AppendEntries to followers.
  3. Consensus: Once a majority (quorum) acknowledges, the leader advances its commit_index.
  4. Execution: Committed entries are applied to the KVStore state machine.
  5. Response: Result is returned to the client.
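
A rough illustration of step 3: the leader advances commit_index to the highest index replicated on a majority, and only for entries written in its current term. The function and parameter names below are hypothetical.

// Advance the leader's commit_index: pick the highest log index replicated on a
// strict majority of the cluster, and commit it only if that entry was written
// in the current term (Raft's commit safety rule). `match_index` holds, for
// every node including the leader itself, the highest replicated log index.
// Assumes a non-empty cluster.
fn advance_commit_index(
    match_index: &[u64],
    commit_index: &mut u64,
    current_term: u64,
    term_of: impl Fn(u64) -> Option<u64>,
) {
    let mut acked = match_index.to_vec();
    acked.sort_unstable();
    // With indices sorted ascending, the entry at position (n - 1) / 2 is the
    // highest index replicated on a strict majority.
    let quorum_index = acked[(acked.len() - 1) / 2];
    if quorum_index > *commit_index && term_of(quorum_index) == Some(current_term) {
        *commit_index = quorum_index;
    }
}

The term check matters because Raft never lets a leader commit entries from earlier terms by counting replicas alone; those entries become committed indirectly once an entry from the current term reaches a quorum.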

Tech Stack

  • Rust: Memory safety without GC, critical for predictable latency in consensus.
  • Tokio: Asynchronous runtime for high-concurrency I/O.
  • Tonic/Prost: gRPC implementation for efficient, type-safe inter-node communication.
  • RocksDB: Industrial-grade persistent key-value store used for the Raft WAL.
  • Bincode/Serde: Compact, fast binary serialization for log entries and snapshots (see the example after this list).
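
As an example of the last point, a WAL record can be round-tripped with serde + bincode along these lines (illustrative struct; the real entry type lives in src/storage/):

use serde::{Deserialize, Serialize};

// Illustrative shape of a WAL record.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct LogEntry {
    term: u64,
    index: u64,
    command: Vec<u8>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let entry = LogEntry { term: 3, index: 42, command: b"PUT key value".to_vec() };

    // Encode to a compact byte buffer before writing to the RocksDB-backed WAL...
    let bytes = bincode::serialize(&entry)?;

    // ...and decode it back during crash recovery.
    let recovered: LogEntry = bincode::deserialize(&bytes)?;
    assert_eq!(entry, recovered);
    Ok(())
}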

Local Setup

Prerequisites

  • Rust 1.70+
  • Protobuf Compiler (protoc)
  • LLVM/Clang (required by rust-rocksdb)

Execution

To simulate a 3-node cluster locally:

chmod +x scripts/simulate_cluster.sh
./scripts/simulate_cluster.sh

The script initializes 3 nodes, triggers an election, simulates a leader crash, and verifies recovery.

Real-World Use Cases

  • Distributed Configuration: Consistent settings across a fleet of microservices.
  • Service Discovery: Reliable registry for dynamic service endpoints.
  • Distributed Locking: Foundation for a distributed lock manager (DLM).

Technical Evaluation (Senior Engineering Perspective)

Strengths

  • Strong Persistence Foundation: Using RocksDB for the WAL ensures that currentTerm and votedFor are durably persisted before RPC responses are sent; skipping this flush is a common pitfall in naive implementations.
  • Async-First Design: Leveraging tokio::select! for the main event loop cleanly handles heartbeats, election timeouts, and RPC messages without complex threading.
  • Clean Abstractions: The LogStore trait allows swapping storage engines (e.g., in-memory for testing, RocksDB for prod).
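
For reference, a stripped-down version of such a tokio::select! loop is shown below. Channel, timer, and function names are assumptions for illustration, not the actual code in src/raft/.

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{interval, sleep};

// Messages the Raft task receives from the gRPC layer (illustrative).
enum RaftMessage {
    RequestVote,
    AppendEntries,
    ClientCommand(Vec<u8>),
}

async fn run_raft_loop(mut inbox: mpsc::Receiver<RaftMessage>) {
    let mut heartbeat = interval(Duration::from_millis(100));
    loop {
        tokio::select! {
            // Leader duty: broadcast AppendEntries heartbeats on a fixed cadence.
            _ = heartbeat.tick() => {
                // broadcast_append_entries().await;
            }
            // Election timer (re-armed each iteration in this simplified sketch);
            // firing would convert a follower into a candidate.
            _ = sleep(Duration::from_millis(300)) => {
                // start_election().await;
            }
            // Inbound RPCs and client commands forwarded by the gRPC layer.
            Some(msg) = inbox.recv() => {
                match msg {
                    RaftMessage::RequestVote => { /* evaluate and answer the vote request */ }
                    RaftMessage::AppendEntries => { /* reset election timer, append entries */ }
                    RaftMessage::ClientCommand(_cmd) => { /* leader path: append to local log */ }
                }
            }
        }
    }
}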

Limitations & Trade-offs

  • Read Linearizability: The current GET implementation is a placeholder. Production use requires ReadIndex or lease-based reads to avoid returning stale data from followers.
  • Membership Changes: The cluster configuration is static; adding or removing nodes requires a full restart (single-server membership change is not implemented).
  • Peer Communication: The network layer is currently a skeleton; outbound gRPC calls from the Raft loop to peers still need to be implemented before the system can run beyond local simulation.

Future Engineering Roadmap

  • Linearizable Reads: Implement ReadIndex (Section 6.4 of the Raft dissertation) so clients always see the latest committed state; a sketch of the flow follows this list.
  • Dynamic Membership: Implement AddNode/RemoveNode via joint consensus.
  • Pipelining & Batching: Batch log entries in AppendEntries and pipeline RPCs to saturate network bandwidth.
  • gRPC Health Probing: Integrate gRPC health checks for faster failure detection.
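
To make the first roadmap item concrete, the ReadIndex flow looks roughly like this. RaftNode and every method below are hypothetical stand-ins for illustration, not planned APIs.

use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

// Hypothetical node handle; only the pieces the read path touches are shown.
struct RaftNode {
    commit_index: AtomicU64,
    applied_index: AtomicU64,
    kv: RwLock<HashMap<String, String>>,
}

impl RaftNode {
    // ReadIndex, per the Raft dissertation:
    //  1. Record the current commit index as the read index.
    //  2. Confirm leadership with one heartbeat round to a majority.
    //  3. Wait until the state machine has applied up to the read index,
    //     then answer from local state; no log entry is appended.
    async fn linearizable_get(&self, key: &str) -> Option<String> {
        let read_index = self.commit_index.load(Ordering::Acquire);

        if !self.confirm_leadership_via_heartbeat().await {
            return None; // lost leadership; the client should retry on the new leader
        }

        while self.applied_index.load(Ordering::Acquire) < read_index {
            tokio::task::yield_now().await; // real code would await an apply notification
        }

        self.kv.read().unwrap().get(key).cloned()
    }

    async fn confirm_leadership_via_heartbeat(&self) -> bool {
        // Placeholder: a real implementation broadcasts heartbeats and waits
        // for acknowledgements from a quorum before serving the read.
        true
    }
}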
