SentryKV

A crash-safe, concurrent, persistent key-value store

SentryKV is a crash-safe, concurrent, persistent key-value store built in modern C++.

I built this as a systems exercise with a production mindset: correctness first, explicit durability semantics, observable behavior, and verification using tests + sanitizers. The result is not “just a map wrapper” — it’s a small storage engine that makes its guarantees precise.

Why SentryKV?

Most "KV store exercises" stop at a map with persistence. SentryKV goes further:

  • Explicit crash-consistency model
  • Background checkpointing with ordering guarantees
  • Corruption detection and fail-fast recovery
  • Concurrency validated under sanitizers
  • Observable durability state (stats + status)

The goal was not features — it was correctness under failure.

Guarantees

  • No partially applied snapshot state is ever observable.
  • WAL replay is idempotent and deterministic.
  • Readers never observe partially written in-memory state.
  • Corrupted persistence files fail fast.

Non-Goals (for now)

  • Distributed replication
  • Sharding
  • Lock-free data structures
  • fsync on every write by default

Performance Model

  • Reads are in-memory and scale with core count using shared locks.
  • Writes enqueue WAL records asynchronously.
  • Durability barriers (flush_wal / checkpoint) are explicit.
  • Throughput is bounded by disk latency only when durability barriers are required.

What SentryKV can do

Core KV API

  • set <key> <value> — insert/overwrite
  • get <key> — fetch (prints value or “not found”)
  • del <key> — delete
  • exists <key> — membership test
  • size / clear
  • keys — list keys (deterministic order)
  • scan <prefix> — prefix scan (deterministic order)

Durability

  • Snapshots: persistent point-in-time copy of the entire KV state
  • WAL (Write-Ahead Log): append-only log of write operations for recovery between snapshots
  • Checkpointing: flush WAL → write snapshot → safely truncate WAL
  • Background checkpointing: periodic checkpoints via a helper thread (autockpt <ms>)

Concurrency

  • Thread-safe KVStore using a reader/writer lock (std::shared_mutex)
  • Asynchronous WAL writer using a producer–consumer queue (mutex + condition_variable)
  • Concurrency and races verified with ThreadSanitizer (TSan)

Observability

  • stats — operation counters + WAL enqueue/flush progress
  • status — WAL enabled? background checkpoint running? checkpoint period?

Architecture (at a glance)

Components + threads

                           ┌───────────────────────────────┐
                           │           CLI / REPL           │
                           │   parse → validate → dispatch  │
                           │        (src/main.cpp)          │
                           └───────────────┬───────────────┘
                                           │ calls
                                           ▼
        ┌─────────────────────────────────────────────────────────────────┐
        │                             KVStore                              │
        │                 (include/kvstore.h, src/kvstore.cpp)              │
        │─────────────────────────────────────────────────────────────────│
        │ Data plane (hot path):                                            │
        │   - in-memory index: unordered_map<string,string>                 │
        │   - concurrency: std::shared_mutex (readers shared / writers excl)│
        │                                                                   │
        │ Durability plane:                                                 │
        │   - WAL enqueue (write intent) → TaskQueue                         │
        │   - flush barrier: flush_wal() waits for worker catch-up           │
        │   - checkpoint(): flush → snapshot.tmp → rename → truncate WAL     │
        │                                                                   │
        │ Observability:                                                    │
        │   - stats(): op counters + WAL progress + ckpt state               │
        └───────────────┬───────────────────────────┬──────────────────────┘
                        │ enqueue WAL records        │ background maintenance
                        │ (SET/DEL/CLEAR)            │
                        ▼                           ▼
              ┌───────────────────┐     ┌───────────────────────────────┐
              │  TaskQueue (MPSC) │     │  Background checkpoint thread │
              │ mutex+cv + drain  │     │  (autockpt <ms>)              │
              └─────────┬─────────┘     └───────────┬───────────────────┘
                        │ drain + append            │ calls checkpoint()
                        ▼                           ▼
              ┌───────────────────┐     ┌───────────────────────────────┐
              │  WAL worker thread│     │  snapshot writer (same proc)  │
              │  append + flush   │     │  snapshot.tmp → rename        │
              └─────────┬─────────┘     └───────────┬───────────────────┘
                        │                           │
                        ▼                           ▼
                   sentry.wal                  snapshot.dat

             NOTE: KVStore never prints; it returns values/status and the CLI formats output.

Detailed architecture diagrams and state machines:

  • docs/diagrams.md
  • DESIGN.md

Write path (SET) — sequence view

CLI             KVStore                                  WAL worker               Disk
 | set k v  →     | (unique_lock) mutate in-memory map                              |
 |                | enqueue WAL record (SET k v) ────────────────┐                  |
 |                | return OK / ack to CLI                       │                  |
 |                |                                             │                  |
 |                |                               drain queue   │                  |
 |                |                            append record → sentry.wal  ───────→ |
 | flush_wal →     | wait until wal_flushed >= wal_enqueued ─────┘                  |

Startup recovery — snapshot + redo-WAL

open(snapshot, wal)
  1) load snapshot.dat into temp map
  2) if parsing succeeds → commit temp map
  3) replay sentry.wal (redo SET/DEL/CLEAR)
  4) ready to serve

Failure handling principle:
  - corrupted snapshot/WAL never partially applies state
  - open() fails loudly rather than returning a half-valid store

Key design boundary: KVStore never prints. It returns status/values; the CLI formats output. This keeps the core engine testable and reusable.

Durability model (what is guaranteed)

SentryKV uses a snapshot + redo-WAL model.

Snapshot

A snapshot is the full KV state serialized to disk.

Write path is crash-safe:

  1. write to snapshot.tmp
  2. close/flush
  3. atomic rename snapshot.tmp → snapshot.dat

Load path is corruption-safe:

  1. parse into a temporary map
  2. only commit if the entire file is valid

This ensures no partially loaded state is ever observable.

WAL (Write-Ahead Log)

The WAL is an append-only file that logs write operations (SET/DEL/CLEAR). On startup:

  1. load snapshot (if present)
  2. replay WAL (redo)

Redo replay is safe because operations are idempotent at the KV level (re-applying the same SET/DEL leads to the same final state).

Checkpoint

Checkpoint is the durability “bridge”:

  1. wait until WAL is flushed (flush barrier)
  2. write snapshot of a consistent view
  3. truncate WAL only after snapshot is safely committed

In addition, checkpointing is implemented to avoid blocking readers during disk I/O:

  • take a short exclusive lock to copy state / capture a watermark
  • release the lock for the expensive work (flush + snapshot)
  • safely finalize/truncate
  • bounded retries if concurrent writes race the checkpoint

Background checkpointing

autockpt <ms> starts a helper thread that periodically calls checkpoint().

This is best-effort durability: it reduces recovery time and WAL growth under sustained write load without impacting read throughput.

Concurrency model

  • Read-only operations (get/exists/size/keys/scan/save) acquire std::shared_lock.
  • Write operations (set/del/clear/load/open/checkpoint) acquire std::unique_lock.

WAL writes are performed by a dedicated worker thread:

  • producers enqueue records
  • the worker appends to disk
  • flush_wal() provides a barrier (tests and checkpoints rely on it)

I validated concurrency correctness with TSan and included tests that:

  • stress many concurrent readers
  • ensure a writer can still make progress under heavy read load
  • reproduce and fix race patterns cleanly

Reliability testing

Beyond functional tests, the test suite includes failure-injection style checks:

  • snapshot corruption: malformed line, invalid escape sequence
  • WAL corruption: unknown record, truncated record
  • restart equivalence: checkpoint + more writes + restart recovers from snapshot + WAL

The principle is simple: never partially apply corrupted persistence.

CLI walkthrough

Typical session:

help
open snapshot.dat sentry.wal
autockpt 2000
set user:1 alice
set user:2 bob
get user:1
stats
status
checkpoint
autockpt off
quit

What happens under the hood:

  • each set/del/clear enqueues a WAL record + mutates memory
  • WAL worker drains the queue and appends to sentry.wal
  • checkpoint (manual or background) waits for WAL flush, writes snapshot.tmp, renames to snapshot.dat, then truncates WAL
  • on restart, KVStore loads snapshot.dat then replays sentry.wal

Build & run

In CLion

Open the project and use the provided CMake profiles.

Recommended development loop:

  • Debug profile for fast iteration
  • TSan profile when adding/changing concurrency code
  • ASan/UBSan profiles when modifying parsing/persistence code
  • Coverage profile to ensure tests actually exercise new logic

From terminal

(Use your build directory names; CLion typically generates cmake-build-*.)

Build the CLI binary:

cmake --build cmake-build-debug --target sentrykv -j 8
./cmake-build-debug/sentrykv

Build and run tests:

cmake --build cmake-build-debug --target kvstore_tests -j 8
./cmake-build-debug/kvstore_tests

Sanitizers

Run tests under TSan/ASan/UBSan by building the corresponding CMake profile targets and executing kvstore_tests in that build directory.

Coverage (LLVM)

If you are using the LLVM coverage instrumentation profile:

  1. run tests to produce default.profraw
  2. merge the raw profiles:
     xcrun llvm-profdata merge -sparse default.profraw -o coverage.profdata
  3. show a report (example for the tests binary):
     xcrun llvm-cov report ./kvstore_tests -instr-profile=coverage.profdata

(Exact paths depend on your build directory.)

What’s in the repo

  • src/main.cpp — CLI/REPL
  • src/kvstore.cpp, include/kvstore.h — storage engine
  • src/task_queue.cpp, include/task_queue.h — producer/consumer queue used by WAL
  • tests/test_kvstore.cpp — unit tests + failure injection + concurrency tests

What I commit vs what I ignore

I commit source, tests, and docs. I do not commit generated build directories.

Suggested .gitignore entries:

  • cmake-build-*/
  • *.profraw, *.profdata
  • *.gcda, *.gcno
  • *.tmp, *.wal, snapshot.dat
  • .DS_Store

Design notes

Deep technical reasoning lives in DESIGN.md (snapshot format, WAL replay model, checkpoint algorithm, concurrency/locking rules, and testing strategy).


If you want the 2-minute interview pitch: “I implemented a crash-safe KV store with snapshot+redo-WAL recovery, checkpointing with flush barriers, and a reader-optimized concurrency model validated with TSan and corruption tests.”
