# SentryKV

A crash-safe, concurrent, persistent key-value store built in modern C++.
I built this as a systems exercise with a production mindset: correctness first, explicit durability semantics, observable behavior, and verification using tests + sanitizers. The result is not “just a map wrapper” — it’s a small storage engine that makes its guarantees precise.
Most "KV store exercises" stop at a map with persistence. SentryKV goes further:
- Explicit crash-consistency model
- Background checkpointing with ordering guarantees
- Corruption detection and fail-fast recovery
- Concurrency validated under sanitizers
- Observable durability state (stats + status)
The goal was not features but correctness under failure.

Guarantees:

- No partially applied snapshot state is ever observable.
- WAL replay is idempotent and deterministic.
- Readers never observe partially written in-memory state.
- Corrupted persistence files fail fast.
Non-goals:

- Distributed replication
- Sharding
- Lock-free data structures
- fsync on every write by default
Performance model:

- Reads are in-memory and scale with core count using shared locks.
- Writes enqueue WAL records asynchronously.
- Durability barriers (flush_wal / checkpoint) are explicit.
- Throughput is bounded by disk latency only when durability barriers are required.
## Core KV API

- `set <key> <value>`: insert/overwrite
- `get <key>`: fetch (prints value or "not found")
- `del <key>`: delete
- `exists <key>`: membership test
- `size` / `clear`
- `keys`: list keys (deterministic order)
- `scan <prefix>`: prefix scan (deterministic order)
## Durability

- Snapshots: persistent point-in-time copy of the entire KV state
- WAL (Write-Ahead Log): append-only log of write operations for recovery between snapshots
- Checkpointing: flush WAL → write snapshot → safely truncate WAL
- Background checkpointing: periodic checkpoints via a helper thread (`autockpt <ms>`)
## Concurrency

- Thread-safe KVStore using a reader/writer lock (`std::shared_mutex`)
- Asynchronous WAL writer using a producer–consumer queue (mutex + condition_variable)
- Concurrency and races verified with ThreadSanitizer (TSan)
## Observability

- `stats`: operation counters + WAL enqueue/flush progress
- `status`: WAL enabled? background checkpoint running? checkpoint period?
## Architecture

```
┌───────────────────────────────┐
│          CLI / REPL           │
│  parse → validate → dispatch  │
│        (src/main.cpp)         │
└───────────────┬───────────────┘
                │ calls
                ▼
┌─────────────────────────────────────────────────────────────────┐
│                             KVStore                             │
│              (include/kvstore.h, src/kvstore.cpp)               │
│─────────────────────────────────────────────────────────────────│
│ Data plane (hot path):                                          │
│ - in-memory index: unordered_map<string,string>                 │
│ - concurrency: std::shared_mutex (readers shared / writers excl)│
│                                                                 │
│ Durability plane:                                               │
│ - WAL enqueue (write intent) → TaskQueue                        │
│ - flush barrier: flush_wal() waits for worker catch-up          │
│ - checkpoint(): flush → snapshot.tmp → rename → truncate WAL    │
│                                                                 │
│ Observability:                                                  │
│ - stats(): op counters + WAL progress + ckpt state              │
└───────────────┬───────────────────────────┬─────────────────────┘
                │ enqueue WAL records       │ background maintenance
                │ (SET/DEL/CLEAR)           │
                ▼                           ▼
      ┌───────────────────┐  ┌──────────────────────────────┐
      │  TaskQueue (MPSC) │  │ Background checkpoint thread │
      │ mutex+cv + drain  │  │       (autockpt <ms>)        │
      └─────────┬─────────┘  └──────────────┬───────────────┘
                │ drain + append            │ calls checkpoint()
                ▼                           ▼
      ┌───────────────────┐  ┌─────────────────────────────┐
      │ WAL worker thread │  │ snapshot writer (same proc) │
      │  append + flush   │  │    snapshot.tmp → rename    │
      └─────────┬─────────┘  └──────────────┬──────────────┘
                │                           │
                ▼                           ▼
           sentry.wal                 snapshot.dat
```
NOTE: KVStore never prints; it returns values/status and the CLI formats output.
Detailed architecture diagrams and state machines:
- docs/diagrams.md
- DESIGN.md
```
CLI            KVStore                                     WAL worker                  Disk
 | set k v →    | (unique_lock) mutate in-memory map        |                            |
 |              | enqueue WAL record (SET k v) ─────────────┐                            |
 |              | return OK / ack to CLI                    │                            |
 |              |                                           │                            |
 |              |                                           │ drain queue                |
 |              |                                           │ append record ───→ sentry.wal
 | flush_wal →  | wait until wal_flushed >= wal_enqueued ───┘                            |
```
Recovery path (`open(snapshot, wal)`):

1. load `snapshot.dat` into a temporary map
2. if parsing succeeds, commit the temporary map
3. replay `sentry.wal` (redo `SET`/`DEL`/`CLEAR`)
4. ready to serve
Failure handling principle:
- corrupted snapshot/WAL never partially applies state
- open() fails loudly rather than returning a half-valid store
Key design boundary: KVStore never prints. It returns status/values; the CLI formats output. This keeps the core engine testable and reusable.
SentryKV uses a snapshot + redo-WAL model.
A snapshot is the full KV state serialized to disk.
Write path is crash-safe:
- write to `snapshot.tmp`
- close/flush
- atomic rename `snapshot.tmp` → `snapshot.dat`
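The write path above can be sketched in a few lines of C++. This is an illustration, not SentryKV's actual code: `write_snapshot`, the tab-separated record format, and the file names are assumptions, and a production version would also escape keys/values and fsync both the file and its directory.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Crash-safe snapshot write sketch: write everything to a temp file,
// flush it, then atomically rename over the live snapshot. A crash
// before the rename leaves the old snapshot intact; after it, the new
// snapshot is fully visible.
bool write_snapshot(const std::map<std::string, std::string>& kv,
                    const std::string& path) {
    const std::string tmp = path + ".tmp";
    std::FILE* f = std::fopen(tmp.c_str(), "w");
    if (!f) return false;
    for (const auto& [key, value] : kv)
        std::fprintf(f, "%s\t%s\n", key.c_str(), value.c_str());
    bool ok = (std::fflush(f) == 0);
    ok = (std::fclose(f) == 0) && ok;
    if (!ok) {
        std::remove(tmp.c_str());  // never leave a half-written temp behind
        return false;
    }
    // std::rename replaces the target atomically on POSIX filesystems, so
    // a reader sees either the old snapshot or the complete new one.
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```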
Load path is corruption-safe:
- parse into a temporary map
- only commit if the entire file is valid
This ensures no partially loaded state is ever observable.
The WAL is an append-only file that logs write operations (SET/DEL/CLEAR). On startup:
- load snapshot (if present)
- replay WAL (redo)
Redo replay is safe because operations are idempotent at the KV level (re-applying the same SET/DEL leads to the same final state).
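Idempotent redo can be sketched like this (the record format and function name are assumptions for illustration; SentryKV's real WAL format is documented in DESIGN.md):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Redo-style WAL replay sketch. Records are "SET k v", "DEL k", or
// "CLEAR". Replay is idempotent: applying the same log twice yields the
// same final map, which is why recovery can safely re-run records that
// were already applied before a crash.
void replay_wal(std::istream& wal,
                std::unordered_map<std::string, std::string>& kv) {
    std::string line;
    while (std::getline(wal, line)) {
        std::istringstream rec(line);
        std::string op, key;
        rec >> op;
        if (op == "SET") {
            rec >> key;
            rec.ignore(1);             // skip the separating space
            std::string value;
            std::getline(rec, value);  // value may itself contain spaces
            kv[key] = value;           // last write wins
        } else if (op == "DEL") {
            rec >> key;
            kv.erase(key);             // deleting a missing key is a no-op
        } else if (op == "CLEAR") {
            kv.clear();
        }
        // A real engine would fail fast on unknown/truncated records here.
    }
}
```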
Checkpoint is the durability “bridge”:
- wait until WAL is flushed (flush barrier)
- write snapshot of a consistent view
- truncate WAL only after snapshot is safely committed
In addition, checkpointing is implemented to avoid blocking readers during disk I/O:
- take a short exclusive lock to copy state / capture a watermark
- release the lock for the expensive work (flush + snapshot)
- safely finalize/truncate
- bounded retries if concurrent writes race the checkpoint
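A sketch of that ordering, with hypothetical names (WAL truncation and the bounded-retry logic are elided; the real algorithm is described in DESIGN.md):

```cpp
#include <cstdio>
#include <fstream>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Checkpoint ordering sketch: the exclusive lock is held only long
// enough to copy the in-memory state; the expensive disk work happens
// outside it, so readers are not blocked by I/O.
struct CheckpointCtx {
    std::shared_mutex mu;
    std::map<std::string, std::string> live;
};

bool checkpoint(CheckpointCtx& ctx, const std::string& snapshot_path) {
    // 1) short exclusive section: capture a consistent copy
    std::map<std::string, std::string> copy;
    {
        std::unique_lock lock(ctx.mu);
        copy = ctx.live;
    }
    // 2) expensive work without the lock: write snapshot.tmp
    const std::string tmp = snapshot_path + ".tmp";
    {
        std::ofstream out(tmp);
        if (!out) return false;
        for (const auto& [k, v] : copy) out << k << '\t' << v << '\n';
        if (!out.flush()) return false;
    }
    // 3) atomic rename commits the snapshot; truncating the WAL (elided)
    //    is safe only after this rename succeeds
    return std::rename(tmp.c_str(), snapshot_path.c_str()) == 0;
}
```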
`autockpt <ms>` starts a helper thread that periodically calls `checkpoint()`.
This is best-effort durability: it reduces recovery time and WAL growth under sustained write load without impacting read throughput.
- Read-only operations (`get`/`exists`/`size`/`keys`/`scan`/`save`) acquire `std::shared_lock`.
- Write operations (`set`/`del`/`clear`/`load`/`open`/`checkpoint`) acquire `std::unique_lock`.
WAL writes are performed by a dedicated worker thread:

- producers enqueue records
- the worker appends to disk
- `flush_wal()` provides a barrier (tests and checkpoints rely on it)
I validated concurrency correctness with TSan and included tests that:
- stress many concurrent readers
- ensure a writer can still make progress under heavy read load
- reproduce and fix race patterns cleanly
Beyond functional tests, the test suite includes failure-injection style checks:
- snapshot corruption: malformed line, invalid escape sequence
- WAL corruption: unknown record, truncated record
- restart equivalence: checkpoint + more writes + restart recovers from snapshot + WAL
The principle is simple: never partially apply corrupted persistence.
Typical session:
```text
help
open snapshot.dat sentry.wal
autockpt 2000
set user:1 alice
set user:2 bob
get user:1
stats
status
checkpoint
autockpt off
quit
```
What happens under the hood:

- each `set`/`del`/`clear` enqueues a WAL record and mutates memory
- the WAL worker drains the queue and appends to `sentry.wal`
- `checkpoint` (manual or background) waits for the WAL flush, writes `snapshot.tmp`, renames it to `snapshot.dat`, then truncates the WAL
- on restart, KVStore loads `snapshot.dat` and then replays `sentry.wal`
Open the project and use the provided CMake profiles.
Recommended development loop:
- Debug profile for fast iteration
- TSan profile when adding/changing concurrency code
- ASan/UBSan profiles when modifying parsing/persistence code
- Coverage profile to ensure tests actually exercise new logic
(Use your build directory names; CLion typically generates cmake-build-*.)
Build the CLI binary:

```sh
cmake --build cmake-build-debug --target sentrykv -j 8
./cmake-build-debug/sentrykv
```

Build and run the tests:

```sh
cmake --build cmake-build-debug --target kvstore_tests -j 8
./cmake-build-debug/kvstore_tests
```

Run the tests under TSan/ASan/UBSan by building the corresponding CMake profile targets and executing `kvstore_tests` in that build directory.
If you are using the LLVM coverage instrumentation profile:

- run tests to produce `default.profraw`
- merge: `xcrun llvm-profdata merge -sparse default.profraw -o coverage.profdata`
- show a report (example for the tests binary): `xcrun llvm-cov report ./kvstore_tests -instr-profile=coverage.profdata`

(Exact paths depend on your build directory.)
- `src/main.cpp`: CLI/REPL
- `src/kvstore.cpp`, `include/kvstore.h`: storage engine
- `src/task_queue.cpp`, `include/task_queue.h`: producer/consumer queue used by the WAL
- `tests/test_kvstore.cpp`: unit tests + failure injection + concurrency tests
I commit source, tests, and docs. I do not commit generated build directories.
Suggested `.gitignore` entries:

```gitignore
cmake-build-*/
*.profraw
*.profdata
*.gcda
*.gcno
*.tmp
*.wal
snapshot.dat
.DS_Store
```
Deep technical reasoning lives in DESIGN.md (snapshot format, WAL replay model, checkpoint algorithm, concurrency/locking rules, and testing strategy).
If you want the 2-minute interview pitch: “I implemented a crash-safe KV store with snapshot+redo-WAL recovery, checkpointing with flush barriers, and a reader-optimized concurrency model validated with TSan and corruption tests.”
