An LSM-tree based key-value store built from scratch in Rust. Built as a learning project to deeply understand how storage engines work — from WAL to compaction — with a future path toward WASM compatibility.
wasm-kv is a single-node, single-writer embedded key-value store that implements the core components of a Log-Structured Merge Tree (LSM-tree):
- Write path: Writes go through a Write-Ahead Log for durability, then into an in-memory SkipMap, which flushes to sorted on-disk SSTables when a size threshold is reached.
- Read path: Reads check the in-memory MemTable first, then search through leveled SSTables using bloom filters and sparse indexes to minimize disk I/O.
WRITE PATH
─────────
put(key, value)
│
▼
┌───────────┐ Append record with
│ WAL │◄─── CRC32 checksum for
│ (durability)│ crash recovery
└─────┬─────┘
│
▼
┌───────────┐ crossbeam SkipMap
│ MemTable │◄─── (sorted, concurrent)
└─────┬─────┘
│ threshold reached (400KB)
▼
┌───────────┐ Flush via crossbeam
│ Flush │◄─── channel to background
│ (async) │ watcher thread
└─────┬─────┘
│
▼
┌───────────────────────────────────┐
│ SSTable File │
│ ┌───────┬───────┬───────┐ │
│ │Block 0│Block 1│Block N│ 4KB │
│ └───────┴───────┴───────┘ │
│ ┌─────────────────────┐ │
│ │ Sparse Index │ │
│ └─────────────────────┘ │
│ ┌─────────────────────┐ │
│ │ Bloom Filter │ │
│ └─────────────────────┘ │
│ ┌─────────────────────┐ │
│ │ Footer │ │
│ └─────────────────────┘ │
└───────────────────────────────────┘
│
▼
┌───────────┐ Tracks SSTable files
│ Manifest │◄─── per level
└───────────┘
READ PATH
─────────
get(key)
│
▼
MemTable ──found──▶ return value
│ miss
▼
Bloom Filter ──negative──▶ skip file
│ maybe
▼
Sparse Index ──locate──▶ target block
│
▼
Block scan ──found──▶ return value
│ miss
▼
Next SSTable... ──▶ repeat per level
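The cascade above can be sketched in Rust. This is an illustrative standalone version only — the real lookup lives in `storage/kv.rs` and goes through bloom filters and sparse indexes on disk rather than an in-memory map:

```rust
// Illustrative read-path cascade: MemTable first, then each SSTable per
// level, newest first. A BTreeMap stands in for the sorted structures;
// `None` values model tombstones. Names are hypothetical, not wasm-kv's API.
use std::collections::BTreeMap;

struct SsTable {
    records: BTreeMap<Vec<u8>, Option<Vec<u8>>>, // None = tombstone
}

impl SsTable {
    fn get(&self, key: &[u8]) -> Option<Option<Vec<u8>>> {
        self.records.get(key).cloned()
    }
}

fn get(
    memtable: &BTreeMap<Vec<u8>, Option<Vec<u8>>>,
    levels: &[Vec<SsTable>],
    key: &[u8],
) -> Option<Vec<u8>> {
    // 1. MemTable hit: a tombstone here means "deleted", stop searching.
    if let Some(entry) = memtable.get(key) {
        return entry.clone();
    }
    // 2. Fall through the levels; the first table containing the key wins.
    for level in levels {
        for table in level {
            if let Some(entry) = table.get(key) {
                return entry;
            }
        }
    }
    None
}

fn main() {
    let mut mem = BTreeMap::new();
    mem.insert(b"hello".to_vec(), Some(b"world".to_vec()));
    let levels: Vec<Vec<SsTable>> = Vec::new();
    assert_eq!(get(&mem, &levels, b"hello"), Some(b"world".to_vec()));
    assert_eq!(get(&mem, &levels, b"missing"), None);
    println!("read path ok");
}
```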
- Write-Ahead Log (WAL) — Append-only log with CRC32 checksums, WAL rotation on flush, and archival for recovery
- MemTable — Lock-free concurrent SkipMap (crossbeam-skiplist) with sorted key ordering
- SSTable — Immutable on-disk sorted tables with:
  - Fixed-size 4KB blocks
  - Sparse index for block-level binary search
  - Bloom filters for fast negative lookups
  - Footer with offset metadata and checksums
- Leveled Storage — Multi-level SSTable organization (L0, L1, L2...) with manifest tracking
- Concurrent Flush — Channel-based background flush watcher thread decoupled from the write path
- Tombstone Deletes — Logical deletes via RecordType::Delete markers, cleaned up during compaction
- Crash Recovery — WAL replay on startup to rebuild MemTable state
- CLI — Configurable via clap with log level and data directory options
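As an illustration of the bloom-filter step, here is a minimal standalone sketch — not the `bloom.rs` implementation — using double hashing over two FNV-1a hashes:

```rust
// Minimal Bloom filter sketch: k bit probes per key, derived from two
// FNV-1a hashes via double hashing. `false` is definitive ("not present");
// `true` means "maybe present", so the read path only skips on `false`.
struct BloomFilter {
    bits: Vec<bool>,
    k: usize, // number of probes per key
}

fn fnv1a(data: &[u8], seed: u64) -> u64 {
    let mut h = 0xcbf29ce484222325u64 ^ seed;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

impl BloomFilter {
    fn new(m: usize, k: usize) -> Self {
        BloomFilter { bits: vec![false; m], k }
    }

    fn probes(&self, key: &[u8]) -> impl Iterator<Item = usize> {
        let h1 = fnv1a(key, 0);
        let h2 = fnv1a(key, 0x9e3779b97f4a7c15);
        let m = self.bits.len() as u64;
        (0..self.k as u64).map(move |i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
    }

    fn insert(&mut self, key: &[u8]) {
        let idxs: Vec<usize> = self.probes(key).collect();
        for i in idxs {
            self.bits[i] = true;
        }
    }

    fn may_contain(&self, key: &[u8]) -> bool {
        self.probes(key).all(|i| self.bits[i])
    }
}

fn main() {
    let mut bf = BloomFilter::new(1024, 4);
    bf.insert(b"hello");
    assert!(bf.may_contain(b"hello")); // no false negatives, ever
    println!("absent key maybe-present: {}", bf.may_contain(b"nope"));
}
```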
src/
├── main.rs # CLI entry point
├── lib.rs # Public crate API
├── config.rs # CLI config (clap)
├── error.rs # Error types (thiserror)
├── api/
│ ├── mod.rs
│ └── api.rs # KVEngine trait definition
└── storage/
├── mod.rs
├── kv.rs # PersistentKV — core engine implementation
├── wal.rs # WAL header, record encoding/decoding
├── log.rs # Record format, WAL I/O, SSTable search
├── record.rs # RecordType (Put/Delete)
├── block.rs # BlockBuilder — 4KB block encoding
├── bloom.rs # Bloom filter implementation
├── sstable.rs # SSTable encode/decode, flush, k-way merge
├── manifest.rs # Manifest file tracking (level → files)
├── levelstore.rs # Level store data structure
├── constant.rs # Thresholds and magic numbers
├── signal.rs # FlushSignal struct
├── skiplist.rs # SkipList utilities
└── watcher.rs # Background flush watcher thread
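The k-way merge noted for `sstable.rs` can be sketched with a min-heap over sorted runs. This standalone version omits wasm-kv's tie-breaking between levels and tombstone cleanup:

```rust
// Sketch of a k-way merge over sorted runs: a min-heap of (key, value)
// heads pops the globally smallest key next. In a real compaction, when
// the same key appears in several runs the newest run's record wins and
// tombstones are dropped; both are omitted here for brevity.
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn merge_sorted(runs: Vec<Vec<(Vec<u8>, Vec<u8>)>>) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut heap = BinaryHeap::new();
    let mut iters: Vec<_> = runs.into_iter().map(|r| r.into_iter()).collect();
    // Seed the heap with the head of every run.
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(kv) = it.next() {
            heap.push(Reverse((kv, i)));
        }
    }
    let mut out = Vec::new();
    // Pop the smallest head, then refill from the run it came from.
    while let Some(Reverse((kv, i))) = heap.pop() {
        out.push(kv);
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}

fn main() {
    let runs = vec![
        vec![(b"a".to_vec(), b"1".to_vec()), (b"c".to_vec(), b"3".to_vec())],
        vec![(b"b".to_vec(), b"2".to_vec())],
    ];
    let merged = merge_sorted(runs);
    let keys: Vec<_> = merged.iter().map(|(k, _)| k.clone()).collect();
    assert_eq!(keys, vec![b"a".to_vec(), b"b".to_vec(), b"c".to_vec()]);
    println!("merge ok");
}
```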
- Rust (edition 2024)
cargo build

# Default (info logging)
cargo run

# With debug logging
cargo run -- --verbose

# Custom log level and data directory
cargo run -- --log-level trace --data-dir ./mydata

# Run tests
cargo test --verbose

# Heap profiling (dhat)
cargo run --features dhat-heap

The core interface is the KVEngine trait:
pub trait KVEngine {
fn get(&self, key: &[u8]) -> Result<Option<Cow<'_, Vec<u8>>>, DBError>;
fn put(&mut self, key: Vec<u8>, value: Vec<u8>) -> Result<(), DBError>;
fn delete(&mut self, key: &[u8]);
}

use wasm_kv::PersistentKV;
use wasm_kv::api::api::KVEngine;
let mut kv = PersistentKV::new();
// Put
kv.put(b"hello".to_vec(), b"world".to_vec()).unwrap();
// Get
let value = kv.get(b"hello").unwrap();
assert_eq!(value.unwrap().as_slice(), b"world");
// Delete
kv.delete(b"hello");
assert!(kv.get(b"hello").unwrap().is_none());

Each WAL entry is appended with the following binary layout:
┌──────────┬───────────┬──────┬──────┬─────────────┬──────────┐
│ key_len │ value_len │ key │value │ record_type │ checksum │
│ (4 bytes)│ (4 bytes) │ (var)│(var) │ (1 byte) │(4 bytes) │
└──────────┴───────────┴──────┴──────┴─────────────┴──────────┘
The WAL file begins with a 32-byte header containing a magic number (WALMGIC\0), version, and checkpoint metadata.
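The record layout above can be sketched as follows. The field order follows the diagram; little-endian lengths, the checksum's exact coverage, and the `record_type` values are assumptions for illustration, not necessarily what wasm-kv writes on disk:

```rust
// Sketch of the WAL record layout: key_len | value_len | key | value |
// record_type | checksum. Endianness and checksum coverage are assumed.
fn crc32(data: &[u8]) -> u32 {
    // Bitwise CRC-32 (IEEE polynomial, reflected): slow but dependency-free.
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn encode_record(key: &[u8], value: &[u8], record_type: u8) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
    buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    buf.push(record_type);
    // Checksum over everything before it, so replay can detect torn writes.
    let sum = crc32(&buf);
    buf.extend_from_slice(&sum.to_le_bytes());
    buf
}

fn main() {
    let rec = encode_record(b"hello", b"world", 0); // 0 = Put (assumed)
    // 4 + 4 + 5 + 5 + 1 + 4 bytes
    assert_eq!(rec.len(), 23);
    println!("record len = {}", rec.len());
}
```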
Each SSTable file is structured as:
┌─────────────────────────────────────────┐
│ Data Blocks (N x 4KB blocks) │
│ ┌─────────────────────────────────┐ │
│ │ Record: key_len|val_len|key|val │ │
│ │ Record: ... │ │
│ └─────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ Sparse Index │
│ ┌─────────────────────────────────┐ │
│ │ first_key | block_offset | ... │ │
│ └─────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ Bloom Filter (serialized bit vector) │
├─────────────────────────────────────────┤
│ Footer │
│ ┌─────────────────────────────────┐ │
│ │ data_block_start/end │ │
│ │ index_block_start/end │ │
│ │ bloom_block_start/end │ │
│ │ index_checksum | bloom_checksum │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘
Read flow: the fixed-size footer at the end of the file is read first to locate the sparse index and bloom filter. The bloom filter provides fast negative lookups, the sparse index narrows the search to a single candidate block, and that block is then scanned linearly.
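The sparse-index step amounts to a binary search for the last index entry whose first key is less than or equal to the lookup key. A standalone sketch (illustrative names, not the on-disk format):

```rust
// Given the (first_key, block_offset) pairs from a decoded sparse index,
// find the single block that could contain `key`.
fn locate_block(index: &[(Vec<u8>, u64)], key: &[u8]) -> Option<u64> {
    // partition_point counts entries with first_key <= key (binary search);
    // the candidate block is the last of those.
    let n = index.partition_point(|(first_key, _)| first_key.as_slice() <= key);
    if n == 0 {
        None // key sorts before the first block's first key: not in this file
    } else {
        Some(index[n - 1].1)
    }
}

fn main() {
    let index = vec![
        (b"apple".to_vec(), 0u64),
        (b"mango".to_vec(), 4096),
        (b"pear".to_vec(), 8192),
    ];
    assert_eq!(locate_block(&index, b"banana"), Some(0)); // inside block 0
    assert_eq!(locate_block(&index, b"mango"), Some(4096));
    assert_eq!(locate_block(&index, b"zebra"), Some(8192));
    assert_eq!(locate_block(&index, b"aaa"), None);
    println!("sparse index ok");
}
```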
This project follows a phased learning roadmap (see PLAN.md) covering:
- Binary encoding and on-disk record formats
- Write-Ahead Logging and crash recovery
- LSM-tree architecture (MemTable, SSTable, compaction)
- Bloom filters and sparse indexing
- Concurrency patterns (channels, RwLock, atomic operations)
- LevelDB Implementation Notes — Google's original LSM-tree implementation
- RocksDB Wiki — Production-grade LSM with advanced compaction strategies
- WiscKey: Separating Keys from Values — Key-value separation to reduce write amplification
- Database Internals by Alex Petrov — Comprehensive guide to storage engine design