Overview
EloqStore is a hybrid-tier key-value storage engine that combines NVMe SSDs with S3-compatible object storage to deliver memory-like latency at disk economics. Built in modern C++, it underpins EloqData products (EloqKV, EloqDoc, EloqSQL) by providing predictable performance, copy-on-write semantics, and cloud-native durability.
- Multi-tier design: hot in-memory B-tree index plus a warm local SSD tier; when object storage is configured it becomes the durable cold tier while local disks act as a cache. Non-leaf nodes remain memory-resident, so a point lookup needs at most a single leaf I/O.
- Copy-on-write B-tree: writers update cloned tree fragments, so reads stay lock-free and tail latencies remain flat even under heavy batch ingest.
- Async IO via io_uring: the engine targets Linux 6.8+ and relies on io_uring for low-overhead, high-parallelism disk access.
- Coroutine scheduler: user-space coroutines multiplex read/write tasks on shard threads to avoid expensive kernel context switches while keeping the pipeline busy.
- Object storage first: append-optimized data files stream to S3/MinIO-compatible backends with intelligent caching, archive snapshots, and tier-aware garbage collection.
EloqStore sits beneath EloqData's Data Substrate, which owns caching, concurrency control, and durability for EloqKV/EloqDoc/EloqSQL. Query engines interact only with Data Substrate; EloqStore is invoked exclusively on cache misses or when checkpoint writers flush accumulated changes. It therefore implements neither distributed caching nor transaction coordination: its mandate is to serve low-latency miss reads and absorb high-throughput checkpoint batches without perturbing readers. When object storage is enabled it becomes the source of truth, with local NVMe acting as a cache/staging tier; in local-only deployments the NVMe-resident data is the durable store.
Design principles:
- NVMe-first: prioritize local NVMe bandwidth/IOPS and treat DRAM as a cache that already lives in Data Substrate. EloqStore leans on SSD-friendly B+-trees with predictable height and one I/O per lookup.
- Predictable tail latency: a copy-on-write B-tree, cached inner nodes, and coroutine-driven io_uring minimize jitter when servicing cache-miss reads.
- Cost efficiency: couple commodity NVMe (high performance, ephemeral) with object storage (durable, cheaper per TB) instead of relying on DRAM-heavy deployments or premium block storage.
- Simplicity over abstraction: a single writer model, explicit batch APIs, and well-defined maintenance jobs keep the engine understandable and debuggable.
- Cloud-native: failure is expected; local NVMe disappears with the VM. EloqStore assumes cold tiers live in object storage and that instances are reprovisioned quickly.
Operating assumptions:
- Modern servers expose high-IOPS NVMe devices and io_uring-capable kernels.
- Data Substrate absorbs transactional writes and hot reads; EloqStore mainly handles batch flushes and cold cache misses.
- Node crashes lose locally attached NVMe, so durability comes from object storage or replicated logs outside EloqStore.
Out of scope for this overview:
- Deep dives into data structures; the focus here is architectural intent, not page layouts or compaction internals.
- Benchmark tables or micro-bench numbers (they belong in dedicated perf reports).
- API minutiae; the API Usage Guide covers request semantics separately.
Key capabilities:
- Batch write pipeline with strict key ordering, caller-managed timestamps, and TTL-aware entries to support high-throughput ingestion.
- Single-I/O point reads thanks to cached upper B-tree layers and deterministic page layout.
- Zero-copy snapshots & agent branching via copy-on-write manifests, enabling instant dataset forks for AI experimentation or multi-tenant isolation.
- Maintenance suite (archive, compact, clean-expired) to keep append-only storage healthy in both local and cloud deployments.
- Shard-per-core execution: partitions hash onto shards that each pin to a core, eliminating thread-pool tuning. Coroutine context switches are user-space and significantly cheaper than thread switches, so EloqStore sustains high NVMe IOPS with fewer CPU cycles and without workload-specific tuning—even if a carefully tuned thread pool could approach similar throughput.
- Multi-path storage layout: a single instance can stripe data across multiple local paths automatically; in cloud/object storage mode it partitions files under different prefixes, making it easy to assign ranges to distinct nodes and scale out horizontally.
- APIs: Native C++ interface (see `eloqstore/include/eloq_store.h` and the API Usage Guide) with sync/async requests and INI-based configuration (`benchmark/opts_append.ini`).
- SDKs: Language-specific SDKs available:
  - Rust SDK - Quick Start | Usage Guide
- Examples & benchmarks: `examples/basic_example.cpp`, `benchmark/simple_bench`, and `micro-bench/` provide ready-to-run workloads for validation and tuning.
- Downstream integrations: EloqStore powers Redis/Valkey-compatible services, MongoDB wire-compatible document stores, relational SQL frontends, and native vector search modules.