This repository contains reproducible benchmarks comparing SochDB against other vector stores and agent memory systems.
📊 See Published Results - Comprehensive benchmark findings with real LLM integration
We provide benchmarks across different dimensions:
- Vector search scaling (O(n) vs O(log n))
- Real LLM integration (actual Azure OpenAI calls)
- Multi-system comparison (SochDB vs ChromaDB; framework ready for Zep)
- Production-grade framework (2000+ lines, fully reproducible)
Scenario: 10,000 vectors, 128 dimensions, running on local hardware.
| Database | Insert Rate | Search Latency (Avg) | Storage Engine | Primary Use Case |
|---|---|---|---|---|
| SochDB | ~2,377 vec/s | 0.325 ms | In-Memory (Rust) + WAL | Low-Latency Search, Agent Memory |
| LanceDB | 96,852 vec/s | 4.07 ms | Disk-Based (Lance) | Large Datasets, High-Throughput Ingestion |
| ChromaDB | ~10,500 vec/s | 0.69 ms | In-Memory / SQLite | General Purpose RAG, Prototyping |
| DuckDB | ~3,900 vec/s | 0.90 ms | OLAP + VSS | Analytical + Vector Search Hybrid |
| NumPy | N/A | 0.62 ms | In-Memory (Exact) | Baseline comparison |
Run Date (UTC): 2026-01-04 03:51 UTC
Host: Linux 6.12.13 (x86_64, KVM)
CPU: Intel(R) Xeon(R) Platinum 8370C @ 2.80GHz (3 vCPU)
Command:
SOCHDB_LIB_PATH=/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/sochdb/lib/x86_64-unknown-linux-gnu/libsochdb_index.so \
python3 benchmarks/comprehensive_benchmark.py

| System | Insert (vec/s) | Search (ms avg) | Notes |
|---|---|---|---|
| NumPy (brute-force) | N/A | 0.359 | Baseline |
| ChromaDB | 3,925 | 1.548 | — |
| LanceDB | 18,245 | 14.039 | IVF-PQ index |
| SochDB | 2,442 | 0.580 | Rust HNSW via Python FFI |
| Config | Insert (vec/s) | Search (ms avg) |
|---|---|---|
| 1,000 × 128 | 6,455 | 0.250 |
| 10,000 × 128 | 2,240 | 0.567 |
| 10,000 × 384 | 371 | 3.634 |
| 10,000 × 768 | 1,334 | 1.197 |
- SQLite-VSS: `sqlite3.Connection` in this environment does not expose `enable_load_extension`, so the benchmark could not load the extension.
- DuckDB VSS: Extension download failed (no access to `extensions.duckdb.org` from the container).
Beyond microbenchmarks, we stress-tested SochDB's production capability for agentic workloads.
We simulated a long-running agent conversation where the system must simultaneously write new observations and read/assemble context for a prompt.
| Metric (P99 Latency) | SochDB (Unified) | SQLite + Chroma (Fragmented) | Improvement |
|---|---|---|---|
| Write (Append) | 0.01 ms | 2.80 ms | 280x Faster |
| Read (Context) | 0.01 ms | 3.06 ms | 300x Faster |
Why This Matters: SochDB acts as an integrated memory layer. The "Fragmented" baseline requires network/IPC hops between Python, SQLite, and Chroma. SochDB keeps the "Thought Loop" tight.
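To make the workload concrete, here is a minimal sketch of the interleaved write/read loop being timed. The `store.append` and `store.assemble_context` names are placeholder interfaces for illustration, not SochDB's actual API.

```python
# Minimal sketch of the interleaved agent loop we time. `store.append` and
# `store.assemble_context` are placeholder interfaces, not SochDB's actual API.
import time
import numpy as np

def measure_agent_loop(store, observations, queries, n_turns=1000):
    """Interleave appends and context reads, recording per-operation latency in ms."""
    write_ms, read_ms = [], []
    for turn in range(n_turns):
        obs = observations[turn % len(observations)]
        t0 = time.perf_counter()
        store.append(obs)                     # write path: persist a new observation
        write_ms.append((time.perf_counter() - t0) * 1000)

        query = queries[turn % len(queries)]
        t0 = time.perf_counter()
        store.assemble_context(query, k=8)    # read path: retrieve context for the prompt
        read_ms.append((time.perf_counter() - t0) * 1000)

    return {
        "write_p99_ms": float(np.percentile(write_ms, 99)),
        "read_p99_ms": float(np.percentile(read_ms, 99)),
    }
```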
We subjected SochDB to a "Jepsen-lite" test: heavily writing to a key and randomly force-killing the process (`kill -9`).
- Result: ✅ PASSED
- Recovery Time: 4.31 ms
- Consistency: No data corruption; WAL successfully replayed last committed transaction.
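The actual harness lives in benchmarks/crash_test.py; the pattern is roughly the following sketch. The writer script path and `--verify` flag are hypothetical, shown only to illustrate the kill-and-replay cycle.

```python
# Illustrative kill -9 pattern only; see benchmarks/crash_test.py for the real harness.
# The writer script path and --verify flag below are hypothetical.
import os
import signal
import subprocess
import time

WRITER = "benchmarks/crash_test_writer.py"   # hypothetical: appends to one key in a loop

proc = subprocess.Popen(["python3", WRITER])
time.sleep(2.0)                              # let some transactions commit
os.kill(proc.pid, signal.SIGKILL)            # hard crash, equivalent to kill -9
proc.wait()

# Reopen the store: WAL replay should surface only committed writes.
result = subprocess.run(["python3", WRITER, "--verify"])
print("consistency check exit code:", result.returncode)
```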
We isolated the cosine distance kernel to check SIMD usage on ARM (Apple M1 Max).
- Finding: Raw kernel throughput via FFI is lower than NumPy (0.08x) due to Python<->Rust boundary overhead on single queries.
- Verdict: SochDB is optimal for Search (where work stays in Rust) but has high overhead for basic vector math ops in Python compared to highly optimized BLAS.
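For reference, the NumPy side of that comparison is just the vectorized cosine kernel below; any single-query FFI call has to beat this plus the Python<->Rust crossing cost.

```python
# NumPy baseline for the cosine-distance kernel (vectorized over the whole corpus).
import time
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

def cosine_distances(q, mat):
    dots = mat @ q
    norms = np.linalg.norm(mat, axis=1) * np.linalg.norm(q)
    return 1.0 - dots / norms

t0 = time.perf_counter()
for _ in range(100):
    cosine_distances(query, corpus)
print(f"NumPy kernel: {(time.perf_counter() - t0) / 100 * 1000:.3f} ms per query")
```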
The Problem: Brute-force O(n) vector search causes P99 latency to degrade 34x as observations grow from 40 → 2,000.
The Solution: HNSW O(log n) search keeps degradation minimal (5.9x) while delivering 11x better performance at scale.
This benchmark uses pre-generated random embeddings to isolate pure vector search performance (no LLM API overhead).
| Scale | Brute-Force P99 | HNSW P99 | Speedup |
|---|---|---|---|
| 40 observations | 0.26ms | 0.14ms | 1.9x |
| 100 observations | 0.71ms | 0.20ms | 3.6x |
| 200 observations | 0.90ms | 0.36ms | 2.5x |
| 500 observations | 2.98ms | 0.49ms | 6.1x |
| 1,000 observations | 6.92ms | 0.86ms | 8.0x |
| 2,000 observations | 9.06ms | 0.81ms | 11.2x |
Scaling Analysis:
- Brute-Force (40 → 2000): P99 degrades 0.26ms → 9.06ms (34x worse)
- HNSW (40 → 2000): P99 degrades 0.14ms → 0.81ms (5.9x - much better!)
- At 200 observations: The crossover point where HNSW becomes clearly superior
- At 2000 observations: HNSW is 11.2x faster than brute-force
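The shape of this comparison can be reproduced without SochDB: the sketch below uses hnswlib as a stand-in HNSW index against a NumPy brute-force scan. The published numbers above come from SochDB's own index via the command further down.

```python
# Stand-in sketch: hnswlib HNSW vs NumPy brute-force P99 at different scales.
# The published table uses SochDB's own HNSW index, not hnswlib.
import time
import numpy as np
import hnswlib

def compare(n_obs, dim=128, n_queries=200, k=5):
    rng = np.random.default_rng(42)
    data = rng.standard_normal((n_obs, dim)).astype(np.float32)
    data /= np.linalg.norm(data, axis=1, keepdims=True)       # unit vectors: dot == cosine
    queries = rng.standard_normal((n_queries, dim)).astype(np.float32)
    queries /= np.linalg.norm(queries, axis=1, keepdims=True)

    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n_obs, ef_construction=200, M=16)
    index.add_items(data)

    brute, hnsw = [], []
    for q in queries:
        t0 = time.perf_counter()
        np.argpartition(data @ q, -k)[-k:]     # brute-force O(n) scan
        brute.append((time.perf_counter() - t0) * 1000)

        t0 = time.perf_counter()
        index.knn_query(q, k=k)                # HNSW O(log n) search
        hnsw.append((time.perf_counter() - t0) * 1000)
    return np.percentile(brute, 99), np.percentile(hnsw, 99)

for n in (40, 200, 2000):
    bf, hn = compare(n)
    print(f"{n:>5} obs   brute-force P99 {bf:.2f} ms   HNSW P99 {hn:.2f} ms")
```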
Run the benchmark:
export SOCHDB_LIB_PATH=/path/to/libsochdb_index.so
python3 benchmarks/pure_search_scale_benchmark.py

Inspired by the Zep vs Mem0 controversy.
This benchmark uses actual Azure OpenAI embedding calls to test memory systems in realistic agent conversation scenarios.
- Conversations: 8 multi-turn dialogues (customer support, technical support, product inquiries)
- Messages: 65 total messages stored as memories
- Test Queries: 200 queries to test memory recall
- Embeddings: Azure OpenAI `text-embedding-3-small` (1536-dim)
- Date: 2026-01-04
| System | Insert (avg) | p50 Latency | p95 Latency | p99 Latency | Context Size |
|---|---|---|---|---|---|
| SochDB | 94.20ms | 79.49ms | 172.64ms | 2557.91ms | 36 tokens |
| ChromaDB | 184.90ms | 82.80ms | 123.00ms | 1338.15ms | 36 tokens |
Key Findings:
- SochDB is 1.96x faster at insert (94ms vs 185ms)
- ChromaDB has better p95/p99 consistency (123ms vs 173ms p95)
- Both systems delivered identical context quality (36 tokens avg)
- Real embedding overhead dominates: 70-90% of latency is Azure OpenAI API calls, not DB operations
Unlike synthetic vector benchmarks, this test measures:
- Real LLM integration overhead - actual API calls to Azure OpenAI
- Multi-turn conversation memory - realistic agent dialogue patterns
- Production-like workloads - insert + search in realistic sequences
- Context assembly - how fast systems retrieve and build context for LLM prompts
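For context, most of that per-operation latency comes from the embedding call itself. A minimal sketch of that call using the openai Python SDK is below; the api_version string and the deployment name passed as `model` are illustrative assumptions, since Azure deployments may be named differently.

```python
# Sketch of the per-message Azure OpenAI embedding call; the api_version value and
# the deployment name passed as `model` are assumptions for illustration.
import os
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",
)

def embed(text: str) -> list[float]:
    t0 = time.perf_counter()
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    print(f"embedding call: {(time.perf_counter() - t0) * 1000:.1f} ms")
    return resp.data[0].embedding   # 1536-dimensional vector
```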
Run the benchmark:
export SOCHDB_LIB_PATH=/path/to/libsochdb_index.so
python3 benchmarks/memory_systems_comparison.py

SochDB:
- Performance Profile: Optimized for low-latency search (0.33ms) and fast inserts for agent memory.
- Architecture: In-memory HNSW index with Rust core, Python FFI.
- Trade-off: Lower ingestion throughput compared to columnar stores on bulk loads.
- Best For: Agent memory systems, real-time RAG, low-latency search.
LanceDB:
- Performance Profile: Optimized for high-throughput ingestion (96k vec/s).
- Architecture: Disk-based columnar format (Lance).
- Trade-off: Higher search latency for random-access patterns (approx. 4ms).
- Best For: Large-scale datasets, batch processing, analytics.
ChromaDB:
- Performance Profile: Balanced performance for general use cases.
- Architecture: Persistent storage with HNSW indexing.
- Trade-off: Slower search than SochDB, slower ingestion than LanceDB.
- Best For: General-purpose RAG, prototyping, moderate-scale applications.
Run Environment:
- Hardware: Mac Studio (Apple M1 Max, 32GB RAM)
- OS: macOS 26.2
- Date: January 03, 2026
- Command:
python3 benchmarks/comprehensive_benchmark.py
Below is the output from the strictly verified benchmark run:
======================================================================
FINAL SUMMARY
======================================================================
System Insert (vec/s) Search (ms) Speedup vs NumPy
---------------------------------------------------------------------------
NumPy (brute-force) N/A 0.619 1.0x (baseline)
ChromaDB 10558 0.687 0.9x
DuckDB 3886 0.904 0.7x
LanceDB 96852 4.074 0.2x
SochDB 2377 0.325 1.9x
- Install Dependencies:
pip install -r requirements.txt
- Run Comprehensive Suite: Runs all DBs against synthetic data (10k-100k vectors).
python3 benchmarks/comprehensive_benchmark.py
- Run Systems Evaluation:
python3 benchmarks/macro_agent_benchmark.py
python3 benchmarks/crash_test.py
For a comprehensive, apples-to-apples comparison of agent memory systems:
# Set up environment
export AZURE_OPENAI_API_KEY="your_key"
export AZURE_OPENAI_ENDPOINT="your_endpoint"
export SOCHDB_LIB_PATH="/path/to/libsochdb_index.so"
# Optional: Add Zep for comparison
export ZEP_API_URL="http://localhost:8000"
export ZEP_API_KEY="your_zep_key"
# Run comprehensive benchmark
python3 benchmarks/run_memory_comparison.py

What it tests:
- Phase 1: Microbenchmarks (latency, throughput)
- Phase 2: Token efficiency (context assembly; see the sketch after this list)
- Phase 3: LoCoMo quality (QA accuracy)
- Phase 4: Scale test (100-2000 observations)
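As a concrete example for Phase 2, token efficiency of an assembled context can be approximated with tiktoken; the framework's own counting may differ, so treat this as a sketch.

```python
# Illustrative token count for an assembled context; assumes tiktoken is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def context_tokens(snippets: list[str]) -> int:
    """Tokens the assembled context adds to the LLM prompt."""
    return sum(len(enc.encode(s)) for s in snippets)

print(context_tokens(["User prefers dark mode.", "Order #1234 shipped on Friday."]))
```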
See BENCHMARK_FRAMEWORK_GUIDE.md for full details.