
Add vector search benchmarks to benchmarking suite#7399

Closed
connortsui20 wants to merge 10 commits into develop from ct/vector-bench

Conversation

@connortsui20
Contributor

Summary

Tracking issue: #7297

Adds a vector-search-bench crate similar to VectorDBBench.

The benchmark brute-forces cosine similarity search over public VectorDBBench datasets (Cohere, OpenAI, etc).

Since Vortex is not a database, we do not measure things like vector inserts and deletes; instead, we measure storage size and query throughput for four targets:

  • Hand-rolled Rust baseline over &[f32]
  • Uncompressed (canonical) Vortex
  • Default compressed (which ends up being ALPrd)
  • TurboQuant (plus Recall@10 for the lossy TurboQuant path).

Every variant goes through a correctness check against the uncompressed scan before timing.
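For reference, the brute-force cosine scan the hand-rolled baseline performs can be sketched as follows. This is illustrative only; `cosine` and `count_matches` are hypothetical names, not the benchmark's actual code.

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b.iter()) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Count rows (flattened row-major in `vectors`) whose similarity to
/// `query` exceeds `threshold` -- the brute-force filter the bench times.
fn count_matches(vectors: &[f32], dim: usize, query: &[f32], threshold: f32) -> usize {
    vectors
        .chunks_exact(dim)
        .filter(|row| cosine(row, query) > threshold)
        .count()
}

fn main() {
    let dim = 4;
    // Two rows: the query itself (cosine = 1.0) and an orthogonal vector.
    let vectors = [1.0f32, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0];
    let query = [1.0f32, 0.0, 0.0, 0.0];
    println!("{}", count_matches(&vectors, dim, &query, 0.9)); // prints 1
}
```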

Not sure this makes sense as a per-PR benchmark yet, since it only exercises one very specific array tree that will have a very specific optimized implementation.

Testing

N/A

@connortsui20 connortsui20 added the changelog/feature A new feature label Apr 11, 2026
@connortsui20 connortsui20 force-pushed the ct/vector-bench branch 5 times, most recently from 4362b28 to c37689c Compare April 13, 2026 15:06
@connortsui20 connortsui20 force-pushed the ct/vector-bench branch 3 times, most recently from 851ec23 to c06ba43 Compare April 14, 2026 17:28
@connortsui20 connortsui20 marked this pull request as ready for review April 14, 2026 17:30
@connortsui20 connortsui20 enabled auto-merge (squash) April 14, 2026 17:33
@connortsui20 connortsui20 disabled auto-merge April 14, 2026 17:33
connortsui20 added a commit that referenced this pull request Apr 14, 2026

## Summary

Tracking issue: #7297

Optimizes the inner product with a manual partial-sum decomposition (the
compiler cannot perform this transformation itself because float addition
is not associative).
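The partial-sum trick can be sketched like this (illustrative, not the PR's actual code): splitting the accumulation into independent lanes lets the CPU pipeline the additions, a reordering the compiler may not do on its own for floats.

```rust
/// Dot product with four independent partial sums. Because float addition
/// is not associative, a single-accumulator loop forms one serial dependency
/// chain; splitting into four lanes breaks that chain manually.
fn dot_unrolled(a: &[f32], b: &[f32]) -> f32 {
    let mut sums = [0.0f32; 4];
    let chunks = a.len() / 4 * 4;
    for i in (0..chunks).step_by(4) {
        for lane in 0..4 {
            sums[lane] += a[i + lane] * b[i + lane];
        }
    }
    // Handle the remainder that doesn't fill a full group of four.
    let tail: f32 = (chunks..a.len()).map(|i| a[i] * b[i]).sum();
    tail + sums.iter().sum::<f32>()
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    let b = [1.0f32; 5];
    println!("{}", dot_unrolled(&a, &b)); // prints 15
}
```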

Also removes the old benchmarks, as they no longer time the right thing. The
real benchmarks will be finished in #7399. This change also adds
the `vortex-tensor/src/vector_search.rs` file to support that work.

## Testing

N/A

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 marked this pull request as draft April 14, 2026 21:14
@codspeed-hq

codspeed-hq bot commented Apr 15, 2026

Merging this PR will degrade performance by 14.69%

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1151 untouched benchmarks
🆕 10 new benchmarks
⏩ 1455 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | old_alp_prim_test_between[f32, 32768] | 269.3 µs | 223.2 µs | +20.66% |
| Simulation | new_alp_prim_test_between[f64, 16384] | 127.7 µs | 149.7 µs | -14.69% |
| 🆕 Simulation | turboquant_decompress_dim128_4bit | N/A | 5.5 ms | N/A |
| 🆕 Simulation | turboquant_decompress_dim1024_8bit | N/A | 43.9 ms | N/A |
| 🆕 Simulation | turboquant_decompress_dim768_4bit | N/A | 42.1 ms | N/A |
| 🆕 Simulation | turboquant_compress_dim128_4bit | N/A | 6.6 ms | N/A |
| 🆕 Simulation | turboquant_decompress_dim1024_4bit | N/A | 43.9 ms | N/A |
| 🆕 Simulation | turboquant_compress_dim1024_2bit | N/A | 47.5 ms | N/A |
| 🆕 Simulation | turboquant_decompress_dim1024_2bit | N/A | 43.9 ms | N/A |
| 🆕 Simulation | turboquant_compress_dim768_4bit | N/A | 52.1 ms | N/A |
| 🆕 Simulation | turboquant_compress_dim1024_4bit | N/A | 52.6 ms | N/A |
| 🆕 Simulation | turboquant_compress_dim1024_8bit | N/A | 62.8 ms | N/A |

Comparing ct/vector-bench (b6ea352) with develop (4a5b7d7)

Open in CodSpeed

Footnotes

  1. 1455 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

connortsui20 and others added 9 commits April 15, 2026 09:24
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Useful for callers that want explicit, scheme-by-scheme control over the
compressor — for example, the vector-search benchmark wants `empty()`
for a vortex-uncompressed flavor and `empty().with_turboquant()` for a
TurboQuant-only flavor.
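The scheme-by-scheme builder pattern this commit describes can be sketched generically (a hypothetical `CompressorBuilder` standing in for the real `BtrBlocksCompressorBuilder`; the actual API lives in vortex-btrblocks):

```rust
/// Hypothetical stand-in for the builder: `empty()` starts with no
/// compression schemes, and each `with_*` method opts one in explicitly.
struct CompressorBuilder {
    schemes: Vec<&'static str>,
}

impl CompressorBuilder {
    fn empty() -> Self {
        Self { schemes: Vec::new() } // nothing enabled by default
    }

    fn with_turboquant(mut self) -> Self {
        self.schemes.push("turboquant");
        self
    }
}

fn main() {
    let uncompressed = CompressorBuilder::empty();
    let turboquant = CompressorBuilder::empty().with_turboquant();
    println!("{} {}", uncompressed.schemes.len(), turboquant.schemes.len()); // prints "0 1"
}
```

The design point is that callers like the vector-search bench get a known-empty starting state rather than having to subtract schemes from a default set.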

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Replaces vortex-bench/src/vector_dataset.rs with a four-module package:

  catalog.rs   the static VectorDataset enum with per-dataset
               metadata (dim, num_rows, element_ptype, metric,
               layouts, has_neighbors, has_scalar_labels)
  layout.rs    TrainLayout (Single, SingleShuffled, Partitioned,
               PartitionedShuffled), LayoutSpec, VectorMetric
  download.rs  URL builders + idempotent download driver returning
               DatasetPaths { train_files, test, neighbors }
  paths.rs     local cache layout under vortex-bench/data/vector-search/

The catalog now covers all 16 published VectorDBBench corpora — including
the partitioned cohere-large-10m, openai-large-5m, bioasq-large-10m,
sift-large-50m, and laion-large-100m datasets that the previous single-file
catalog couldn't model — and is parameterized over layout so callers can
pick the hosted shape per dataset.

The example and the (now-stub) vector-search-bench crate are updated to
use the new API; the bench is rebuilt from scratch in subsequent commits.
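The catalog pattern described above can be sketched as a static enum with per-dataset metadata accessors. Names and metadata values here are illustrative, not the crate's actual API:

```rust
/// Illustrative stand-in for catalog.rs: each variant is one hosted corpus,
/// and methods return its static metadata. Values are approximate examples.
#[derive(Clone, Copy, Debug)]
enum VectorDataset {
    CohereSmall100k,
    OpenAiLarge5m,
}

impl VectorDataset {
    fn dim(self) -> usize {
        match self {
            VectorDataset::CohereSmall100k => 768,
            VectorDataset::OpenAiLarge5m => 1536,
        }
    }

    fn num_rows(self) -> usize {
        match self {
            VectorDataset::CohereSmall100k => 100_000,
            VectorDataset::OpenAiLarge5m => 5_000_000,
        }
    }
}

fn main() {
    let d = VectorDataset::OpenAiLarge5m;
    println!("{:?}: dim={}, rows={}", d, d.dim(), d.num_rows());
}
```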

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Two flavors:

  vortex-uncompressed  BtrBlocksCompressorBuilder::empty()
  vortex-turboquant    BtrBlocksCompressorBuilder::empty().with_turboquant()

The TurboQuant flavor extends the default file ALLOWED_ENCODINGS with the
two scalar-fn array IDs the scheme emits (L2Denorm, SorfTransform) so the
write strategy will accept the L2Denorm(SorfTransform(...)) tree.

Wires the unstable_encodings feature through the bench crate's Cargo.toml
so vortex-btrblocks::with_turboquant is available.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Pieces added:

  vortex-bench/src/conversions.rs::write_parquet_as_vortex_with_options
                       streams a parquet file into a Vortex file using
                       caller-provided VortexWriteOptions
  src/session.rs       process-wide VortexSession with the tensor scalar-fn
                       array plugins registered (env-gated)
  src/paths.rs         per-flavor vortex path translator
  src/ingest.rs        per-chunk transform: project emb, wrap as
                       Extension<Vector<f32>>, lossy cast f64→f32, optional
                       scalar_labels passthrough
  src/prepare.rs       per-flavor driver: streams every train shard through
                       the ChunkTransform into one .vortex file per shard,
                       idempotent, sequential, sums wall-time / byte counters

The transform always produces f32 vectors so all downstream code (scan,
recall, handrolled baseline) drops the f32/f64 dual-pathing the previous
benchmark carried.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
  query_scalar          wrap a query &[f32] as a Scalar::extension::<Vector>
                        suitable for use as a lit() RHS
  similarity_filter     gt(cosine_similarity(col("emb"), lit(query)),
                           lit(threshold))
  emb_projection        col("emb"), used by the throughput-only scan path

Also adds an end-to-end smoke test under tests/end_to_end_smoke.rs that
writes a synthetic Struct { emb: Vector<f32, dim> } to a real .vortex file
under both flavors and runs the filter expression through file.scan(). The
self-matching row (cosine = 1.0) must survive any reasonable threshold —
this is the first proof the write strategy and the filter pipeline agree.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Pieces added:

  src/scan_util.rs        median() helper shared between scan and handrolled
  src/scan.rs             per-iteration vortex file scan driver — re-opens
                          every shard fresh per iteration, drains the stream,
                          tracks best-of / median across runs
  src/query.rs            sampler that pulls one query vector from
                          test.parquet (seeded random row, f64 → f32 cast
                          when needed)
  src/handrolled.rs       sequential parquet scan baseline + 4-way unrolled
                          cosine loop; takes query as parameter, f32-only
  src/handrolled_decode.rs
                          parquet → flat Vec<f32> decoder (List, LargeList,
                          FixedSizeList — Float32 + Float64 narrowing)
  src/display.rs          local column-per-flavor renderer (compress wall,
                          input/output bytes, ratio, scan best/median,
                          matches, throughput)
  src/main.rs             clap CLI: --dataset (single), --layout (validated),
                          --flavors (comma list incl. handrolled), iterations,
                          threshold, query-seed; orchestrates download →
                          prepare → query → scan → render

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
  src/recall.rs        per-flavor recall driver: samples N test rows,
                       runs brute-force top-K cosine over every shard via
                       a bounded BinaryHeap, compares against the
                       neighbors.parquet ground truth, reports mean +
                       p05 recall
  src/main.rs          --recall / --recall-k / --recall-queries /
                       --recall-seed flags; bails when the dataset has
                       no neighbors hosted; skips lossless flavors
                       (trivially 1.0)
  src/display.rs       extra recall@K (mean) and (p05) rows, only emitted
                       when --recall produced results

  tests/recall_smoke.rs
                       8-row standard-basis dataset where train row i is
                       basis e_i and neighbors_id[i] = i. Lossless flavor
                       must hit recall@1 = 1.0.
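The bounded-BinaryHeap top-K described above can be sketched with the standard library (illustrative only; the real driver lives in src/recall.rs):

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// (similarity, row id) with a total order, since f32 alone isn't Ord.
struct Score(f32, usize);

impl PartialEq for Score {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Score {}
impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Score {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0).then(self.1.cmp(&other.1))
    }
}

/// Keep only the K highest-scoring row ids. `Reverse` flips Rust's max-heap
/// into a min-heap, so once the heap exceeds K, `pop` evicts the current
/// minimum -- the heap never grows past K+1 entries.
fn top_k(scores: impl IntoIterator<Item = (usize, f32)>, k: usize) -> Vec<usize> {
    let mut heap: BinaryHeap<Reverse<Score>> = BinaryHeap::with_capacity(k + 1);
    for (id, s) in scores {
        heap.push(Reverse(Score(s, id)));
        if heap.len() > k {
            heap.pop();
        }
    }
    let mut ids: Vec<usize> = heap.into_iter().map(|Reverse(Score(_, id))| id).collect();
    ids.sort_unstable();
    ids
}

fn main() {
    let scores = [(0, 0.1f32), (1, 0.9), (2, 0.5), (3, 0.8)];
    println!("{:?}", top_k(scores, 2)); // prints [1, 3]
}
```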

README is fully rewritten to reflect the new on-disk file-scan
benchmark, the layout / partitioned model, the f32-only pipeline, and
the future-work backlog.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
connortsui20 added a commit that referenced this pull request Apr 15, 2026
## Summary

Tracking issue: #7297

We will want to add vector benchmarking soon (see
#7399 for a draft).

This adds a simple catalog for the vector datasets hosted at
`https://assets.zilliz.com/benchmark` for
[VectorDBBench](https://github.com/zilliztech/vectordbbench). The catalog
describes the shape of each dataset (whether it is partitioned or randomly
shuffled, whether neighbor lists for top-k queries are hosted, etc.) and
also handles downloading everything.

I verified that all of this was correct by listing the S3 buckets directly:

```sh
aws s3 ls s3://assets.zilliz.com/benchmark/ --region us-west-2 --no-sign-request
```

<details>

```sh
for d in bioasq_large_10m bioasq_medium_1m cohere_large_10m cohere_medium_1m \
         cohere_small_100k gist_medium_1m gist_small_100k glove_medium_1m \
         glove_small_100k laion_large_100m  \
         openai_large_5m openai_medium_500k openai_small_50k \
         sift_large_50m sift_medium_5m sift_small_500k; do
  echo "=== $d ==="
  aws s3 ls s3://assets.zilliz.com/benchmark/$d/ --region us-west-2 --no-sign-request
done
```

</details>

And this script from the main repo helped too:
https://github.com/zilliztech/VectorDBBench/blob/main/vectordb_bench/backend/dataset.py

---

Things that are not implemented that I would like to add:

- Whether each dataset is pre-normalized for cosine similarity. This is not
obvious without actually working with the data, so I will check later.
- Some datasets have scalar labels for every vector, meant to mimic
similarity search filtered by another column, and some also host neighbor
lists for those specific filtered queries. That is something we'll probably
want to add in the future.
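For the normalization question, a check like the following could settle it per corpus (illustrative helper, not part of this PR): if every row passes, cosine similarity reduces to a plain dot product and the norm terms can be skipped.

```rust
/// True if `v` has (approximately) unit L2 norm. Comparing the squared
/// norm against 1.0 avoids a sqrt per row.
fn is_unit_norm(v: &[f32], eps: f32) -> bool {
    let norm_sq: f32 = v.iter().map(|x| x * x).sum();
    (norm_sq - 1.0).abs() <= eps
}

fn main() {
    println!("{}", is_unit_norm(&[0.6, 0.8], 1e-5)); // prints true
    println!("{}", is_unit_norm(&[1.0, 1.0], 1e-5)); // prints false
}
```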

## Testing

N/A

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>