A Rust implementation of DiskANN (Disk-based Approximate Nearest Neighbor search) using the Vamana graph algorithm. This project provides an efficient and scalable solution for large-scale vector similarity search with minimal memory footprint.
## Features

- **Quantization**: F16 (2x) and Int8 (4x) compression with SIMD-accelerated distance
- **Memory-Mapped I/O**: Single-file storage with minimal RAM footprint
- **Byte Serialization**: Load indexes from bytes (network, embedded, no filesystem)
- **Benchmark Formats**: Read/write fvecs, ivecs, bvecs (standard ANN benchmark formats)
- **Parallel Processing**: Concurrent index building and batch queries
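As a rough illustration of what the Int8 path trades away, here is a minimal, self-contained sketch of scalar Int8 quantization (illustrative only, not the crate's actual implementation): each f32 is mapped to a signed byte with a per-vector scale, giving 4x compression at a small accuracy cost.

```rust
// Illustrative sketch of scalar Int8 quantization (not crate API):
// map each component to an i8 using a per-vector scale so the largest
// magnitude lands on 127.
fn quantize_i8(v: &[f32]) -> (Vec<i8>, f32) {
    let max = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
    let codes = v.iter().map(|x| (x / scale).round() as i8).collect();
    (codes, scale)
}

// Approximate squared L2 distance computed directly on the codes.
fn l2_squared_i8(a: &[i8], sa: f32, b: &[i8], sb: f32) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as f32 * sa - y as f32 * sb;
            d * d
        })
        .sum()
}

fn main() {
    let (qa, sa) = quantize_i8(&[0.1, 0.2, 0.3]);
    let (qb, sb) = quantize_i8(&[0.4, 0.5, 0.6]);
    let approx = l2_squared_i8(&qa, sa, &qb, sb);
    let exact = 0.3f32 * 0.3 * 3.0; // true squared L2 distance
    // The quantized distance stays close to the exact one.
    assert!((approx - exact).abs() < 0.01);
    println!("approx = {approx}, exact = {exact}");
}
```

The same distance can run over SIMD lanes since the codes are contiguous bytes; that is where the accelerated kernels come in.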
## Quick Start

### Basic Index Operations
```rust
use anndists::dist::DistL2;
use diskann_rs::{DiskANN, DiskAnnParams};

// Build index
let vectors: Vec<Vec<f32>> = vec![vec![0.1, 0.2, 0.3], vec![0.4, 0.5, 0.6]];
let index = DiskANN::<DistL2>::build_index_default(&vectors, DistL2 {}, "index.db")?;

// Search
let query = vec![0.1, 0.2, 0.4];
let neighbors: Vec<u32> = index.search(&query, 10, 256);
```
### Incremental Updates (No Rebuild Required)
```rust
use anndists::dist::DistL2;
use diskann_rs::IncrementalDiskANN;

// Build initial index
let vectors = vec![vec![0.0; 128]; 1000];
let index = IncrementalDiskANN::<DistL2>::build_default(&vectors, "index.db")?;

// Add new vectors without rebuilding
let new_vectors = vec![vec![1.0; 128]; 100];
index.add_vectors(&new_vectors)?;

// Delete vectors (instant tombstoning)
index.delete_vectors(&[0, 1, 2])?;

// Compact when needed (merges delta layer)
if index.should_compact() {
    index.compact("index_v2.db")?;
}
```
### Filtered Search (Metadata Predicates)
```rust
use anndists::dist::DistL2;
use diskann_rs::{FilteredDiskANN, Filter};

// Build with labels (e.g., category IDs)
let vectors = vec![vec![0.0; 128]; 1000];
let labels: Vec<Vec<u64>> = (0..1000).map(|i| vec![i % 10]).collect(); // 10 categories
let index = FilteredDiskANN::<DistL2>::build(&vectors, &labels, "filtered.db")?;

// Search only category 5
let query = vec![0.0; 128];
let filter = Filter::label_eq(0, 5);
let results = index.search_filtered(&query, 10, 128, &filter);

// Complex filters
let filter = Filter::and(vec![
    Filter::label_eq(0, 5),          // category == 5
    Filter::label_range(1, 10, 100), // price in [10, 100]
]);
```
### Composable Incremental Index (Filtered + Quantized + Incremental)
```rust
use anndists::dist::DistL2;
use diskann_rs::{IncrementalDiskANN, IncrementalQuantizedConfig, QuantizerKind, Filter};

// Build an incremental index with labels and F16 quantization
let vectors = vec![vec![0.0; 128]; 1000];
let labels: Vec<Vec<u64>> = (0..1000).map(|i| vec![i % 5]).collect();
let quant_config = IncrementalQuantizedConfig { rerank_size: 50 };
let index = IncrementalDiskANN::<DistL2>::build_full(
    &vectors,
    &labels,
    "composable.db",
    Default::default(), // IncrementalConfig
    QuantizerKind::F16,
    quant_config,
)?;

// Filtered search on the incremental index
let query = vec![0.0; 128];
let filter = Filter::label_eq(0, 3);
let results = index.search_filtered(&query, 10, 128, &filter);

// Add labeled vectors without rebuilding
let new_vecs = vec![vec![1.0; 128]; 50];
let new_labels = vec![vec![2u64]; 50];
index.add_vectors_with_labels(&new_vecs, &new_labels)?;

// Delete, compact, serialize: all features compose
index.delete_vectors(&[0, 1, 2])?;
let bytes = index.to_bytes();
```
### Product Quantization (64x Compression)
```rust
use diskann_rs::pq::{ProductQuantizer, PQConfig};

let vectors = vec![vec![0.0f32; 128]; 1000];

// Train quantizer
let config = PQConfig {
    num_subspaces: 8,   // M = 8 segments
    num_centroids: 256, // K = 256 codes per segment
    ..Default::default()
};
let pq = ProductQuantizer::train(&vectors, config)?;

// Encode vectors (128-dim f32 -> 8 bytes)
let codes: Vec<Vec<u8>> = pq.encode_batch(&vectors);

// Fast approximate distance using lookup table
let query = vec![0.0f32; 128];
let table = pq.create_distance_table(&query);
let dist = pq.distance_with_table(&table, &codes[0]);
```
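To make the lookup-table step concrete, here is a toy, dependency-free sketch of asymmetric PQ distance (the `ToyPq` name and its methods are illustrative, not crate API): with M subspaces and K centroids each, the query builds an M x K table of squared distances to every centroid once, after which each encoded vector costs only M table lookups.

```rust
// Toy product quantizer (illustrative, not the crate's implementation).
struct ToyPq {
    m: usize,                      // number of subspaces (M)
    sub_dim: usize,                // dimensions per subspace
    centroids: Vec<Vec<Vec<f32>>>, // [m][k][sub_dim]
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl ToyPq {
    // Encode: nearest centroid index per subspace, one byte each.
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        (0..self.m)
            .map(|s| {
                let chunk = &v[s * self.sub_dim..(s + 1) * self.sub_dim];
                let (best, _) = self.centroids[s]
                    .iter()
                    .enumerate()
                    .map(|(k, c)| (k, l2(chunk, c)))
                    .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                    .unwrap();
                best as u8
            })
            .collect()
    }

    // Precompute squared distances from the query to every centroid.
    fn distance_table(&self, query: &[f32]) -> Vec<Vec<f32>> {
        (0..self.m)
            .map(|s| {
                let chunk = &query[s * self.sub_dim..(s + 1) * self.sub_dim];
                self.centroids[s].iter().map(|c| l2(chunk, c)).collect()
            })
            .collect()
    }

    // Approximate distance to an encoded vector: M lookups, no f32 scan.
    fn distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
        code.iter().enumerate().map(|(s, &k)| table[s][k as usize]).sum()
    }
}

fn main() {
    // 4-dim vectors, M = 2 subspaces of 2 dims, K = 2 centroids each.
    let pq = ToyPq {
        m: 2,
        sub_dim: 2,
        centroids: vec![
            vec![vec![0.0, 0.0], vec![1.0, 1.0]],
            vec![vec![0.0, 1.0], vec![1.0, 0.0]],
        ],
    };
    let code = pq.encode(&[0.9, 1.1, 0.1, 0.9]); // -> [1, 0]
    let table = pq.distance_table(&[1.0, 1.0, 0.0, 1.0]);
    // Distance is just two lookups: table[0][1] + table[1][0].
    assert_eq!(ToyPq::distance(&table, &code), 0.0);
    println!("code = {:?}", code);
}
```

This is why the 64x compression figure comes with fast queries: the per-vector cost shrinks from 128 float ops to 8 array lookups.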
### SIMD-Accelerated Distance
```rust
use diskann_rs::{SimdL2, DiskANN, simd_info};

// Check available SIMD features
println!("{}", simd_info()); // "SIMD: NEON" or "SIMD: AVX2, SSE4.1"

// Use SIMD-optimized L2 distance
let vectors = vec![vec![0.0f32; 128]; 1000];
let index = DiskANN::<SimdL2>::build_index_default(&vectors, SimdL2, "index.db")?;

// Or use SIMD directly
use diskann_rs::simd::{l2_squared, dot_product, cosine_distance};
let vec_a = vec![0.0f32; 128];
let vec_b = vec![1.0f32; 128];
let dist = l2_squared(&vec_a, &vec_b);
```
### Byte Serialization

```rust
use anndists::dist::DistL2;
use diskann_rs::DiskANN;
use std::sync::Arc;

// Build and serialize to bytes
let index = DiskANN::<DistL2>::build_index_default(&vectors, DistL2 {}, "index.db")?;
let bytes: Vec<u8> = index.to_bytes();

// Load from owned bytes (e.g., downloaded from network)
let index = DiskANN::<DistL2>::from_bytes(bytes, DistL2 {})?;

// Load from shared bytes (multi-reader, zero-copy)
let shared: Arc<[u8]> = load_from_somewhere().into();
let index = DiskANN::<DistL2>::from_shared_bytes(shared, DistL2 {})?;

// Works for all index types
let filtered_bytes = filtered_index.to_bytes();
let incremental_bytes = incremental_index.to_bytes();
```
### Benchmark Format Support (fvecs/ivecs/bvecs)
```rust
use diskann_rs::formats::{read_fvecs, write_fvecs, read_ivecs, read_bvecs_as_f32};

// Load standard ANN benchmark datasets (SIFT, GIST, GloVe, etc.)
let base_vectors = read_fvecs("sift_base.fvecs")?;        // Vec<Vec<f32>>
let ground_truth = read_ivecs("sift_groundtruth.ivecs")?; // Vec<Vec<i32>>
let queries = read_fvecs("sift_query.fvecs")?;

// Load byte vectors as normalized floats
let mnist = read_bvecs_as_f32("mnist.bvecs")?; // u8 [0,255] -> f32 [0,1]

// Save your own vectors
write_fvecs("my_vectors.fvecs", &vectors)?;
```
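For reference, the fvecs layout behind these readers is the standard TEXMEX benchmark format: each vector is a little-endian u32 dimension count followed by that many little-endian f32 components. A minimal encoder/decoder (an illustrative sketch, not the crate's code) looks like this:

```rust
use std::io::{Cursor, Read};

// Serialize vectors in fvecs layout: [u32 dim][f32; dim] per vector.
fn write_fvecs_bytes(vectors: &[Vec<f32>]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in vectors {
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        for x in v {
            out.extend_from_slice(&x.to_le_bytes());
        }
    }
    out
}

// Parse fvecs bytes back into vectors, one record at a time.
fn read_fvecs_bytes(bytes: &[u8]) -> std::io::Result<Vec<Vec<f32>>> {
    let mut cur = Cursor::new(bytes);
    let mut vectors = Vec::new();
    let mut dim_buf = [0u8; 4];
    while (cur.position() as usize) < bytes.len() {
        cur.read_exact(&mut dim_buf)?;
        let dim = u32::from_le_bytes(dim_buf) as usize;
        let mut v = Vec::with_capacity(dim);
        for _ in 0..dim {
            let mut f = [0u8; 4];
            cur.read_exact(&mut f)?;
            v.push(f32::from_le_bytes(f));
        }
        vectors.push(v);
    }
    Ok(vectors)
}

fn main() -> std::io::Result<()> {
    let vectors = vec![vec![0.1f32, 0.2, 0.3], vec![0.4, 0.5, 0.6]];
    let bytes = write_fvecs_bytes(&vectors);
    assert_eq!(bytes.len(), 2 * (4 + 3 * 4)); // 2 records x (header + 3 floats)
    assert_eq!(read_fvecs_bytes(&bytes)?, vectors);
    Ok(())
}
```

ivecs and bvecs use the same record shape with i32 and u8 components respectively, which is why one family of readers covers all three.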
## Performance
### Why diskann-rs?

#### Memory-Mapped I/O

Unlike in-memory indexes that require loading the entire graph into RAM, diskann-rs uses memory-mapped files. The OS loads only the pages you access, making it ideal for large-scale deployments:
| Workload             | diskann-rs | hnsw_rs | Savings       |
|----------------------|------------|---------|---------------|
| Light (10 queries)   | 90 MB      | 896 MB  | 10x less RAM  |
| Medium (100 queries) | 136 MB     | 896 MB  | 6.6x less RAM |
| Heavy (1K queries)   | 147 MB     | 896 MB  | 6x less RAM   |
| Stress (5K queries)  | 139 MB     | 896 MB  | 6.4x less RAM |
Tested with 200K vectors, 128 dimensions. hnsw_rs must hold the full index in RAM; diskann-rs loads pages on-demand.
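The on-demand behavior behind these numbers can be approximated with plain std file I/O. The sketch below (a hypothetical helper using explicit seeks rather than the crate's memory mapping) fetches one fixed-size vector record by offset, so only the bytes actually needed leave the disk:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

// Fetch one dim-dimensional f32 vector by id from a file of
// fixed-size records, reading only that record's bytes.
fn read_vector(file: &mut File, id: u64, dim: usize) -> std::io::Result<Vec<f32>> {
    let record_size = (dim * 4) as u64;
    file.seek(SeekFrom::Start(id * record_size))?;
    let mut buf = vec![0u8; dim * 4];
    file.read_exact(&mut buf)?;
    Ok(buf
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect())
}

fn main() -> std::io::Result<()> {
    let dim = 128;
    let path = std::env::temp_dir().join("vectors.bin");

    // Write 1000 records; record i is filled with the value i.
    let mut f = File::create(&path)?;
    for i in 0..1000u32 {
        for _ in 0..dim {
            f.write_all(&(i as f32).to_le_bytes())?;
        }
    }

    // Read only record 42: one seek + one 512-byte read, not 512 KB.
    let mut f = File::open(&path)?;
    let v = read_vector(&mut f, 42, dim)?;
    assert_eq!(v[0], 42.0);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

With mmap the OS page cache does the equivalent transparently: untouched graph regions simply never become resident, which is why RAM usage above tracks query volume rather than index size.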