Indexing

Anup Ghatage edited this page Feb 12, 2026 · 1 revision

Zeppelin supports multiple vector index types, each offering different tradeoffs between accuracy, memory, and speed.

Index Types

| Type | Config Value | Compression | Description |
|------|--------------|-------------|-------------|
| IVF-Flat | `ivf_flat` | 1x (none) | Full-precision IVF with k-means partitioning |
| IVF-SQ8 | `ivf_sq` | 4x | IVF + scalar quantization (8-bit codes) |
| IVF-PQ | `ivf_pq` | 16-32x | IVF + product quantization |
| Hierarchical | `hierarchical` | varies | Multi-level centroid tree with beam search |

IVF-Flat

The default index type. Vectors are partitioned into clusters using k-means, and each cluster's vectors are stored in a separate S3 object.

How it works

  1. Training: Run k-means++ on all vectors to find N centroids (default: 256)
  2. Assignment: Each vector is assigned to its nearest centroid
  3. Storage: Each cluster is serialized as a binary blob with vector IDs, values, and attribute data
  4. Search: Find the nprobe nearest centroids to the query, then scan only those clusters
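
The assignment and search steps above can be sketched as follows. This is an illustrative toy, not Zeppelin's implementation: plain Python dicts and lists stand in for the per-cluster S3 objects, and the helper names are invented.

```python
def l2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    # Assignment step: map each vector to its nearest centroid's cluster.
    clusters = {i: [] for i in range(len(centroids))}
    for vid, v in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        clusters[nearest].append((vid, v))
    return clusters

def ivf_search(query, centroids, clusters, nprobe, k):
    # Search step: find the nprobe nearest centroids, then scan only
    # those clusters for the top-k nearest vectors.
    probed = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [(l2(query, v), vid) for c in probed for vid, v in clusters[c]]
    return [vid for _, vid in sorted(candidates)[:k]]

# Toy example: two centroids, four vectors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {"a": [0.1, 0.2], "b": [0.3, 0.1], "c": [9.8, 9.9], "d": [10.2, 10.1]}
clusters = assign(vectors, centroids)
print(ivf_search([10.0, 10.0], centroids, clusters, nprobe=1, k=2))
```

With `nprobe=1` only the cluster around `[10, 10]` is scanned, so vectors `a` and `b` are never touched; raising `nprobe` trades latency for recall.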

Build parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| default_num_centroids | 256 | Number of k-means clusters |
| kmeans_max_iterations | 25 | Max k-means iterations |
| kmeans_convergence_epsilon | 1e-4 | Convergence threshold |

Search parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| default_nprobe | 16 | Clusters to probe per query |
| max_nprobe | 128 | Maximum allowed nprobe |

Distance metrics

| Metric | Value | Formula |
|--------|-------|---------|
| Cosine | `cosine` | 1 - cos(a, b) (0 = identical) |
| Euclidean | `euclidean` | ‖a - b‖² (0 = identical) |
| Dot Product | `dot_product` | -a·b (more negative = more similar) |
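
As a concrete check, the three formulas can be computed directly. A sketch; the helper names are illustrative, not Zeppelin's API:

```python
import math

def cosine_distance(a, b):
    # 1 - cos(a, b): 0 for identical directions, up to 2 for opposite ones.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def euclidean_distance(a, b):
    # Squared L2 norm of the difference: 0 for identical vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot_product_distance(a, b):
    # Negated dot product: more negative means more similar.
    return -sum(x * y for x, y in zip(a, b))

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))    # identical direction -> 0.0
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(dot_product_distance([1.0, 2.0], [3.0, 4.0]))
```

All three are distances, so smaller is always better; this lets the same top-k machinery serve every metric.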

Scalar Quantization (SQ8)

SQ8 compresses each f32 dimension to a uint8, achieving 4x compression with minimal accuracy loss.

How it works

  1. Calibration: Compute per-dimension min/max across all vectors
  2. Encoding: Map each f32 value to [0, 255]: code = round((value - min) / (max - min) × 255)
  3. Decoding: Reverse: value = min + code × (max - min) / 255
  4. Search: Asymmetric distance computation — query stays in f32, codes in uint8
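
The calibration, encoding, and decoding steps above round-trip like this. A minimal sketch with invented helper names, using the formulas as stated:

```python
def calibrate(vectors):
    # Calibration: per-dimension min and max across all vectors.
    dims = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    return mins, maxs

def sq8_encode(v, mins, maxs):
    # Encoding: map each f32 value into [0, 255].
    # The guard for a constant dimension is an assumption of this sketch.
    return [
        0 if maxs[d] == mins[d]
        else round((v[d] - mins[d]) / (maxs[d] - mins[d]) * 255)
        for d in range(len(v))
    ]

def sq8_decode(codes, mins, maxs):
    # Decoding: reverse mapping. Lossy -- the error is at most about
    # one quantization step, (max - min) / 255, per dimension.
    return [mins[d] + codes[d] * (maxs[d] - mins[d]) / 255 for d in range(len(codes))]

vectors = [[0.0, -1.0], [1.0, 1.0], [0.5, 0.0]]
mins, maxs = calibrate(vectors)
codes = sq8_encode([0.5, 0.0], mins, maxs)
print(codes, sq8_decode(codes, mins, maxs))
```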

Artifacts on S3

segments/<segment_id>/sq_calibration.bin    # [dim: u32][min_0: f32][max_0: f32]...
segments/<segment_id>/sq_cluster_<N>.bin    # [n: u32][dim: u32] then per vector: [id_len: u32][id][codes: u8×dim]
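
The `sq_calibration.bin` layout above can be written and read with `struct`. A sketch only: little-endian byte order is an assumption of this example, as the wiki does not state it.

```python
import struct

def write_calibration(mins, maxs):
    # [dim: u32] followed by interleaved [min_d: f32][max_d: f32] pairs.
    # Little-endian ("<") is assumed here, not confirmed by the docs.
    buf = struct.pack("<I", len(mins))
    for lo, hi in zip(mins, maxs):
        buf += struct.pack("<ff", lo, hi)
    return buf

def read_calibration(buf):
    # Inverse of write_calibration: read the dimension, then each pair.
    (dim,) = struct.unpack_from("<I", buf, 0)
    mins, maxs = [], []
    for d in range(dim):
        lo, hi = struct.unpack_from("<ff", buf, 4 + 8 * d)
        mins.append(lo)
        maxs.append(hi)
    return mins, maxs

blob = write_calibration([0.0, -1.0], [1.0, 2.0])
print(read_calibration(blob))
```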

Two-phase search

  1. Coarse phase: Compute asymmetric distances using SQ8 codes, collect top k × rerank_factor candidates
  2. Rerank phase: Load full-precision vectors for candidates, compute exact distances, return top-k
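
The two phases can be sketched generically; here a cheap "approximate" distance plays the role of the SQ8 asymmetric distance, and an exact one the role of the full-precision rerank. All names and data are illustrative.

```python
def two_phase_search(query, approx_dist, exact_dist, items, k, rerank_factor=4):
    # Coarse phase: rank everything by the cheap approximate distance
    # and keep the top k * rerank_factor candidates.
    coarse = sorted(items, key=lambda item: approx_dist(query, item))
    candidates = coarse[: k * rerank_factor]
    # Rerank phase: exact distances only for the surviving candidates.
    return sorted(candidates, key=lambda item: exact_dist(query, item))[:k]

# Toy example: the "approximate" distance rounds to whole numbers, so it
# cannot tell 2.6 and 3.1 apart; the exact rerank breaks the tie.
items = [0.4, 1.2, 2.6, 3.1]
approx = lambda q, x: round(abs(q - x))
exact = lambda q, x: abs(q - x)
print(two_phase_search(3.0, approx, exact, items, k=1))
```

`rerank_factor` is the knob: a larger coarse candidate set costs more full-precision loads but recovers more of the accuracy lost to quantization.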

Product Quantization (PQ)

PQ achieves 16-32x compression by splitting vectors into subspaces and encoding each subspace with a codebook.

How it works

  1. Training: Split vectors into M subspaces of dimension D/M. Run k-means on each subspace to learn 256 centroids (codebook)
  2. Encoding: Each vector becomes M bytes — one codebook index per subspace
  3. ADC (Asymmetric Distance Computation): Precompute a lookup table [M][256] of distances from the query to each codebook centroid. Distance to any encoded vector is O(M) table lookups
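
The encoding and ADC steps above can be sketched with a toy codebook. Real PQ uses 256 centroids per subspace; this illustrative example uses 2 so the arithmetic is visible, and the function names are invented.

```python
def pq_encode(v, codebooks):
    # Encoding: one codebook index per subspace -- the nearest centroid
    # to that subvector. The result is m bytes per vector.
    m = len(codebooks)
    sub = len(v) // m
    codes = []
    for s in range(m):
        part = v[s * sub:(s + 1) * sub]
        codes.append(min(
            range(len(codebooks[s])),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(part, codebooks[s][c])),
        ))
    return codes

def adc_table(query, codebooks):
    # Precompute [m][K]: distance from each query subvector to every centroid.
    m = len(codebooks)
    sub = len(query) // m
    return [
        [sum((a - b) ** 2 for a, b in zip(query[s * sub:(s + 1) * sub], cent))
         for cent in codebooks[s]]
        for s in range(m)
    ]

def adc_distance(codes, table):
    # Distance to any encoded vector is just m table lookups.
    return sum(table[s][codes[s]] for s in range(len(codes)))

# Toy example: dim 4, m = 2 subspaces, 2 centroids per subspace.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # subspace 0 centroids
    [[0.0, 1.0], [1.0, 0.0]],   # subspace 1 centroids
]
codes = pq_encode([1.0, 1.0, 0.0, 1.0], codebooks)
table = adc_table([1.0, 1.0, 0.0, 1.0], codebooks)
print(codes, adc_distance(codes, table))
```

The table is built once per query, so scanning n encoded vectors costs O(n · m) lookups instead of O(n · D) float operations.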

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| pq_m | 8 | Number of subspaces |
| rerank_factor | 4 | Candidates = top_k × rerank_factor |

PQ requires dimension % pq_m == 0.

Artifacts on S3

segments/<segment_id>/pq_codebook.bin       # [M: u32][K: u32][sub_dim: u32][centroids: f32×M×K×sub_dim]
segments/<segment_id>/pq_cluster_<N>.bin    # [n: u32][M: u32] then per vector: [id_len: u32][id][codes: u8×M]

Hierarchical ANN

A multi-level centroid tree that enables sub-linear search for large datasets.

How it works

  1. Build: Recursively apply k-means. The root level partitions all vectors. Each non-leaf cluster is further partitioned until clusters are small enough (leaf_size)
  2. Search: Beam search from root to leaves. At each level, evaluate the beam_width most promising centroids and descend into their children
  3. Mixed-depth trees: Not all branches have the same depth. Some clusters may be leaves at level 1 while others descend to level 3+
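
The beam search over a mixed-depth tree can be sketched like this; the dict-of-children tree, 1-D centroids, and function name are all illustrative stand-ins for the structures stored on S3.

```python
def beam_search(query, tree, centroids, beam_width):
    # tree maps a node id to its child ids; leaves are absent from the map.
    # Mixed depths are fine: a leaf can appear at any level.
    def dist(node):
        return sum((a - b) ** 2 for a, b in zip(query, centroids[node]))

    beam, leaves = ["root"], []
    while beam:
        children = []
        for node in beam:
            kids = tree.get(node)
            if kids is None:
                leaves.append(node)     # reached a leaf cluster
            else:
                children.extend(kids)
        # Keep only the beam_width most promising centroids at this level.
        beam = sorted(children, key=dist)[:beam_width]
    return leaves

# Toy 2-level tree: "a" is a leaf at level 1, "b" descends one more level.
tree = {"root": ["a", "b"], "b": ["b0", "b1"]}
centroids = {"a": [0.0], "b": [10.0], "b0": [9.0], "b1": [11.0]}
print(beam_search([10.5], tree, centroids, beam_width=1))
```

With `beam_width=1` the search descends straight to the single best leaf; widening the beam visits more branches and improves recall at the cost of more centroid evaluations.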

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| hierarchical | false | Enable hierarchical index |
| beam_width | 10 | Candidates maintained at each level |
| leaf_size | — | Target vectors per leaf cluster |

Artifacts on S3

segments/<segment_id>/tree.json             # Tree structure (node → children mapping)
segments/<segment_id>/centroids.bin         # All centroids across all levels
segments/<segment_id>/cluster_<N>.bin       # Leaf cluster data

f16 Storage

When enabled, cluster data is additionally stored in half-precision (f16) format, achieving 50% compression over f32.

segments/<segment_id>/f16_cluster_<N>.bin   # Same layout as cluster, but f16 values
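
Python's `struct` module supports IEEE 754 half precision via the `"e"` format, which makes the f16 tradeoff easy to observe. A sketch, not Zeppelin's serialization code; little-endian is an assumption here.

```python
import struct

def f16_roundtrip(values):
    # Pack as IEEE 754 half precision and unpack again, showing exactly
    # what survives the 50%-smaller representation.
    packed = struct.pack(f"<{len(values)}e", *values)
    assert len(packed) == 2 * len(values)   # 2 bytes per value vs 4 for f32
    return list(struct.unpack(f"<{len(values)}e", packed))

# 1.0 survives exactly; 0.1 picks up quantization error; around 1024 the
# spacing between representable f16 values is already 1.0.
print(f16_roundtrip([1.0, 0.1, 1024.5]))
```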

Bitmap Pre-Filter Indexes

RoaringBitmap indexes are built per attribute field during compaction. They are used to efficiently skip vectors that don't match filter criteria before any distance computation.

Format

Each bitmap file uses the ZBMP magic bytes:

segments/<segment_id>/bitmap_<field>.bin

Contains:

  • Present bitmap: Which vectors have this field at all
  • Value bitmaps: Per distinct value, which vectors have that value

How search uses bitmaps

  1. Evaluate the filter expression against bitmap indexes
  2. Produce a candidate RoaringBitmap of matching vector positions
  3. During cluster scanning, skip any vector not in the candidate set
  4. Fall back to post-filtering for filters that can't use bitmaps
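
The first three steps can be sketched with Python sets standing in for RoaringBitmaps (the actual on-disk format is the ZBMP layout above; everything else here is illustrative):

```python
def build_bitmaps(docs, field):
    # Present bitmap: which positions have the field at all.
    # Value bitmaps: per distinct value, which positions carry that value.
    present, values = set(), {}
    for pos, doc in enumerate(docs):
        if field in doc:
            present.add(pos)
            values.setdefault(doc[field], set()).add(pos)
    return present, values

def candidates_for_equals(bitmaps, value):
    # Filter evaluation for an equality predicate: the candidate set is
    # simply that value's bitmap.
    present, values = bitmaps
    return values.get(value, set())

docs = [{"color": "red"}, {"color": "blue"}, {}, {"color": "red"}]
bitmaps = build_bitmaps(docs, "color")
matching = candidates_for_equals(bitmaps, "red")
# During cluster scanning, any position not in `matching` is skipped
# before its distance is ever computed.
print(sorted(matching))
```

More complex expressions compose the same way: AND is set intersection, OR is union, and NOT-with-presence is `present - values[v]`.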

Full-Text Inverted Indexes

Built during compaction for FTS-enabled fields. See Full-Text Search for details.

segments/<segment_id>/fts_<field>.bin       # ZFTS magic, posting lists per term

Contains:

  • Corpus stats (doc count, average doc length)
  • Per-term posting lists (doc positions + term frequencies)
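
The two pieces listed above (corpus stats plus per-term posting lists with positions and frequencies) can be sketched in memory; this is an illustrative toy, not the ZFTS binary format:

```python
def build_inverted_index(docs):
    # postings: term -> {doc_id: [positions]}; term frequency is the
    # length of a positions list. Corpus stats are kept alongside.
    postings, lengths = {}, []
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()
        lengths.append(len(terms))
        for pos, term in enumerate(terms):
            postings.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    stats = {"doc_count": len(docs), "avg_doc_len": sum(lengths) / len(docs)}
    return postings, stats

docs = ["the quick fox", "the lazy dog", "quick quick fox"]
postings, stats = build_inverted_index(docs)
print(postings["quick"], stats)
```

Doc count and average document length are exactly the corpus stats a BM25-style scorer needs, which is why they are stored with the index.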
