Indexing

Anup Ghatage edited this page Feb 12, 2026 · 1 revision

Zeppelin supports multiple vector index types, each offering different tradeoffs between accuracy, memory, and speed.

Index Types

| Type | Config Value | Compression | Description |
|------|--------------|-------------|-------------|
| IVF-Flat | `ivf_flat` | 1x (none) | Full-precision IVF with k-means partitioning |
| IVF-SQ8 | `ivf_sq` | 4x | IVF + scalar quantization (8-bit codes) |
| IVF-PQ | `ivf_pq` | 16-32x | IVF + product quantization |
| Hierarchical | `hierarchical` | varies | Multi-level centroid tree with beam search |

IVF-Flat

The default index type. Vectors are partitioned into clusters using k-means, and each cluster's vectors are stored in a separate S3 object.

How it works

  1. Training: Run k-means++ on all vectors to find N centroids (default: 256)
  2. Assignment: Each vector is assigned to its nearest centroid
  3. Storage: Each cluster is serialized as a binary blob with vector IDs, values, and attribute data
  4. Search: Find the nprobe nearest centroids to the query, then scan only those clusters
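
The assignment and search steps above can be sketched as follows. This is an illustrative toy, not Zeppelin's implementation: plain Python dicts and lists stand in for the per-cluster S3 objects, and the helper names are invented.

```python
def l2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    # Assignment step: map each vector to its nearest centroid's cluster.
    clusters = {i: [] for i in range(len(centroids))}
    for vid, v in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        clusters[nearest].append((vid, v))
    return clusters

def ivf_search(query, centroids, clusters, nprobe, k):
    # Search step: find the nprobe nearest centroids, then scan only
    # those clusters for the top-k nearest vectors.
    probed = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [(l2(query, v), vid) for c in probed for vid, v in clusters[c]]
    return [vid for _, vid in sorted(candidates)[:k]]

# Toy example: two centroids, four vectors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {"a": [0.1, 0.2], "b": [0.3, 0.1], "c": [9.8, 9.9], "d": [10.2, 10.1]}
clusters = assign(vectors, centroids)
print(ivf_search([10.0, 10.0], centroids, clusters, nprobe=1, k=2))
```

With `nprobe=1` only the cluster around `[10, 10]` is scanned, so vectors `a` and `b` are never touched; raising `nprobe` trades latency for recall.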

Build parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| default_num_centroids | 256 | Number of k-means clusters |
| kmeans_max_iterations | 25 | Max k-means iterations |
| kmeans_convergence_epsilon | 1e-4 | Convergence threshold |

Search parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| default_nprobe | 16 | Clusters to probe per query |
| max_nprobe | 128 | Maximum allowed nprobe |

Distance metrics

| Metric | Value | Formula |
|--------|-------|---------|
| Cosine | `cosine` | 1 - cos(a, b) (0 = identical) |
| Euclidean | `euclidean` | ‖a - b‖² (0 = identical) |
| Dot Product | `dot_product` | -a·b (more negative = more similar) |
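
As a concrete check, the three formulas can be computed directly. A sketch; the helper names are illustrative, not Zeppelin's API:

```python
import math

def cosine_distance(a, b):
    # 1 - cos(a, b): 0 for identical directions, up to 2 for opposite ones.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def euclidean_distance(a, b):
    # Squared L2 norm of the difference: 0 for identical vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot_product_distance(a, b):
    # Negated dot product: more negative means more similar.
    return -sum(x * y for x, y in zip(a, b))

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))    # identical direction -> 0.0
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(dot_product_distance([1.0, 2.0], [3.0, 4.0]))
```

All three are distances, so smaller is always better; this lets the same top-k machinery serve every metric.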

Scalar Quantization (SQ8)

SQ8 compresses each f32 dimension to a uint8, achieving 4x compression with minimal accuracy loss.

How it works

  1. Calibration: Compute per-dimension min/max across all vectors
  2. Encoding: Map each f32 value to [0, 255]: code = round((value - min) / (max - min) × 255)
  3. Decoding: Reverse: value = min + code × (max - min) / 255
  4. Search: Asymmetric distance computation — query stays in f32, codes in uint8
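
The calibration, encoding, and decoding steps above round-trip like this. A minimal sketch with invented helper names, using the formulas as stated:

```python
def calibrate(vectors):
    # Calibration: per-dimension min and max across all vectors.
    dims = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    return mins, maxs

def sq8_encode(v, mins, maxs):
    # Encoding: map each f32 value into [0, 255].
    # The guard for a constant dimension is an assumption of this sketch.
    return [
        0 if maxs[d] == mins[d]
        else round((v[d] - mins[d]) / (maxs[d] - mins[d]) * 255)
        for d in range(len(v))
    ]

def sq8_decode(codes, mins, maxs):
    # Decoding: reverse mapping. Lossy -- the error is at most about
    # one quantization step, (max - min) / 255, per dimension.
    return [mins[d] + codes[d] * (maxs[d] - mins[d]) / 255 for d in range(len(codes))]

vectors = [[0.0, -1.0], [1.0, 1.0], [0.5, 0.0]]
mins, maxs = calibrate(vectors)
codes = sq8_encode([0.5, 0.0], mins, maxs)
print(codes, sq8_decode(codes, mins, maxs))
```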

Artifacts on S3

segments/<segment_id>/sq_calibration.bin    # [dim: u32][min_0: f32][max_0: f32]...
segments/<segment_id>/sq_cluster_<N>.bin    # [n: u32][dim: u32] then per vector: [id_len: u32][id][codes: u8×dim]
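
The `sq_calibration.bin` layout above can be written and read with `struct`. A sketch only: little-endian byte order is an assumption of this example, as the wiki does not state it.

```python
import struct

def write_calibration(mins, maxs):
    # [dim: u32] followed by interleaved [min_d: f32][max_d: f32] pairs.
    # Little-endian ("<") is assumed here, not confirmed by the docs.
    buf = struct.pack("<I", len(mins))
    for lo, hi in zip(mins, maxs):
        buf += struct.pack("<ff", lo, hi)
    return buf

def read_calibration(buf):
    # Inverse of write_calibration: read the dimension, then each pair.
    (dim,) = struct.unpack_from("<I", buf, 0)
    mins, maxs = [], []
    for d in range(dim):
        lo, hi = struct.unpack_from("<ff", buf, 4 + 8 * d)
        mins.append(lo)
        maxs.append(hi)
    return mins, maxs

blob = write_calibration([0.0, -1.0], [1.0, 2.0])
print(read_calibration(blob))
```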

Two-phase search

  1. Coarse phase: Compute asymmetric distances using SQ8 codes, collect top k × rerank_factor candidates
  2. Rerank phase: Load full-precision vectors for candidates, compute exact distances, return top-k
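
The two phases can be sketched generically; here a cheap "approximate" distance plays the role of the SQ8 asymmetric distance, and an exact one the role of the full-precision rerank. All names and data are illustrative.

```python
def two_phase_search(query, approx_dist, exact_dist, items, k, rerank_factor=4):
    # Coarse phase: rank everything by the cheap approximate distance
    # and keep the top k * rerank_factor candidates.
    coarse = sorted(items, key=lambda item: approx_dist(query, item))
    candidates = coarse[: k * rerank_factor]
    # Rerank phase: exact distances only for the surviving candidates.
    return sorted(candidates, key=lambda item: exact_dist(query, item))[:k]

# Toy example: the "approximate" distance rounds to whole numbers, so it
# cannot tell 2.6 and 3.1 apart; the exact rerank breaks the tie.
items = [0.4, 1.2, 2.6, 3.1]
approx = lambda q, x: round(abs(q - x))
exact = lambda q, x: abs(q - x)
print(two_phase_search(3.0, approx, exact, items, k=1))
```

`rerank_factor` is the knob: a larger coarse candidate set costs more full-precision loads but recovers more of the accuracy lost to quantization.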

Product Quantization (PQ)

PQ achieves 16-32x compression by splitting vectors into subspaces and encoding each subspace with a codebook.

How it works

  1. Training: Split vectors into M subspaces of dimension D/M. Run k-means on each subspace to learn 256 centroids (codebook)
  2. Encoding: Each vector becomes M bytes — one codebook index per subspace
  3. ADC (Asymmetric Distance Computation): Precompute a lookup table [M][256] of distances from the query to each codebook centroid. Distance to any encoded vector is O(M) table lookups
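
The encoding and ADC steps above can be sketched with a toy codebook. Real PQ uses 256 centroids per subspace; this illustrative example uses 2 so the arithmetic is visible, and the function names are invented.

```python
def pq_encode(v, codebooks):
    # Encoding: one codebook index per subspace -- the nearest centroid
    # to that subvector. The result is m bytes per vector.
    m = len(codebooks)
    sub = len(v) // m
    codes = []
    for s in range(m):
        part = v[s * sub:(s + 1) * sub]
        codes.append(min(
            range(len(codebooks[s])),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(part, codebooks[s][c])),
        ))
    return codes

def adc_table(query, codebooks):
    # Precompute [m][K]: distance from each query subvector to every centroid.
    m = len(codebooks)
    sub = len(query) // m
    return [
        [sum((a - b) ** 2 for a, b in zip(query[s * sub:(s + 1) * sub], cent))
         for cent in codebooks[s]]
        for s in range(m)
    ]

def adc_distance(codes, table):
    # Distance to any encoded vector is just m table lookups.
    return sum(table[s][codes[s]] for s in range(len(codes)))

# Toy example: dim 4, m = 2 subspaces, 2 centroids per subspace.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # subspace 0 centroids
    [[0.0, 1.0], [1.0, 0.0]],   # subspace 1 centroids
]
codes = pq_encode([1.0, 1.0, 0.0, 1.0], codebooks)
table = adc_table([1.0, 1.0, 0.0, 1.0], codebooks)
print(codes, adc_distance(codes, table))
```

The table is built once per query, so scanning n encoded vectors costs O(n · m) lookups instead of O(n · D) float operations.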

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| pq_m | 8 | Number of subspaces |
| rerank_factor | 4 | Candidates = top_k × rerank_factor |

PQ requires dimension % pq_m == 0.

Artifacts on S3

segments/<segment_id>/pq_codebook.bin       # [M: u32][K: u32][sub_dim: u32][centroids: f32×M×K×sub_dim]
segments/<segment_id>/pq_cluster_<N>.bin    # [n: u32][M: u32] then per vector: [id_len: u32][id][codes: u8×M]

Hierarchical ANN

A multi-level centroid tree that enables sub-linear search for large datasets.

How it works

  1. Build: Recursively apply k-means. The root level partitions all vectors. Each non-leaf cluster is further partitioned until clusters are small enough (leaf_size)
  2. Search: Beam search from root to leaves. At each level, evaluate the beam_width most promising centroids and descend into their children
  3. Mixed-depth trees: Not all branches have the same depth. Some clusters may be leaves at level 1 while others descend to level 3+
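
The beam search over a mixed-depth tree can be sketched like this; the dict-of-children tree, 1-D centroids, and function name are all illustrative stand-ins for the structures stored on S3.

```python
def beam_search(query, tree, centroids, beam_width):
    # tree maps a node id to its child ids; leaves are absent from the map.
    # Mixed depths are fine: a leaf can appear at any level.
    def dist(node):
        return sum((a - b) ** 2 for a, b in zip(query, centroids[node]))

    beam, leaves = ["root"], []
    while beam:
        children = []
        for node in beam:
            kids = tree.get(node)
            if kids is None:
                leaves.append(node)     # reached a leaf cluster
            else:
                children.extend(kids)
        # Keep only the beam_width most promising centroids at this level.
        beam = sorted(children, key=dist)[:beam_width]
    return leaves

# Toy 2-level tree: "a" is a leaf at level 1, "b" descends one more level.
tree = {"root": ["a", "b"], "b": ["b0", "b1"]}
centroids = {"a": [0.0], "b": [10.0], "b0": [9.0], "b1": [11.0]}
print(beam_search([10.5], tree, centroids, beam_width=1))
```

With `beam_width=1` the search descends straight to the single best leaf; widening the beam visits more branches and improves recall at the cost of more centroid evaluations.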

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| hierarchical | false | Enable hierarchical index |
| beam_width | 10 | Candidates maintained at each level |
| leaf_size | — | Target vectors per leaf cluster |

Artifacts on S3

segments/<segment_id>/tree.json             # Tree structure (node → children mapping)
segments/<segment_id>/centroids.bin         # All centroids across all levels
segments/<segment_id>/cluster_<N>.bin       # Leaf cluster data

f16 Storage

When enabled, cluster data is additionally stored in half-precision (f16) format, achieving 50% compression over f32.

segments/<segment_id>/f16_cluster_<N>.bin   # Same layout as cluster, but f16 values
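
Python's `struct` module supports IEEE 754 half precision via the `"e"` format, which makes the f16 tradeoff easy to observe. A sketch, not Zeppelin's serialization code; little-endian is an assumption here.

```python
import struct

def f16_roundtrip(values):
    # Pack as IEEE 754 half precision and unpack again, showing exactly
    # what survives the 50%-smaller representation.
    packed = struct.pack(f"<{len(values)}e", *values)
    assert len(packed) == 2 * len(values)   # 2 bytes per value vs 4 for f32
    return list(struct.unpack(f"<{len(values)}e", packed))

# 1.0 survives exactly; 0.1 picks up quantization error; around 1024 the
# spacing between representable f16 values is already 1.0.
print(f16_roundtrip([1.0, 0.1, 1024.5]))
```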

Bitmap Pre-Filter Indexes

RoaringBitmap indexes are built per attribute field during compaction. They are used to efficiently skip vectors that don't match filter criteria before any distance computation.

Format

Each bitmap file uses the ZBMP magic bytes:

segments/<segment_id>/bitmap_<field>.bin

Contains:

  • Present bitmap: Which vectors have this field at all
  • Value bitmaps: Per distinct value, which vectors have that value

How search uses bitmaps

  1. Evaluate the filter expression against bitmap indexes
  2. Produce a candidate RoaringBitmap of matching vector positions
  3. During cluster scanning, skip any vector not in the candidate set
  4. Fall back to post-filtering for filters that can't use bitmaps
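
The first three steps can be sketched with Python sets standing in for RoaringBitmaps (the actual on-disk format is the ZBMP layout above; everything else here is illustrative):

```python
def build_bitmaps(docs, field):
    # Present bitmap: which positions have the field at all.
    # Value bitmaps: per distinct value, which positions carry that value.
    present, values = set(), {}
    for pos, doc in enumerate(docs):
        if field in doc:
            present.add(pos)
            values.setdefault(doc[field], set()).add(pos)
    return present, values

def candidates_for_equals(bitmaps, value):
    # Filter evaluation for an equality predicate: the candidate set is
    # simply that value's bitmap.
    present, values = bitmaps
    return values.get(value, set())

docs = [{"color": "red"}, {"color": "blue"}, {}, {"color": "red"}]
bitmaps = build_bitmaps(docs, "color")
matching = candidates_for_equals(bitmaps, "red")
# During cluster scanning, any position not in `matching` is skipped
# before its distance is ever computed.
print(sorted(matching))
```

More complex expressions compose the same way: AND is set intersection, OR is union, and NOT-with-presence is `present - values[v]`.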

Full-Text Inverted Indexes

Built during compaction for FTS-enabled fields. See Full-Text Search for details.

segments/<segment_id>/fts_<field>.bin       # ZFTS magic, posting lists per term

Contains:

  • Corpus stats (doc count, average doc length)
  • Per-term posting lists (doc positions + term frequencies)
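
The two pieces listed above (corpus stats plus per-term posting lists with positions and frequencies) can be sketched in memory; this is an illustrative toy, not the ZFTS binary format:

```python
def build_inverted_index(docs):
    # postings: term -> {doc_id: [positions]}; term frequency is the
    # length of a positions list. Corpus stats are kept alongside.
    postings, lengths = {}, []
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()
        lengths.append(len(terms))
        for pos, term in enumerate(terms):
            postings.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    stats = {"doc_count": len(docs), "avg_doc_len": sum(lengths) / len(docs)}
    return postings, stats

docs = ["the quick fox", "the lazy dog", "quick quick fox"]
postings, stats = build_inverted_index(docs)
print(postings["quick"], stats)
```

Doc count and average document length are exactly the corpus stats a BM25-style scorer needs, which is why they are stored with the index.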
