MB-62182: Avoid re-training vector indexes during merge#2204

Open
Thejas-bhat wants to merge 25 commits into master from fastmerge

Thejas-bhat commented Jun 17, 2025

  • The main purpose of this PR is to avoid unnecessarily re-training vector indexes during the merge process.
  • Going by the numbers, a 1M-vector dataset needs roughly 156K training vectors, per the recommendation: min_num_vectors_per_centroid * num_centroids = 39 * (4 * sqrt(1M)) = 39 * 4000 = 156K.
  • Data ingestion is now split into two phases. The first phase creates a centroid index using the Train() API, and the progress, in terms of the number of samples trained on, is recorded in bolt. The second phase is the normal indexing of data using the Batch() or Index() APIs.
  • Later, when vector indexes are merged, the merger uses the centroid index to merge the inverted lists (one per centroid) block-wise, without reconstructing the layout.
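
The sizing arithmetic and the block-wise merge idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `numCentroids`, `trainingSampleSize`, and `mergeInvertedLists` are hypothetical names, the inverted lists are modeled as plain slices of vector IDs, and every segment is assumed to have been built against the same centroid index so that list `i` refers to the same centroid in each segment.

```go
package main

import (
	"fmt"
	"math"
)

// minVectorsPerCentroid is the per-centroid minimum quoted in the description (39).
const minVectorsPerCentroid = 39

// numCentroids applies the 4 * sqrt(N) recommendation quoted in the description.
// (Hypothetical helper, not part of the PR.)
func numCentroids(datasetSize int) int {
	return int(math.Round(4 * math.Sqrt(float64(datasetSize))))
}

// trainingSampleSize returns min_num_vectors_per_centroid * num_centroids.
// (Hypothetical helper, not part of the PR.)
func trainingSampleSize(datasetSize int) int {
	return minVectorsPerCentroid * numCentroids(datasetSize)
}

// mergeInvertedLists concatenates, centroid by centroid, the inverted lists of
// several segments. It assumes all segments share one centroid index and hence
// have the same number of lists, so no vector has to be re-assigned or re-trained.
// (Hypothetical model of the idea; the real on-disk layout is not shown here.)
func mergeInvertedLists(segments [][][]uint64) [][]uint64 {
	if len(segments) == 0 {
		return nil
	}
	merged := make([][]uint64, len(segments[0]))
	for _, seg := range segments {
		for centroid, list := range seg {
			merged[centroid] = append(merged[centroid], list...)
		}
	}
	return merged
}

func main() {
	fmt.Println(numCentroids(1_000_000))       // 4000
	fmt.Println(trainingSampleSize(1_000_000)) // 156000

	segA := [][]uint64{{1, 2}, {3}}
	segB := [][]uint64{{4}, {5, 6}}
	fmt.Println(mergeInvertedLists([][][]uint64{segA, segB})) // [[1 2 4] [3 5 6]]
}
```

Because the centroids are fixed up front, the merge in this sketch reduces to appending lists that already belong to the same centroid, which is the property that makes re-training unnecessary.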

abhinavdangeti added this to the v2.6.0 milestone Jul 21, 2025
Thejas-bhat changed the title from "WIP fast merge" to "[WIP] MB-62182: Avoid re-training vector indexes during merge" Jan 15, 2026
Thejas-bhat force-pushed the fastmerge branch 2 times, most recently from c7a94a6 to 59a66df January 29, 2026 19:16
Thejas-bhat marked this pull request as ready for review January 29, 2026 19:34
Thejas-bhat changed the title from "[WIP] MB-62182: Avoid re-training vector indexes during merge" to "MB-62182: Avoid re-training vector indexes during merge" Jan 29, 2026
Thejas-bhat moved this from Todo to In Progress in Fast Merge Jan 30, 2026
Thejas-bhat force-pushed the fastmerge branch 2 times, most recently from 16fb63f to 74072d3 February 5, 2026 23:54

Projects

Status: In Progress

2 participants