feat(index): batch invalidation writes for improved performance #626

Draft
hicder wants to merge 1 commit into master from hicder/flush-invalidation

hicder (Owner) commented Feb 9, 2026

Change the invalidation persistence strategy from immediate disk writes to in-memory buffering, with buffered invalidations flushed to disk during segment build/flush operations. This reduces I/O overhead when processing document deletions.

Key changes:

MultiSpannIndex:

  • invalidate() and invalidate_batch() now only update in-memory state
  • Add flush_invalidations() to persist buffered invalidations to disk
  • Track pending invalidations in DashMap<u128, HashSet<u128>>
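
The buffering pattern described for MultiSpannIndex can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `InvalidationBuffer` is a hypothetical name, the meaning of the two `u128` values (called `key` and `id` here) is assumed, and a `Mutex<HashMap<..>>` stands in for `DashMap` so the sketch needs only the standard library. The `write` callback is a placeholder for whatever actually persists the IDs.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Hypothetical stand-in for the index's pending-invalidation state.
/// The PR tracks this in a DashMap; a Mutex<HashMap> keeps the sketch std-only.
pub struct InvalidationBuffer {
    pending: Mutex<HashMap<u128, HashSet<u128>>>,
}

impl InvalidationBuffer {
    pub fn new() -> Self {
        Self { pending: Mutex::new(HashMap::new()) }
    }

    /// Records one invalidation in memory only; no disk I/O happens here.
    /// Returns true if the id was not already pending for this key.
    pub fn invalidate(&self, key: u128, id: u128) -> bool {
        let mut pending = self.pending.lock().unwrap();
        pending.entry(key).or_default().insert(id)
    }

    /// Records several invalidations under a single lock acquisition.
    /// Returns how many ids were newly added.
    pub fn invalidate_batch(&self, key: u128, ids: &[u128]) -> usize {
        let mut pending = self.pending.lock().unwrap();
        let set = pending.entry(key).or_default();
        ids.iter().filter(|id| set.insert(**id)).count()
    }

    /// Drains everything buffered so far and passes it to `write`,
    /// one call per key. `write` is an assumed placeholder for the
    /// real persistence step. Returns the number of ids handed over.
    pub fn flush_invalidations<F>(&self, mut write: F) -> usize
    where
        F: FnMut(u128, &[u128]),
    {
        // Swap the map out so the lock is released before any "I/O" runs.
        let drained = std::mem::take(&mut *self.pending.lock().unwrap());
        let mut total = 0;
        for (key, ids) in drained {
            let mut ids: Vec<u128> = ids.into_iter().collect();
            ids.sort_unstable();
            total += ids.len();
            write(key, &ids);
        }
        total
    }
}
```

Taking the whole map out before writing keeps the lock hold time short, at the cost that a failed write would need its own retry handling; the sketch does not attempt that.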

MutableSegment:

  • Buffer invalidations in DashMap during invalidate() calls
  • Write all pending invalidations to disk during build() via InvalidatedIdsStorage

ImmutableSegment:

  • Add pending_invalidations: RwLock<HashMap<u128, HashSet<u128>>>
  • Add flush_invalidations() method called during segment operations
  • remove() now buffers instead of immediately persisting
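
As a rough sketch of the ImmutableSegment side, the snippet below buffers removals behind an `RwLock` and writes them out in one pass. `SegmentSketch`, `is_pending`, and the on-disk layout (each pair written as two little-endian `u128` values) are assumptions made for illustration; the real format is whatever `InvalidatedIdsStorage` defines, which this description does not show.

```rust
use std::collections::{HashMap, HashSet};
use std::io::{self, Write};
use std::sync::RwLock;

/// Hypothetical stand-in for the segment's buffered-removal state.
pub struct SegmentSketch {
    pending_invalidations: RwLock<HashMap<u128, HashSet<u128>>>,
}

impl SegmentSketch {
    pub fn new() -> Self {
        Self { pending_invalidations: RwLock::new(HashMap::new()) }
    }

    /// Buffers the removal instead of persisting it immediately.
    /// Returns true if the id was not already pending.
    pub fn remove(&self, key: u128, id: u128) -> bool {
        let mut pending = self.pending_invalidations.write().unwrap();
        pending.entry(key).or_default().insert(id)
    }

    /// Readers can still observe a removal that has not been flushed yet.
    pub fn is_pending(&self, key: u128, id: u128) -> bool {
        let pending = self.pending_invalidations.read().unwrap();
        pending.get(&key).map_or(false, |ids| ids.contains(&id))
    }

    /// Writes every buffered (key, id) pair to `out` and clears the buffer
    /// only after all writes have succeeded. The byte layout is an assumption.
    pub fn flush_invalidations<W: Write>(&self, out: &mut W) -> io::Result<usize> {
        let mut pending = self.pending_invalidations.write().unwrap();
        let mut written = 0;
        for (key, ids) in pending.iter() {
            for id in ids {
                out.write_all(&key.to_le_bytes())?;
                out.write_all(&id.to_le_bytes())?;
                written += 1;
            }
        }
        out.flush()?;
        // An early return above leaves the buffer intact for a later retry.
        pending.clear();
        Ok(written)
    }
}
```

Holding the write lock for the whole flush keeps the sketch simple but blocks concurrent `remove()` calls during I/O; whether the PR makes the same trade-off is not stated here.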

Collection:

  • Call flush_invalidations() on finalized segments during flush

io_uring improvements (uring_engine.rs, uring_file.rs):

  • Fix buffer pinning: use Pin<Vec<u8>> instead of Pin<Box<Vec<u8>>>
  • Replace std::sync::Mutex with tokio::sync::Mutex for async safety
  • Add explicit Send/Sync impls for UringFile
  • Fix Clippy warning: use for loop instead of while let
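
To illustrate the buffer-pinning item, here is a small standard-library sketch of what `Pin<Vec<u8>>` provides. The function name and buffer size are made up for the example, and the motivation (keeping the buffer address stable while the kernel owns it) is an assumption rather than something stated in this PR. A `Vec` already stores its bytes on the heap, so moving the `Vec` handle does not move the bytes, and the extra `Box` adds only another indirection. Because `[u8]` is `Unpin`, `Pin::new` is safe here and the wrapper mainly documents intent.

```rust
use std::pin::Pin;

/// Returns the buffer address before pinning, the address after the pinned
/// handle has been moved, and the recovered buffer.
pub fn pinned_buffer_demo() -> (usize, usize, Vec<u8>) {
    let buf = vec![0u8; 4096];
    let before = buf.as_ptr() as usize;

    // Vec<u8> derefs to [u8], which is Unpin, so the safe constructor applies.
    let mut pinned: Pin<Vec<u8>> = Pin::new(buf);

    // Writing through the pin works because the target is Unpin.
    let slice: &mut [u8] = &mut *pinned;
    slice[0] = 0xAB;

    // Moving the pinned handle moves only (pointer, length, capacity);
    // the heap allocation stays where it was.
    let moved = pinned;
    let after = moved.as_ptr() as usize;

    (before, after, Pin::into_inner(moved))
}
```

The two addresses returned are equal, since no reallocation happens between the two reads.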

This design batches disk writes for deletions, reducing I/O overhead while maintaining consistency by flushing during segment build/flush operations.
