@Kontinuation

This patch adds an index provider that coordinates the creation of spatial indexes for specified partitions. It is also integrated into SpatialJoinExec, so the spatial index is created through the provider even when there is only one spatial partition (the degenerate case). Handling for multiple spatial partitions will be added in a subsequent PR.

The memory reservations grown during the build-side collection phase are held by PartitionedIndexProvider. Spatial indexes created by the provider do not need to hold memory reservations.
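To illustrate the ownership model described above, here is a minimal sketch, not the code in this PR. It assumes a provider that builds each partition's index at most once, caches it, and keeps the memory reservation itself. `ToyIndexProvider`, `ToyIndex`, `ToyReservation`, and all method names are hypothetical stand-ins, and the sketch is synchronous and uses only the standard library, whereas the real provider is async.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a built spatial index. It holds no reservation.
#[derive(Debug)]
pub struct ToyIndex {
    pub partition: usize,
    pub num_geoms: usize,
}

/// Hypothetical stand-in for the reservation grown during build-side collection.
#[derive(Debug)]
pub struct ToyReservation {
    pub bytes: usize,
}

/// Hypothetical provider: owns the reservation and caches one index per partition.
pub struct ToyIndexProvider {
    /// Collected build-side data per partition (here just a geometry count).
    build_side: Vec<usize>,
    /// Owned by the provider, so the indexes do not need to carry it.
    reservation: ToyReservation,
    /// One lazily initialized cell per requested partition.
    cells: Mutex<HashMap<usize, Arc<OnceLock<Arc<ToyIndex>>>>>,
    /// Counts how many times an index was actually built.
    builds: AtomicUsize,
}

impl ToyIndexProvider {
    pub fn new(build_side: Vec<usize>, reservation: ToyReservation) -> Self {
        Self {
            build_side,
            reservation,
            cells: Mutex::new(HashMap::new()),
            builds: AtomicUsize::new(0),
        }
    }

    /// Returns the cached index for `partition`, building it on first request.
    /// Returns `None` if the partition does not exist.
    pub fn get_or_build(&self, partition: usize) -> Option<Arc<ToyIndex>> {
        let num_geoms = *self.build_side.get(partition)?;
        // Hold the map lock only long enough to fetch or create the cell.
        let cell = {
            let mut cells = self.cells.lock().unwrap();
            cells.entry(partition).or_default().clone()
        };
        // `get_or_init` runs the closure at most once per cell.
        let index = cell.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            Arc::new(ToyIndex { partition, num_geoms })
        });
        Some(Arc::clone(index))
    }

    pub fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }

    pub fn reserved_bytes(&self) -> usize {
        self.reservation.bytes
    }
}
```

In this sketch, repeated requests for the same partition return the same `Arc`, and the reservation lives as long as the provider rather than as long as any single index.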

The next step is to support a partitioned probe side by adding a PartitionedProbeStreamProvider and modifying the state machine of SpatialJoinStream to process multiple spatial partitions sequentially.
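As a rough illustration of what processing spatial partitions sequentially could look like, the sketch below steps a toy state machine through "wait for index, then probe" for each partition in turn. It is an assumption about the general shape only: `ToyJoinState`, `ToyJoinDriver`, and the state names are hypothetical and do not reflect the actual states of SpatialJoinStream.

```rust
/// Hypothetical states for a join that handles one spatial partition at a time.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToyJoinState {
    /// Waiting for the index of `partition` to be built or fetched.
    WaitBuildIndex { partition: usize },
    /// Probing the index of `partition` with the probe-side rows.
    ProbePartition { partition: usize },
    /// All partitions have been processed.
    Completed,
}

/// Hypothetical driver that advances through the partitions in order.
pub struct ToyJoinDriver {
    num_partitions: usize,
    state: ToyJoinState,
}

impl ToyJoinDriver {
    pub fn new(num_partitions: usize) -> Self {
        let state = if num_partitions == 0 {
            ToyJoinState::Completed
        } else {
            ToyJoinState::WaitBuildIndex { partition: 0 }
        };
        Self { num_partitions, state }
    }

    pub fn state(&self) -> &ToyJoinState {
        &self.state
    }

    /// Advances by one transition and returns the new state.
    pub fn step(&mut self) -> &ToyJoinState {
        let next = match self.state {
            // Once the index is ready, probe the same partition.
            ToyJoinState::WaitBuildIndex { partition } => {
                ToyJoinState::ProbePartition { partition }
            }
            // After probing, move on to the next partition if there is one.
            ToyJoinState::ProbePartition { partition }
                if partition + 1 < self.num_partitions =>
            {
                ToyJoinState::WaitBuildIndex { partition: partition + 1 }
            }
            // Last partition probed, or already finished.
            ToyJoinState::ProbePartition { .. } | ToyJoinState::Completed => {
                ToyJoinState::Completed
            }
        };
        self.state = next;
        &self.state
    }
}
```

With a single partition, the sketch reduces to one build step followed by one probe step, which matches the degenerate case this PR already handles.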

@Kontinuation Kontinuation requested a review from Copilot January 27, 2026 09:31

Copilot AI left a comment

Pull request overview

This PR adds a PartitionedIndexProvider to coordinate the creation and caching of spatial indexes for specified partitions. The provider is integrated into SpatialJoinExec and SpatialJoinStream, replacing the previous direct spatial index building approach. Memory reservations from the build side collection phase are now held by the provider rather than individual indexes. This is a preparatory step for supporting multi-partitioned spatial joins.

Changes:

  • Introduced PartitionedIndexProvider to manage index creation and caching across partitions
  • Refactored SpatialJoinStream to use the provider for index access
  • Moved memory reservation ownership from spatial indexes to the provider

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| rust/sedona-spatial-join/src/utils/disposable_async_cell.rs | New utility for an async cell that can be disposed to avoid unnecessary memory usage |
| rust/sedona-spatial-join/src/utils/bbox_sampler.rs | Removed #![allow(unused)] attribute |
| rust/sedona-spatial-join/src/utils.rs | Added disposable_async_cell module |
| rust/sedona-spatial-join/src/stream.rs | Updated to use PartitionedIndexProvider for index creation |
| rust/sedona-spatial-join/src/prepare.rs | New module for preparing spatial join components, including the provider |
| rust/sedona-spatial-join/src/lib.rs | Replaced build_index module with prepare module |
| rust/sedona-spatial-join/src/index/spatial_index_builder.rs | Removed memory reservation tracking from builder |
| rust/sedona-spatial-join/src/index/spatial_index.rs | Removed memory reservation field from SpatialIndex |
| rust/sedona-spatial-join/src/index/partitioned_index_provider.rs | New provider for managing partitioned spatial indexes |
| rust/sedona-spatial-join/src/index/memory_plan.rs | New module for computing memory usage plans |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Added accessor method for spill metrics |
| rust/sedona-spatial-join/src/index.rs | Added new modules to index module |
| rust/sedona-spatial-join/src/exec.rs | Updated to create and use PartitionedIndexProvider |
| rust/sedona-spatial-join/src/build_index.rs | Removed memory pool parameter from index builder |
| rust/sedona-spatial-join/Cargo.toml | Added tokio dependency |

@Kontinuation Kontinuation marked this pull request as ready for review January 27, 2026 09:55