@pwrliang
Contributor

This PR introduces a major refactoring of the GPU-accelerated library. Compared to the previous design, this version decouples the spatial join into distinct filtering and refinement stages, which makes it easier to integrate into rust/sedona-spatial-join in an upcoming PR. This update also includes performance optimizations and minor structural improvements:

  • ParallelWkbLoader: Enhances parsing performance by balancing workloads across threads based on byte counts.
  • RelateEngine: Improves Point-in-Polygon (PIP) performance by allowing a single AABB to bound multiple line segments (conceptually similar to TG's Natural index).
  • MemoryManager: Enables a memory pool to reduce the overhead of frequent allocations and deallocations.
  • Header Renaming: Standardizes naming conventions by using .cuh for CUDA-exclusive headers and .hpp for mixed CUDA/C++ code.
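
To illustrate the byte-count balancing idea behind the ParallelWkbLoader bullet, here is a minimal sketch. It is not the code in this PR: `PartitionByBytes` and the `offsets` layout (an Arrow-style binary offsets array with n + 1 entries) are assumptions made for the example. The point is that splitting on bytes rather than on geometry count keeps one very large geometry from dominating a single thread's share.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of libgpuspatial): split n geometries into
// contiguous ranges whose total WKB byte counts are roughly equal.
// `offsets` has n + 1 entries; geometry i occupies bytes
// [offsets[i], offsets[i + 1]) of the WKB buffer.
// Returns num_threads + 1 boundaries; thread t would parse geometries
// [bounds[t], bounds[t + 1]).
std::vector<std::size_t> PartitionByBytes(const std::vector<std::uint64_t>& offsets,
                                          std::size_t num_threads) {
  assert(!offsets.empty() && num_threads > 0);
  const std::size_t n = offsets.size() - 1;
  const std::uint64_t total = offsets.back() - offsets.front();
  std::vector<std::size_t> bounds(num_threads + 1, n);
  bounds[0] = 0;
  std::size_t i = 0;
  for (std::size_t t = 1; t < num_threads; ++t) {
    // Byte position at which thread t should ideally start.
    const std::uint64_t target = offsets.front() + total * t / num_threads;
    // Advance to the first geometry that starts at or after the target.
    while (i < n && offsets[i] < target) ++i;
    bounds[t] = i;
  }
  return bounds;
}
```

With one 100-byte geometry followed by four 1-byte geometries and two threads, this sketch assigns the large geometry to the first thread and the four small ones to the second, whereas an even split by count would give the first thread the large geometry plus at least one more.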

@pwrliang changed the title from "feat(c/sedona-sedona-libgpuspatial): Improving GPU Spatial Join Library" to "feat(c/sedona-sedona-libgpuspatial): Refactoring GPU Spatial Join Library" on Jan 27, 2026
@petern48
Contributor

As you wait for a proper review, I'll suggest that you consider breaking this into multiple PRs. It looks like you described 4 separate changes (the bullets) that would lend themselves nicely to at least 4 isolated PRs. I know it's more work for you, but it reduces the review burden significantly, as reviewers don't need to figure out which of the 4 bullets a particular code change applies to.

You can always base branches off of each other to reuse work from your other branches. Something like below, or however you see fit.

Header Renaming -> MemoryManager -> RelateEngine
       \
        \---> ParallelWkbLoader

(this is a random example, I didn't actually look into it that much)

Doing a separate PR for Header Renaming and straightforward changes would cut down the 88-file diff significantly, and could even be reviewed by people with less context about GPU join (like me).

Separate PRs could help your changes land faster, and help with future debugging / understanding when someone tries to figure out what happened. WDYT? Would breaking this up be reasonable?

@paleolimbot
Member

> As you wait for a proper review, I'll suggest that you consider breaking this into multiple PRs.

@pwrliang kindly split this one out from the larger parent PR at my request...you are absolutely correct that 88 files and 6000 lines is a huge diff and those are excellent suggestions on how to split that up further.

I do plan to attempt reviewing this tomorrow; however, we do need to respect that this development is happening in a public repository with a community and in the future (or now, if Peter or other members of the community are blocked from participating because of this PR's size) we will need to have changes be incremental. We want this code in SedonaDB not just because it is awesome but because we want to be the place where future contributors add CUDA-accelerated spatial functions, and to do that we'll need to work in the open and incrementally.

My idea with scoping this to only include code in c/sedona-libgpuspatial was that it isn't used in the rest of the engine (and won't be by default without very special opt-in build/runtime configurations for some time). Even though this is large, the consequences of missing details are low...this is a very hard thing to do and until the end-to-end is in place it is hard to even know that it worked. Big PRs are definitely not ideal but also sometimes we need to do hard things. GPU spatial joins are a very very hard thing!

@petern48
Contributor

> (or now, if Peter or other members of the community are blocked from participating because of this PR's size)

To be clear, I'm not dying to participate 😅. Definitely too busy for that. Just figured I'd encourage making it more modular, for both the purposes of reviewing and for cleaner traceability and git history. Though yeah, the fact that this is isolated to c/sedona-libgpuspatial makes those less important.

> kindly split this one out from the larger parent PR at my request

I guess my point is that it looks like it could be broken up even more (primarily based on the PR description). But if you're all good with reviewing it as it is, go for it! I don't mean to stand in the way.

@pwrliang
Contributor Author

Thanks for your suggestions. In future PRs, I will make incremental changes so they are easier to review.
