
Refactor sort functions and fix integer overflow #82

Merged: keichi merged 10 commits into master from cub-sort on Jan 14, 2026

keichi (Owner) commented on Jan 14, 2026

Summary

  • Rename full_sort_radix to full_sort_cub for clarity
  • Rename CPU sort functions to use the _stl suffix (full_sort_cpu → full_sort_stl, partial_sort_cpu → partial_sort_stl)
  • Add a partial_sort wrapper function that dispatches to partial_sort_kokkos on CUDA or partial_sort_stl on CPU, consistent with the full_sort pattern
  • Use STL sort unconditionally on CPU for full_sort
  • Fix integer overflow in loop indices by changing int to size_t
  • Simplify LIKWID marker names in partial_sort_bench
  • Switch from CUB DeviceSegmentedRadixSort to DeviceSegmentedSort, and fix an issue where sorting a distance matrix with more than INT_MAX elements would fail
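The integer overflow fixed here is plain arithmetic on the size of the flattened distance matrix. The sketch below is illustrative only and rests on assumptions not stated in the PR: the matrix is stored row-major as n_pred × n_lib, it is sorted as one segment per prediction row, and the helper names `make_segment_offsets` and `exceeds_int_max` are hypothetical. With n_pred = n_lib = 50000 (also an assumption about how N maps onto the two dimensions), the final offset is 2,500,000,000, which exceeds INT_MAX (2,147,483,647) and therefore needs a 64-bit type such as size_t.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): builds the n_pred + 1 offsets that
// delimit one segment per prediction row in a flattened, row-major
// n_pred x n_lib distance matrix, as a segmented sort expects.
std::vector<std::size_t> make_segment_offsets(std::size_t n_pred, std::size_t n_lib)
{
    static_assert(sizeof(std::size_t) >= 8, "sketch assumes a 64-bit size_t");
    // Guard against overflowing size_t itself.
    assert(n_lib == 0 || n_pred <= SIZE_MAX / n_lib);

    std::vector<std::size_t> offsets(n_pred + 1);
    // size_t loop index and size_t arithmetic: with int, i * n_lib would
    // overflow once the matrix has more than INT_MAX elements.
    for (std::size_t i = 0; i <= n_pred; i++) {
        offsets[i] = i * n_lib;
    }
    return offsets;
}

// Hypothetical check: true if the element count does not fit in an int.
bool exceeds_int_max(std::size_t n_pred, std::size_t n_lib)
{
    return n_pred * n_lib > static_cast<std::size_t>(INT_MAX);
}
```

Under the same square-matrix assumption, an int offset already overflows at N = 46341, since 46341² = 2,147,488,281; the commit log reports the resulting cudaErrorIllegalAddress for datasets with N >= 50000.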

Test plan

  • All 23 tests pass
  • Build succeeds on CPU (OpenMP backend)

keichi added 10 commits January 12, 2026 22:37
- Rename full_sort_radix to full_sort_cub using cub::DeviceSegmentedSort
- Use size_t for segment offsets and num_items to prevent integer overflow
  when n_pred * n_lib exceeds INT_MAX (~2.1 billion)
- This fixes cudaErrorIllegalAddress on datasets with N >= 50000
- Rename full_sort_cpu to full_sort_stl
- Rename partial_sort_cpu to partial_sort_stl
- Update CLI options in partial_sort_bench:
  - -s/--stl-sort for STL sort
  - -c/--cub-sort for CUB sort
  - -S/--scratch-sort for scratch memory sort
- Rename partial_sort to partial_sort_kokkos for the GPU implementation
- Add partial_sort wrapper that dispatches to partial_sort_kokkos on CUDA
  or partial_sort_stl on CPU, consistent with full_sort pattern
- Update partial_sort_bench to use renamed function
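The dispatch pattern described in this commit can be sketched as follows. The signatures, the `std::vector` storage and the use of `std::partial_sort` for the CPU path are assumptions for illustration, not the project's actual code. `partial_sort_kokkos` is only declared, and switching on Kokkos's `KOKKOS_ENABLE_CUDA` macro is one plausible way to select the backend at compile time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// CPU path (hypothetical signature): for each of the n_pred rows of a
// row-major n_pred x n_lib distance matrix, write the indices of the k
// smallest distances, in ascending order, to row i of `indices`.
void partial_sort_stl(const std::vector<float>& distances,
                      std::vector<std::size_t>& indices,
                      std::size_t n_pred, std::size_t n_lib, std::size_t k)
{
    assert(k <= n_lib && distances.size() == n_pred * n_lib);
    indices.resize(n_pred * k);
    std::vector<std::size_t> perm(n_lib);

    for (std::size_t i = 0; i < n_pred; i++) {
        const float* row = distances.data() + i * n_lib;
        std::iota(perm.begin(), perm.end(), std::size_t{0});
        auto kth = perm.begin() + static_cast<std::ptrdiff_t>(k);
        // Only [begin, kth) ends up sorted; the remaining indices stay unordered.
        std::partial_sort(perm.begin(), kth, perm.end(),
                          [row](std::size_t a, std::size_t b) { return row[a] < row[b]; });
        std::copy(perm.begin(), kth,
                  indices.begin() + static_cast<std::ptrdiff_t>(i * k));
    }
}

#if defined(KOKKOS_ENABLE_CUDA)
// GPU path: declared only; in this sketch its implementation lives elsewhere.
void partial_sort_kokkos(const std::vector<float>& distances,
                         std::vector<std::size_t>& indices,
                         std::size_t n_pred, std::size_t n_lib, std::size_t k);
#endif

// Wrapper: picks the backend at compile time, mirroring the full_sort pattern.
void partial_sort(const std::vector<float>& distances,
                  std::vector<std::size_t>& indices,
                  std::size_t n_pred, std::size_t n_lib, std::size_t k)
{
#if defined(KOKKOS_ENABLE_CUDA)
    partial_sort_kokkos(distances, indices, n_pred, n_lib, k);
#else
    partial_sort_stl(distances, indices, n_pred, n_lib, k);
#endif
}
```

Selecting the backend with the preprocessor means a CPU-only build never needs the GPU symbol; a runtime switch would also work, but both paths would then have to compile in every configuration.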
Consolidate marker names to just 'full_sort' and 'partial_sort'
instead of separate markers for each implementation variant.

Use DefaultHostExecutionSpace for the parallel_for in partial_sort_stl
to avoid CUDA compilation errors when building with GPU support.
keichi merged commit af3f7c5 into master on Jan 14, 2026
11 checks passed
keichi deleted the cub-sort branch on February 25, 2026 04:06