- Rename full_sort_radix to full_sort_cub using cub::DeviceSegmentedSort
- Use size_t for segment offsets and num_items to prevent integer overflow when n_pred * n_lib exceeds INT_MAX (~2.1 billion)
- This fixes cudaErrorIllegalAddress on datasets with N >= 50000
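The overflow described above can be illustrated with a small host-side sketch. It is not the PR's code: the helper name make_segment_offsets and the row-major layout (one segment of n_lib entries per prediction row) are assumptions made for illustration, and the sketch assumes a 64-bit size_t.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the n_pred + 1 segment boundaries for a
// row-major n_pred x n_lib distance matrix, one segment per row.
// Offsets are computed in std::size_t, so the last boundary
// (n_pred * n_lib) stays representable even when it exceeds INT_MAX.
// Assumes a 64-bit std::size_t.
std::vector<std::size_t> make_segment_offsets(std::size_t n_pred,
                                              std::size_t n_lib) {
    std::vector<std::size_t> offsets(n_pred + 1);
    for (std::size_t i = 0; i <= n_pred; ++i) {
        offsets[i] = i * n_lib;  // start of row i; last entry is the total item count
    }
    return offsets;
}
```

With n_pred = n_lib = 50000 the final offset is 2,500,000,000, which is larger than INT_MAX (2,147,483,647). The same product held in an int would not be representable, which is consistent with the illegal-address failure the commit message reports.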
- Rename full_sort_cpu to full_sort_stl
- Rename partial_sort_cpu to partial_sort_stl
- Update CLI options in partial_sort_bench:
  - -s/--stl-sort for STL sort
  - -c/--cub-sort for CUB sort
  - -S/--scratch-sort for scratch memory sort
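As a rough illustration of how the three flags listed above could be parsed, here is a sketch using getopt_long. Only the flag names come from the commit message. The SortImpl enum, the parse_sort_impl function, and the "last flag wins" behaviour are hypothetical, and partial_sort_bench may well use a different option parser.

```cpp
#include <getopt.h>

// Hypothetical enum naming the benchmark variants selected by the flags.
enum class SortImpl { Default, Stl, Cub, Scratch };

// Hypothetical parser: returns the implementation chosen on the command
// line. If several flags are given, the last one wins.
SortImpl parse_sort_impl(int argc, char* argv[]) {
    static const option long_opts[] = {
        {"stl-sort",     no_argument, nullptr, 's'},
        {"cub-sort",     no_argument, nullptr, 'c'},
        {"scratch-sort", no_argument, nullptr, 'S'},
        {nullptr, 0, nullptr, 0}};

    SortImpl impl = SortImpl::Default;
    optind = 1;  // restart scanning so the function can be called repeatedly
    int opt;
    while ((opt = getopt_long(argc, argv, "scS", long_opts, nullptr)) != -1) {
        switch (opt) {
            case 's': impl = SortImpl::Stl;     break;
            case 'c': impl = SortImpl::Cub;     break;
            case 'S': impl = SortImpl::Scratch; break;
            default:  break;  // unknown option: keep the current choice
        }
    }
    return impl;
}
```

Under these assumptions, invoking the benchmark with -c or --cub-sort would select SortImpl::Cub, and running it with no flag would leave the default in place.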
- Rename partial_sort to partial_sort_kokkos for the GPU implementation
- Add partial_sort wrapper that dispatches to partial_sort_kokkos on CUDA or partial_sort_stl on CPU, consistent with full_sort pattern
- Update partial_sort_bench to use renamed function
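The wrapper pattern described above can be sketched with plain tag dispatch. This is a simplified stand-in, not the PR's implementation: the HostSpace and CudaSpace tags are hypothetical placeholders for Kokkos execution spaces, the function signatures are assumed, and the partial_sort_kokkos body here simply reuses the host path so that the sketch compiles without Kokkos or CUDA.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tags standing in for Kokkos execution spaces.
struct HostSpace {};
struct CudaSpace {};

// CPU path: for each row of a row-major n_rows x n_cols matrix, move the
// k smallest values to the front of the row in ascending order.
void partial_sort_stl(std::vector<float>& dist, std::size_t n_rows,
                      std::size_t n_cols, std::size_t k) {
    k = std::min(k, n_cols);
    for (std::size_t i = 0; i < n_rows; ++i) {
        auto row = dist.begin() + static_cast<std::ptrdiff_t>(i * n_cols);
        std::partial_sort(row, row + static_cast<std::ptrdiff_t>(k),
                          row + static_cast<std::ptrdiff_t>(n_cols));
    }
}

// Placeholder for the GPU path. A real version would launch device work;
// this sketch forwards to the host code so it stays self-contained.
void partial_sort_kokkos(std::vector<float>& dist, std::size_t n_rows,
                         std::size_t n_cols, std::size_t k) {
    partial_sort_stl(dist, n_rows, n_cols, k);
}

// Overloads selected by the execution-space tag.
inline void partial_sort_dispatch(CudaSpace, std::vector<float>& d,
                                  std::size_t r, std::size_t c, std::size_t k) {
    partial_sort_kokkos(d, r, c, k);
}
inline void partial_sort_dispatch(HostSpace, std::vector<float>& d,
                                  std::size_t r, std::size_t c, std::size_t k) {
    partial_sort_stl(d, r, c, k);
}

// Wrapper: callers use partial_sort and the tag picks the implementation.
template <class ExecSpace>
void partial_sort(std::vector<float>& dist, std::size_t n_rows,
                  std::size_t n_cols, std::size_t k) {
    partial_sort_dispatch(ExecSpace{}, dist, n_rows, n_cols, k);
}
```

Calling partial_sort<HostSpace>(dist, 2, 3, 2) on the rows {3, 1, 2} and {9, 7, 8} leaves 1, 2 at the front of the first row and 7, 8 at the front of the second. Keeping the dispatch in one wrapper means benchmark and library callers do not need to know which backend was compiled in.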
Consolidate marker names to just 'full_sort' and 'partial_sort' instead of separate markers for each implementation variant.
Use DefaultHostExecutionSpace for the parallel_for in partial_sort_stl to avoid CUDA compilation errors when building with GPU support.
Summary
- Rename full_sort_radix to full_sort_cub for clarity
- Rename CPU implementations to use the _stl suffix (full_sort_cpu → full_sort_stl, partial_sort_cpu → partial_sort_stl)
- Add partial_sort wrapper function that dispatches to partial_sort_kokkos on CUDA or partial_sort_stl on CPU, consistent with full_sort pattern
- Change full_sort segment offsets and num_items from int to size_t
- Switch from DeviceSegmentedRadixSort to DeviceSegmentedSort and fix issue where distance matrix with more than INT_MAX elements would fail

Test plan