Fea/6482 enable grid constant for all non-mutable CUB kernel parameters#6642
Fea/6482 enable grid constant for all non-mutable CUB kernel parameters#6642toxicteddy00077 wants to merge 29 commits intoNVIDIA:mainfrom
Conversation
…ea/6482-enable-grid-constant Merge upstream
f690f5e to
bf617f5
Compare
|
/ok to test bf617f5 |
|
Thank you for this contribution! I think we should also do a benchmark and a SASS check somewhere, just because I am curious about the impact of this change. @toxicteddy00077 you can leave this to us. |
This comment has been minimized.
This comment has been minimized.
fbusato
left a comment
There was a problem hiding this comment.
thanks @toxicteddy00077. It is great to see this feature in CUB.
Initial feedbacl: This work could be extended to more parameters, e.g. cuda::std::array and pointers. Also, we need to double-check the implications for custom operators
| CUB_DETAIL_KERNEL_ATTRIBUTES void DeviceHistogramSweepKernel( | ||
| SampleIteratorT d_samples, | ||
| _CCCL_GRID_CONSTANT const SampleIteratorT d_samples, | ||
| ::cuda::std::array<int, NumActiveChannels> num_output_bins_wrapper, |
There was a problem hiding this comment.
why _CCCL_GRID_CONSTANT is skipped on cuda::std::array parameters?
There was a problem hiding this comment.
I tried it earlier, but when i tested the benches I ran into errors regarding AgentHistogram since the .data() function in it is expected to return type CounterT** but with _CCCL_GRID_CONSTANT const seems to return something else. Please let me know if I've missed something
There was a problem hiding this comment.
I didn't check the code. It is a bit unexpected that we modify internal cuda::std::array pointers. Maybe @bernhardmgruber knows more about this point.
Excluding pointers, other ::cuda::std::array parameters should work.
There was a problem hiding this comment.
I assume this had historical reasons, because we could not change AgentHistogram to take pointers by const before CCCL 3.0, where we moved them to a detail namespace.
Please add _CCCL_GRID_CONSTANT const to the ::cuda::std::array for histogram and propagatge const throughout the agent.
There was a problem hiding this comment.
I have added _CCCL_GRID_CONSTANT const to the ::cuda::std::array, just had to make the necessary AgentHistogram members const(I marked them as read-only) and also make some kernel_histogram.cuh methods in Transform which are utilized as const
| ValueT* items_pong, | ||
| CompareOpT compare_op, | ||
| _CCCL_GRID_CONSTANT const CompareOpT compare_op, | ||
| OffsetT* merge_partitions, |
There was a problem hiding this comment.
is it a write memory location? or _CCCL_GRID_CONSTANT was not applied to pointers
There was a problem hiding this comment.
From what i understand the _CCCL_GRID_CONSTANT const was applied to the outer pointers, not the inner pointers which actually point to the write memory.
There was a problem hiding this comment.
yes, this is the reason I asked. It doesn't matter if the pointer is for read or write
| CUB_DETAIL_KERNEL_ATTRIBUTES void DeviceHistogramSweepKernel( | ||
| SampleIteratorT d_samples, | ||
| _CCCL_GRID_CONSTANT const SampleIteratorT d_samples, | ||
| ::cuda::std::array<int, NumActiveChannels> num_output_bins_wrapper, |
There was a problem hiding this comment.
I didn't check the code. It is a bit unexpected that we modify internal cuda::std::array pointers. Maybe @bernhardmgruber knows more about this point.
Excluding pointers, other ::cuda::std::array parameters should work.
| KeyT* tmp_keys_out, | ||
| ValueT* tmp_items_out, | ||
| CompareOpT compare_op, | ||
| _CCCL_GRID_CONSTANT const CompareOpT compare_op, |
There was a problem hiding this comment.
side effects here are extremely rare, but I would remove _CCCL_GRID_CONSTANT from compare_op
|
Apologies for the delay. I will be making changes and adding |
| @@ -445,17 +445,17 @@ template <typename ChainedPolicyT, | |||
| typename OffsetT> | |||
| __launch_bounds__(int(ChainedPolicyT::ActivePolicy::AgentHistogramPolicyT::BLOCK_THREADS)) | |||
| CUB_DETAIL_KERNEL_ATTRIBUTES void DeviceHistogramSweepKernel( | |||
There was a problem hiding this comment.
the previous kernel DeviceHistogramInitKernel misses _CCCL_GRID_CONSTANT
| AtomicOffsetT* d_ctrs, | ||
| OffsetT* d_bins_out, | ||
| const OffsetT* d_bins_in, | ||
| _CCCL_GRID_CONSTANT const OffsetT* const d_bins_in, |
There was a problem hiding this comment.
DeviceRadixSortExclusiveSumKernel (below) is missing
774eb51 to
0e50ba3
Compare
bfbcdf4 to
53dbe0b
Compare
|
done @bernhardmgruber |
|
/ok to test 53dbe0b |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Some CI jobs are getting stuck compiling now. I have seen this previously with the warpspeed scan, where adding cccl/cub/cub/device/dispatch/kernels/kernel_scan.cuh Lines 187 to 190 in 94bd6e4 @fbusato what do you want to do? Use |
|
thanks, @bernhardmgruber. This is very helpful to address the problem. Yes, I strongly suggest to enable it only for nvcc >= 12.8. I will track the potential bug offline and chat with the compiler team. |
9ffb226 to
e384d92
Compare
|
@fbusato any update on this? thanks |
|
@toxicteddy00077 sorry again, this not a known issue. Please go ahead and enable |
e384d92 to
fb8b5cb
Compare
|
/ok to test fb8b5cb |
This comment has been minimized.
This comment has been minimized.
|
@fbusato there's still one test failing, what could be the reason? Anything i can fix? |
|
this is unrelated to your changes. It has been fixed quite recently. Let me try to rebase the PR. |
|
/ok to test 4a7004e |
😬 CI Workflow Results🟥 Finished in 2h 58m: Pass: 99%/298 | Total: 12d 04h | Max: 2h 57m | Hits: 69%/372635See results here. |
|
Python nvcc GCC / SQ / [CTK13.1 GCC13 py3.13] Test cuda.compute(amd64, L4)
@NaderAlAwar could you please check? |
|
@fbusato Looked into this. The LDL/STL instructions appear even with NVCC, this is not specific to |
|
Disabled sass checks in #8053. Once it lands, you should merge from main to fix CI |
Description
closes #6482
Just added
_CCCL_GRID_CONSTANT constto the immutable params for the necessary kernels