[GLUTEN-11302][VL] Fix gpu build by bumping to cuda-13.1 #11275
zhouyuan merged 15 commits into apache:main
Signed-off-by: Yuan <yuanzhou@apache.org>
${VELOX_BUILD_PATH}/_deps/nvtx3-src/c/include
${VELOX_BUILD_PATH}/_deps/nvcomp_proprietary_binary-src/include
${VELOX_BUILD_PATH}/_deps/rapids_logger-src/include
/usr/local/cuda/include/cccl
If possible, we should try to fix this by calling find_package for cudf. It should set up all these include paths. I'll try to repro this locally with @karthikeyann and make a suggestion.
We don't want to require a specific CUDA version just to get a particular CCCL version -- those don't always move in lockstep and sometimes RAPIDS requires CCCL versions that have been publicly released but are not yet shipped in a CUDA toolkit.
@bdice thanks for the input. Yes, I think we ran into a version mismatch between RAPIDS rmm and CUDA. Initially in Gluten we prepared a Docker environment with cuda-12.8 pre-installed; everything worked, and the CMake piece targeted that old environment. However, with the recent cudf-25.12 change it no longer compiles, so I'm experimenting with how to fix this. In my local environment I need to bump to cuda-13.1; otherwise there are issues with some header definitions. I also tried cuda-12.9 and cuda-13.0, and neither works:
/__w/incubator-gluten/incubator-gluten/dev/../ep/build-velox/build/velox_ep/_build/release/_deps/rmm-src/cpp/include/rmm/detail/cuda_memory_resource.hpp:23:49: error: 'synchronous_resource_with' is not a member of 'cuda::mr'
23 | inline constexpr bool resource_with = cuda::mr::synchronous_resource_with<Resource, Properties...>;
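A guard like the following could make this kind of toolkit mismatch fail early with a clear message instead of a header error deep inside rmm. It is only a sketch: the helper names `cuda_release` and `version_at_least` are hypothetical, and the 13.1 threshold is taken from the report above (12.9 and 13.0 failed locally), not from any documented RAPIDS requirement.

```shell
# Extract "major.minor" from `nvcc --version` output read on stdin,
# e.g. a line such as "Cuda compilation tools, release 12.8, V12.8.93".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 is at least version $2 (both "major.minor").
version_at_least() {
  local have_major=${1%%.*} have_minor=${1#*.}
  local want_major=${2%%.*} want_minor=${2#*.}
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
  else
    [ "$have_minor" -ge "$want_minor" ]
  fi
}

# Possible use in a build script (assumes nvcc is on PATH):
#   found=$(nvcc --version | cuda_release)
#   version_at_least "$found" 13.1 || { echo "need CUDA >= 13.1, found $found" >&2; exit 1; }
```

This only checks the toolkit version; as noted in the review, CCCL and CUDA do not always move in lockstep, so a version check is a stopgap rather than a replacement for resolving the include paths through the cudf package.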
This reverts commit 648e2f2.
This reverts commit 85a2064.
echo "enable GPU support."
COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=70 \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc"
COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=75 \
CUDA 13.1 supports architecture 75 at minimum.
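If the script ever needs to work with more than one toolkit, the architecture could be clamped rather than hard-coded. The sketch below is an illustration only: `pick_cuda_arch` is a hypothetical helper, it assumes a single numeric architecture value, and the minimum of 75 for CUDA 13 comes from the comment above.

```shell
# Clamp the requested CUDA architecture to the minimum the toolkit supports.
# Assumption: CUDA major version 13 and later needs at least 75;
# older toolkits keep whatever was requested.
pick_cuda_arch() {
  local cuda_major=$1 requested=$2 minimum=0
  if [ "$cuda_major" -ge 13 ]; then
    minimum=75
  fi
  if [ "$requested" -lt "$minimum" ]; then
    echo "$minimum"
  else
    echo "$requested"
  fi
}

# Example: the old setting of 70 is raised to 75 under CUDA 13.
ARCH=$(pick_cuda_arch 13 70)
COMPILE_OPTION="${COMPILE_OPTION:-} -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=${ARCH}"
echo "$COMPILE_OPTION"
```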
What changes are proposed in this pull request?
Fix GPU build by using gcc-14 and cuda-toolkit-13.1.
The new cuda-toolkit-13.1 requires more disk space, so this patch also modifies the GHA workflow to clean up disk space first.
How was this patch tested?
pass GHA
fixes: #11302
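For the disk-space cleanup mentioned in the description, a workflow step could run something along these lines before installing the toolkit. This is a rough sketch, not the step from this patch: `free_kib` and `cleanup_dirs` are hypothetical helpers, the directories to remove are left to the caller, and the function defaults to a dry run so nothing is deleted by accident.

```shell
# Print the free space (in KiB) on the filesystem that holds the given path.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Remove each listed directory if it exists.
# With DRY_RUN=1 (the default) only print what would be removed.
cleanup_dirs() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would remove $dir"
      else
        rm -rf -- "$dir"
        echo "removed $dir"
      fi
    fi
  done
}

# Possible use in a CI step (the path is a placeholder, not a real runner path):
#   echo "free before: $(free_kib /) KiB"
#   DRY_RUN=0 cleanup_dirs /path/to/unused/sdk
#   echo "free after:  $(free_kib /) KiB"
```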