feat(hesai): add CUDA-accelerated point cloud decoder #421
Draft
k1832 wants to merge 2 commits into tier4:main from
Add a GPU decode path for Hesai LiDAR sensors, gated behind compile-time BUILD_CUDA=ON and runtime NEBULA_USE_CUDA=1 environment variable. The implementation includes:
- CUDA kernel for batched point cloud decoding (hesai_cuda_kernels.cu)
- Angle LUT upload and GPU scan buffer management in hesai_decoder.hpp
- GPU-vs-CPU equivalence tests for OT128 (Pandar128E4X) sensor
The GPU path processes an entire scan in a single kernel launch, using pre-computed angle lookup tables and a sparse output buffer. When CUDA is not available or NEBULA_USE_CUDA is unset, the existing CPU path is used with zero overhead.
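The "single launch, angle LUT, sparse output buffer" idea can be illustrated with a CPU stand-in. This is only a sketch under stated assumptions: every name below (`SketchPoint`, `AngleLut`, `build_lut`, `decode_scan_sparse`, `compact`), the slot layout (firing * n_channels + channel), the rule that a raw distance of 0 means "no return", and the axis convention are hypothetical and not taken from the PR. Each loop iteration plays the role of one GPU thread that writes only to its own slot, so no synchronization between threads would be needed; empty slots are dropped in a separate compaction step.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names here are hypothetical illustrations, not identifiers from the PR.
struct SketchPoint {
  float x = 0.0f, y = 0.0f, z = 0.0f;
  bool valid = false;  // false marks an empty slot in the sparse buffer
};

struct AngleLut {
  std::vector<float> sin_az, cos_az;  // one entry per azimuth step
  std::vector<float> sin_el, cos_el;  // one entry per channel
};

// Precompute trigonometric values once so the per-point work is table lookups.
inline AngleLut build_lut(std::size_t n_azimuth_steps, const std::vector<float> & elevation_rad)
{
  constexpr float two_pi = 6.28318530718f;
  AngleLut lut;
  for (std::size_t i = 0; i < n_azimuth_steps; ++i) {
    const float az = two_pi * static_cast<float>(i) / static_cast<float>(n_azimuth_steps);
    lut.sin_az.push_back(std::sin(az));
    lut.cos_az.push_back(std::cos(az));
  }
  for (float el : elevation_rad) {
    lut.sin_el.push_back(std::sin(el));
    lut.cos_el.push_back(std::cos(el));
  }
  return lut;
}

// CPU stand-in for a batched kernel: one iteration per (firing, channel) slot.
inline std::vector<SketchPoint> decode_scan_sparse(
  const std::vector<std::uint16_t> & raw_distance,  // n_firings * n_channels entries
  const std::vector<std::uint32_t> & azimuth_step,  // one LUT index per firing
  std::size_t n_channels, float distance_unit_m, const AngleLut & lut)
{
  std::vector<SketchPoint> out(raw_distance.size());
  for (std::size_t slot = 0; slot < raw_distance.size(); ++slot) {
    const std::size_t firing = slot / n_channels;
    const std::size_t channel = slot % n_channels;
    if (raw_distance[slot] == 0) continue;  // assumed "no return": leave slot empty
    const float d = static_cast<float>(raw_distance[slot]) * distance_unit_m;
    const std::uint32_t az = azimuth_step[firing];
    const float xy = d * lut.cos_el[channel];
    out[slot].x = xy * lut.sin_az[az];
    out[slot].y = xy * lut.cos_az[az];
    out[slot].z = d * lut.sin_el[channel];
    out[slot].valid = true;
  }
  return out;
}

// Drop the empty slots to obtain a dense point list.
inline std::vector<SketchPoint> compact(const std::vector<SketchPoint> & sparse)
{
  std::vector<SketchPoint> dense;
  for (const auto & p : sparse) {
    if (p.valid) dense.push_back(p);
  }
  return dense;
}
```

Sizing the sparse buffer for the worst case (every slot filled) trades memory for the ability to decode all slots independently in one pass.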
- Copyright year 2024 -> 2026 for new files
- Replace deprecated find_package(CUDA) with find_package(CUDAToolkit)
- Remove --expt-relaxed-constexpr flag (not needed)
- Remove unused per-packet kernel and launcher (dead code)
- Batch launcher returns bool; caller logs via NEBULA_LOG_STREAM
- Reorder CudaNebulaPoint fields for better memory packing
- Remove redundant is_multi_frame member; use n_frames > 1
- Make HesaiCudaDecoder destructor virtual
- Add int32_t range guarantee comment in angle corrector
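To show why field order affects memory packing, here is a small sketch. The two structs and their fields are hypothetical and are not the actual CudaNebulaPoint layout; they only illustrate how alignment padding changes with member order.

```cpp
#include <cstdint>

// Hypothetical layout with small fields interleaved between floats.
// Each 1- or 2-byte member is followed by padding so the next float
// is 4-byte aligned (typically 24 bytes in total on common ABIs).
struct LoosePoint {
  std::uint8_t intensity;
  float x;
  std::uint8_t return_type;
  float y;
  std::uint16_t channel;
  float z;
};

// Same members, ordered from largest to smallest alignment.
// The small fields share one 4-byte slot (typically 16 bytes in total).
struct PackedPoint {
  float x;
  float y;
  float z;
  std::uint16_t channel;
  std::uint8_t intensity;
  std::uint8_t return_type;
};

// Reordering can only remove padding here, never add it.
static_assert(sizeof(PackedPoint) <= sizeof(LoosePoint), "reordering should not grow the struct");
```

For a buffer sized for a full scan, a smaller per-point struct reduces both the device allocation and the device-to-host copy.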
PR Type
Related Links
Description
Add a GPU-accelerated decode path for Hesai LiDAR sensors using CUDA. The feature is:
- Opt-in at compile time via -DBUILD_CUDA=ON. When the CUDA toolkit is not found, the build silently falls back to CPU-only.
- Opt-in at runtime via the NEBULA_USE_CUDA=1 environment variable. When unset, the existing CPU path is used with zero overhead.
What it does
- (… launch_decode_hesai_scan_batch)
Files changed
- hesai_cuda_kernels.cu
- hesai_cuda_decoder.hpp
- hesai_decoder.hpp
- hesai_sensor.hpp: max_scan_buffer_points() for GPU buffer sizing
- angle_corrector_*.hpp
- nebula_hesai_decoders/CMakeLists.txt
- nebula_hesai/CMakeLists.txt
- hesai_cuda_decoder_test.cpp
Known limitations
- … return_type field (always 0)
Review Procedure
Build (with CUDA)
Requires NVIDIA CUDA Toolkit (tested with CUDA 12.x). If the toolkit is not found, the build succeeds but CUDA support is silently disabled.
Running with CUDA enabled
The GPU decode path is gated by a runtime environment variable:
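One way such a gate could look in C++ is sketched below. Only the variable name NEBULA_USE_CUDA and the BUILD_CUDA build option come from the description; the helper names (`is_cuda_requested`, `cuda_requested_from_env`, `select_decode_path`), the `built_with_cuda` flag, and the choice to treat only the exact value `1` as enabling are assumptions for illustration.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: decides whether a given environment value enables CUDA.
// Assumption: only the exact string "1" enables the GPU path.
inline bool is_cuda_requested(const char * env_value)
{
  return env_value != nullptr && std::strcmp(env_value, "1") == 0;
}

// Reads the variable once; std::getenv returns nullptr when it is unset.
inline bool cuda_requested_from_env()
{
  return is_cuda_requested(std::getenv("NEBULA_USE_CUDA"));
}

enum class DecodePath { Cpu, Gpu };

// built_with_cuda stands in for whatever the build exposes when BUILD_CUDA=ON
// (for example a compile definition); its exact form is not known from the PR.
inline DecodePath select_decode_path(bool built_with_cuda, bool requested)
{
  return (built_with_cuda && requested) ? DecodePath::Gpu : DecodePath::Cpu;
}
```

Evaluating the gate once at decoder construction, rather than per packet, is what would keep the CPU path free of extra work when the variable is unset.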
Test
Test results
Remarks
- When built without CUDA (BUILD_CUDA=OFF), the 5 CUDA tests are compiled but skip at runtime via GTEST_SKIP(), so they do not break CPU-only CI.
Pre-Review Checklist for the PR Author
PR Author should check the checkboxes below when creating the PR.
Checklist for the PR Reviewer
Reviewers should check the checkboxes below before approval.
Post-Review Checklist for the PR Author
PR Author should check the checkboxes below before merging.
CI Checks