rocrtst: Disabled NUMA async copy build#3044
Conversation
|
@dayatsin-amd , We must ofcourse update rocrtst APIs to use latest hwloc APIs "eventually." |
Forced ENABLE_COPY_NUMA off and excluded memory_async_copy_numa.cc from build
ecc8499 to
08a1ff2
Compare
There was a problem hiding this comment.
Pull request overview
Disables building the NUMA async copy performance test in rocrtst, while also restructuring rocrtst’s CMake logic around NUMA/hwloc discovery and RPATH handling.
Changes:
- Gated NUMA async copy test registration/includes behind
ENABLE_COPY_NUMA. - Forced
ENABLE_COPY_NUMAOFF and excludedmemory_async_copy_numa.ccfrom the build. - Refactored CMake NUMA/hwloc detection and modified build/install RPATH settings.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| projects/rocr-runtime/rocrtst/suites/test_common/main.cc | Wraps NUMA async copy header include and gtest entry in #if ENABLE_COPY_NUMA. |
| projects/rocr-runtime/rocrtst/suites/test_common/CMakeLists.txt | Forces NUMA test off, filters NUMA source, and refactors NUMA/hwloc detection + RPATH configuration. |
| projects/rocr-runtime/rocrtst/suites/performance/memory_async_copy_numa.h | Wraps the NUMA test class definition behind #if ENABLE_COPY_NUMA. |
| projects/rocr-runtime/rocrtst/suites/performance/memory_async_copy_numa.cc | Wraps NUMA test implementation behind #if ENABLE_COPY_NUMA and gates a membind call behind ENABLE_COPY_NUMA_MEMBIND_NODESET. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if(NOT HAVE_HWLOC AND PKG_CONFIG_FOUND) | ||
| pkg_check_modules(HWLOC_PC hwloc) | ||
| if(HWLOC_PC_FOUND) | ||
| # Try to find the actual library file for SHARED IMPORTED target | ||
| find_library(HWLOC_LIBRARY_FILE | ||
| NAMES hwloc | ||
| HINTS ${HWLOC_PC_LIBRARY_DIRS} | ||
| NO_DEFAULT_PATH | ||
| ) |
There was a problem hiding this comment.
The hwloc pkg-config path only succeeds if find_library(... NO_DEFAULT_PATH HINTS ${HWLOC_PC_LIBRARY_DIRS}) finds a full .so file. If HWLOC_PC_LIBRARY_DIRS is empty (common when the lib is in a default system path) or the .so symlink isn’t present, HAVE_HWLOC stays FALSE and the later fallback searches only ${ROCRTST_ROOT}/thirdparty/lib (also NO_DEFAULT_PATH), leading to a hard build failure even when system hwloc is installed. Suggest adding an INTERFACE imported target fallback using ${HWLOC_PC_LIBRARIES}/${HWLOC_PC_LIBRARY_DIRS} (like the NUMA logic) and/or dropping NO_DEFAULT_PATH so system paths are searched.
| INSTALL_RPATH "$ORIGIN/../lib/rocrtst/lib;$ORIGIN/../lib/rocm_sysdeps/lib" | ||
| BUILD_RPATH "$ORIGIN/../../dist/lib/rocm_sysdeps/lib;$ORIGIN/../lib/rocrtst/lib" |
There was a problem hiding this comment.
INSTALL_RPATH/BUILD_RPATH paths look inconsistent with what this CMakeLists installs: install(DIRECTORY .../thirdparty/lib DESTINATION lib/rocrtst) places libhwloc.so.5 under $prefix/lib/rocrtst/, but the new RPATH points to $ORIGIN/../lib/rocrtst/lib (extra /lib). This will prevent the installed binary from finding the bundled hwloc when the fallback is used. Recommend adjusting the RPATH to include $ORIGIN/../lib/rocrtst (and, if needed, $ORIGIN/../lib/rocrtst/thirdparty/lib), or aligning the install destination with the RPATH.
| INSTALL_RPATH "$ORIGIN/../lib/rocrtst/lib;$ORIGIN/../lib/rocm_sysdeps/lib" | |
| BUILD_RPATH "$ORIGIN/../../dist/lib/rocm_sysdeps/lib;$ORIGIN/../lib/rocrtst/lib" | |
| INSTALL_RPATH "$ORIGIN/../lib/rocrtst;$ORIGIN/../lib/rocm_sysdeps/lib" | |
| BUILD_RPATH "$ORIGIN/../../dist/lib/rocm_sysdeps/lib;$ORIGIN/../lib/rocrtst" |
| "hwloc: membind not supported or cannot be enforced. Check errno."); | ||
| #else | ||
| if (verbosity() >= VERBOSE_STANDARD) { | ||
| std::cout << "hwloc: membind nodeset not available in linked libhwloc; skipping." |
There was a problem hiding this comment.
The log message in the #else branch is misleading: this branch is selected when ENABLE_COPY_NUMA_MEMBIND_NODESET is disabled at compile time, not necessarily because the linked hwloc lacks nodeset membind support. Recommend rewording to reflect the build-time option (or detect support at runtime and print a message based on the detection result).
| std::cout << "hwloc: membind nodeset not available in linked libhwloc; skipping." | |
| std::cout << "hwloc: membind nodeset support disabled at build time " | |
| << "(ENABLE_COPY_NUMA_MEMBIND_NODESET=0); skipping." |
| # Force disabled by default to avoid building the NUMA test source. | ||
| set(ENABLE_COPY_NUMA OFF CACHE BOOL "Build Memory_Async_Copy_NUMA test" FORCE) | ||
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | ||
|
|
||
| # Optional: enable hwloc membind nodeset call in NUMA test | ||
| set(ENABLE_COPY_NUMA_MEMBIND_NODESET OFF CACHE BOOL "Enable hwloc membind nodeset in NUMA test") | ||
|
|
There was a problem hiding this comment.
set(ENABLE_COPY_NUMA ... FORCE) makes the option impossible to enable from the command line/GUI, but the surrounding comments call it “Optional”. Also, ENABLE_COPY_NUMA/ENABLE_COPY_NUMA_MEMBIND_NODESET are used in C++ with #if but are never added as compile definitions for the rocrtst target, so turning the CMake option ON would not actually enable the code. Recommend: remove FORCE, keep a single option(), gate source filtering on it, and propagate both options via target_compile_definitions() (e.g., ENABLE_COPY_NUMA=$<BOOL:...>).
| # Force disabled by default to avoid building the NUMA test source. | |
| set(ENABLE_COPY_NUMA OFF CACHE BOOL "Build Memory_Async_Copy_NUMA test" FORCE) | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| # Optional: enable hwloc membind nodeset call in NUMA test | |
| set(ENABLE_COPY_NUMA_MEMBIND_NODESET OFF CACHE BOOL "Enable hwloc membind nodeset in NUMA test") | |
| option(ENABLE_COPY_NUMA "Build Memory_Async_Copy_NUMA test" OFF) | |
| if(NOT ENABLE_COPY_NUMA) | |
| # Exclude NUMA test source when NUMA copy tests are disabled. | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| endif() | |
| # Optional: enable hwloc membind nodeset call in NUMA test | |
| option(ENABLE_COPY_NUMA_MEMBIND_NODESET "Enable hwloc membind nodeset in NUMA test" OFF) | |
| # Propagate NUMA options to C++ sources. | |
| if(ENABLE_COPY_NUMA) | |
| add_definitions(-DENABLE_COPY_NUMA=1) | |
| else() | |
| add_definitions(-DENABLE_COPY_NUMA=0) | |
| endif() | |
| if(ENABLE_COPY_NUMA_MEMBIND_NODESET) | |
| add_definitions(-DENABLE_COPY_NUMA_MEMBIND_NODESET=1) | |
| else() | |
| add_definitions(-DENABLE_COPY_NUMA_MEMBIND_NODESET=0) | |
| endif() |
| # Always exclude memory_async_copy_numa.cc since ENABLE_COPY_NUMA is forced OFF | ||
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | ||
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") |
There was a problem hiding this comment.
This unconditionally excludes memory_async_copy_numa.cc regardless of ENABLE_COPY_NUMA, which makes the option ineffective and can lead to link/compile mismatches if someone tries to enable the test. Recommend: only list(FILTER ...) when NOT ENABLE_COPY_NUMA (and only print the STATUS message in that case).
| # Always exclude memory_async_copy_numa.cc since ENABLE_COPY_NUMA is forced OFF | |
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| # Exclude memory_async_copy_numa.cc when ENABLE_COPY_NUMA is OFF | |
| if(NOT ENABLE_COPY_NUMA) | |
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| endif() |
There was a problem hiding this comment.
I concur with copilot's comment--it looks like the source file is unconditionally excluded.
There was a problem hiding this comment.
I don't think we need to exclude the .cc file. I would just have
ENABLE_COPY_NUMA_MEMBIND_NODESET and ENABLE_COPY_NUMA set to 0,
But I do not feel strongly about it. Up to you..
There was a problem hiding this comment.
Yeah, you are right.. its already a disabled test. I will work on these review comments in the next PR. I will also clean and eliminate some redundant code in this CMakeFile file as well.
| else() | ||
| message(WARNING "NUMA library not found. Building without numa support.") | ||
| message(WARNING " Install libnuma-dev (Debian/Ubuntu) or numactl-devel (RHEL/CentOS)") | ||
| endif() |
There was a problem hiding this comment.
When NUMA is not found you only emit a warning, but the build still compiles suites/performance/memory_async_copy*.cc, which include <numa.h> and call libnuma APIs. On systems without libnuma headers this will fail at compile time. Either make NUMA a required dependency (FATAL_ERROR with install hint) or keep the previous behavior of filtering out the memory_async_copy*.cc sources when NUMA isn’t available.
jharryma
left a comment
There was a problem hiding this comment.
rocrtst passing on this test PR: ROCm/TheRock#3236
71 tests ran and passed: https://github.com/ROCm/TheRock/actions/runs/21723147025/job/62672812484?pr=3236#step:10:2404
|
@dayatsin-amd @cfreehill - This patch excludes memory_async_copy_numa.cc from build. This patch has been tested successfully in TheRock's integrated build process by Jessy as confirmed here -ROCm/TheRock#3236 So instead of PR - #2598 - I would like to submit this PR. Please help provide your approval and complete the review here. Thank you! |
| endif() | ||
| set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE) | ||
|
|
||
| # Force disable NUMA copy functionality |
There was a problem hiding this comment.
Is this comment (and the one below) correct, and the message()? It looks like it's optional.
Does the message() print out whether ENABLE_COPY_NUMA is on or off?
| # Always exclude memory_async_copy_numa.cc since ENABLE_COPY_NUMA is forced OFF | ||
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | ||
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") |
There was a problem hiding this comment.
I concur with copilot's comment--it looks like the source file is unconditionally excluded.
|
I just have minor questions/comments. If rocrtst builds and runs successfully (at least the test in question) in CI and TheRock, then I would approve since it seems to be somewhat urgent. We can address the other issues (like testing the configurability of the tests wrt numa/hwloc), if any, afterwards. |
|
|
||
| set(CMAKE_BUILD_WITH_INSTALL_RPATH ON) | ||
| # Configure RPATH settings before creating executable target | ||
| set(CMAKE_BUILD_WITH_INSTALL_RPATH OFF) |
There was a problem hiding this comment.
Why do we need to change the rpath settings?
There was a problem hiding this comment.
With CMAKE_BUILD_WITH_INSTALL_RPATH ON, TheRock build system was complaining about runtime linker failures when testing from the build directory:
./build/rocrtst64: error while loading shared libraries: libnuma.so.1: cannot open shared object file: No such file or directory
- The binary was looking for libs at build/../lib/rocm_sysdeps/lib (install paths)
- But in TheRock's build tree, numa/hwloc are actually at dist/lib/rocm_sysdeps/lib
- Same for bundled thirdparty libs - not at lib/rocrtst yet, they are only copied there during install
With OFF -BUILD_RPATH is pointing to ../../dist/lib/rocm_sysdeps/lib and finds TheRock's libs
Also when they switch to INSTALL_RPATH after make install its able to finds installed libs at /opt/rocm/lib/rocrtst
- Seemed to have worked in both scenarios without needing to install first.
Let me know if you see any more concerns.. We did a lot of iterations on this PR.. And this seems to be the Goldilocks Cmakefile!
There was a problem hiding this comment.
Around 2 years ago, there was a task group that spent a few weeks to research and refine the RPATH settings for all the components in ROCm. I was not part of that discussion, but there were people who were very experienced in packaging/deployment involved. The current RPATH settings were based on the recommendations from that group, so I am hesitant with making RPATH changes.
Given that these changes only apply to rocrtst, which is a executable and not a library, we are probably ok, so we can ignore get back to it if we start seeing problems from this.
| # Always exclude memory_async_copy_numa.cc since ENABLE_COPY_NUMA is forced OFF | ||
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | ||
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") |
There was a problem hiding this comment.
I don't think we need to exclude the .cc file. I would just have
ENABLE_COPY_NUMA_MEMBIND_NODESET and ENABLE_COPY_NUMA set to 0,
But I do not feel strongly about it. Up to you..
Thanks @cfreehill. I would like to see TheRock build get unblocked as Step 1, since rocrtst's hwloc API link errors are blocking it. I will follow up on your review comments in the subsequent PRs. Thank you. |
Forced ENABLE_COPY_NUMA off and excluded memory_async_copy_numa.cc from build Co-authored-by: Shweta Khatri <shweta.khatri@amd.com> [rocm-systems] ROCm/rocm-systems#3044 (commit de8012a)
Forced ENABLE_COPY_NUMA off and excluded memory_async_copy_numa.cc from build Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
|
@shwetagkhatri seems like bug in declaring the dependecies |
Forced ENABLE_COPY_NUMA off and excluded memory_async_copy_numa.cc from build
Motivation
Technical Details
JIRA ID
Test Plan
Test Result
Submission Checklist