Hardware info plugin #391

amartyads · 2025-12-31T14:55:04Z

Description

I've created the plugin I was talking about in #348, to show hardware info once a simulation has started. This helps double check whether the distribution of resources was correct, whether pinning was successful, whether a heterogeneous hardware run was launched properly etc. For higher versions of glibc, this also shows NUMA information.

Resolved Issues

Plugin to show hardware info #348

How Has This Been Tested?

Tested on

Local machine, with higher version of glibc, serial and parallel, clang and gcc, intel
HSUper, with 16 nodes, various permutations of ranks and threads, lower glibc, icpx and gcc, intel

Documentation

(Only relevant if this PR introduces new features)

all-options.xml documents how to use the feature.
The responsible readXML() documents how to use the feature.

Discussion points

A few points I'd like everyone's input on, hence initially making this PR draft.

I didn't add software information (ls1 version, compiler version, mpi version etc) because that would replicate data already present in the first few lines of ls1 output.
I was going to add hardware information (cpu model and speed, max memory available) per node, but then ran into several roadblocks. I want to talk about them here.
1. Library: The easiest solution would be to use a library like hwinfo, but I don't know how others feel about introducing an external dependency for a plugin.
2. Header file: The header file cpuid.h only works for linux systems, and does not have RAM info; to get max available RAM, you need to do sysconf(_SC_PAGESIZE) * sysconf(_SC_PHYS_PAGES) from unistd.h. I'm already using unistd (despite only existing on posix systems), so that's not a big deal, but it's not the most elegant solution. This does work on intel, ARM, AMD, with both gcc and clang-based compilers. Adding headers for windows systems would add a lot more clutter.
3. System counters: On linux only, I can parse /proc/cpuinfo and /proc/meminfo to get the information, but cpuinfo doesn't give clean values on ARM, it gives a code that you need to then look up in the documentation to find the CPU model. So this works only on linux, non ARM (intel and AMD work fine).
4. Command execution: I can just run lscpu with popen() on linux and then parse the output, but do we want to do that?
Of these options, 2 seems the best to me, but I want to know what everyone else thinks. Options 2, 3 and 4 are all Linux specific.
If any header is unavailable, I've added #ifdef checks. But I've made it so that unavailable values are -1, and are not written if unavailable. I didn't put all the variables and writing code in #ifdefs because I thought it would be too cluttered. Do I do it anyway, or do I stick to -1?
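For reference, option 2 combined with the -1 convention could look roughly like the sketch below. It is an illustration only, not the code in this PR: the function name `totalRamBytes` is made up, and the preprocessor guards are one assumed way of checking that unistd.h and the two `sysconf` names are available.

```cpp
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>  // sysconf(), _SC_PHYS_PAGES, _SC_PAGESIZE
#endif

// Hypothetical helper: total physical RAM in bytes, or -1 if it cannot be
// determined on this platform (the "unavailable" sentinel discussed above).
long long totalRamBytes() {
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
    const long pages = sysconf(_SC_PHYS_PAGES);
    const long pageSize = sysconf(_SC_PAGESIZE);
    if (pages > 0 && pageSize > 0) {
        // Widen before multiplying so the product cannot overflow a 32-bit long.
        return static_cast<long long>(pages) * pageSize;
    }
#endif
    return -1;
}
```

With this shape the writing code never needs its own #ifdef; it only has to decide what to do with a -1.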

I would also really appreciate it if someone with a different hardware setup tested this plugin to make sure it's working. I couldn't do it on ARM and AMD yet because the cmake version on the minicluster is too low; I can try to run it there, but it would be more convenient if someone has a setup ready to go.


rubenhorn

Looks good so far.
I've added a few comments about possible Windows core library features that could be used.
In general, I would be OK with having POSIX/Linux exclusive features, since that's what most HPC systems (all?) use, anyway.
Therefore, /proc/* could be used, as the Linux file system is already required for the EnergyRAPL plugin.

I would advise against parsing command execution output, and consistency across Linux systems (Intel, AMD, ARM) is mandatory.
You could add almost all functionality for Windows using windows.h and by reading the Registry, but I'm also not sure if it is worth the additional code clutter.
(Memory in bytes is provided by GlobalMemoryStatusEx.ullTotalPhys.)
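A minimal sketch of that Windows call, for whoever picks this up later. The helper name `totalRamBytesWindows` is hypothetical, and the non-Windows branch simply returns the same -1 sentinel the PR uses for unavailable values. This has not been run on a Windows system.

```cpp
#ifdef _WIN32
#include <windows.h>  // MEMORYSTATUSEX, GlobalMemoryStatusEx
#endif

// Hypothetical helper: total physical memory in bytes on Windows,
// -1 on every other platform or if the call fails.
long long totalRamBytesWindows() {
#ifdef _WIN32
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);  // must be set before the call
    if (GlobalMemoryStatusEx(&status)) {
        return static_cast<long long>(status.ullTotalPhys);
    }
#endif
    return -1;
}
```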


amartyads · 2026-01-05T12:15:44Z

Looks good so far. I've added a few comments about possible Windows core library features that could be used. In general, I would be OK with having POSIX/Linux exclusive features, since that's what most HPC systems (all?) use, anyway. Therefore, /proc/* could be used, as the Linux file system is already required for the EnergyRAPL plugin.

Okay, I guess I'll use the header implementation for now, test it, and if there are issues, move onto parsing /proc/*.
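If the fallback to /proc/* turns out to be needed, the memory part could be sketched as below. This is an assumption about how such a parser might look, not code from the PR; `parseMemTotalKb` and `memTotalKb` are made-up names. The value is returned in the unit /proc/meminfo prints (labelled kB), and -1 again means unavailable.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scan meminfo-style text for the "MemTotal:" line and
// return its numeric value, or -1 if the field is missing.
long long parseMemTotalKb(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long long value = -1;
        if (fields >> key >> value && key == "MemTotal:") {
            return value;
        }
    }
    return -1;
}

// Reads the real file; returns -1 where /proc/meminfo does not exist.
long long memTotalKb() {
    std::ifstream meminfo("/proc/meminfo");
    if (!meminfo) {
        return -1;
    }
    return parseMemTotalKb(meminfo);
}
```

Taking a std::istream keeps the parser testable without a Linux /proc file system.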

Regarding Windows: you're right, we have no Windows HPC systems to test on, so if I add functionality for Windows, we'll have untested code. I'll skip Windows for now, and in the future, if someone needs it, they can easily implement the code that you've written.

cniethammer

I do not know if the SysMon plugin could handle this but I assume you had a look at it and it was not possible.

Here are some quick comments about the code from my side.

src/plugins/HardwareInfo.cpp

cniethammer · 2026-01-06T10:31:47Z

src/plugins/PinningInfo.cpp

+#ifdef __GLIBC__
+#include <sched.h>	// sched_getcpu(), getcpu(int*, int*)
+#endif


Suggested change

-#ifdef __GLIBC__
-#include <sched.h> // sched_getcpu(), getcpu(int*, int*)
-#endif
+#define _GNU_SOURCE
+#include <sched.h> // sched_getcpu(), getcpu(int*, int*)

amartyads · 2026-01-07

Sorry, I'm not too well versed in this; is it not possible that we use a compiler without glibc? For example, LLVM is supposed to come out with its own library, and it might have different functionality. On other OSs it's possible to use a different library.

Also, the extensions provided by _GNU_SOURCE aren't used anywhere else, is it necessary to have it?

I genuinely don't know what the standard is, and I can't seem to find any comprehensive documentation.
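One way the two approaches could be combined is sketched below, under stated assumptions: `CpuLocation` and `currentCpuLocation` are hypothetical names, and the version check relies on glibc's `__GLIBC_PREREQ` macro. The `#ifndef` is there because, as far as I know, g++ and clang++ already predefine `_GNU_SOURCE` on Linux. Note that glibc declares `getcpu` with `unsigned int*` arguments, and that the wrapper was added in glibc 2.29.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // must come before the first libc header is included
#endif
#include <sched.h>   // sched_getcpu(), getcpu(unsigned*, unsigned*)

// Hypothetical result type: -1 marks a value that is not available.
struct CpuLocation {
    int cpu = -1;
    int numaNode = -1;
};

CpuLocation currentCpuLocation() {
    CpuLocation loc;
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 29)
    // Newer glibc: one call yields both the CPU and the NUMA node.
    unsigned int cpu = 0, node = 0;
    if (getcpu(&cpu, &node) == 0) {
        loc.cpu = static_cast<int>(cpu);
        loc.numaNode = static_cast<int>(node);
    }
#else
    // Older glibc: only the CPU number is available.
    loc.cpu = sched_getcpu();
#endif
#endif
    return loc;  // other C libraries fall through with both fields at -1
}
```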

src/plugins/HardwareInfo.cpp

cniethammer · 2026-01-07T10:38:25Z

src/plugins/HardwareInfo.cpp

+	rankInfo << "\n\t\t\"" << _rank << "\": {\n";
+	rankInfo << "\t\t\t\"node_name\": \"" << _nodeName << "\",\n";
+	rankInfo << "\t\t\t\"total_threads\": \"" << _threadData[0].totalThreads << "\",\n";
+	rankInfo << "\t\t\t\"thread_data\": {\n";


JSON output formatting should be done by tools, not manually.
Also, number values in JSON should not be quoted, e.g., totalThreads.

Suggested change

-	rankInfo << "\n\t\t\"" << _rank << "\": {\n";
-	rankInfo << "\t\t\t\"node_name\": \"" << _nodeName << "\",\n";
-	rankInfo << "\t\t\t\"total_threads\": \"" << _threadData[0].totalThreads << "\",\n";
-	rankInfo << "\t\t\t\"thread_data\": {\n";
+	rankInfo << "\"" << _rank << "\": {";
+	rankInfo << "\"node_name\": \"" << _nodeName << "\",";
+	rankInfo << "\"total_threads\": " << _threadData[0].totalThreads << ",";
+	rankInfo << "\"thread_data\": {";

amartyads · 2026-01-07

Is it desirable to have an external tool just for a plugin?
I could write my own formatter inside the plugin, but that's code that's not strictly related to the plugin's purpose.
Or I could pivot back over to CSV or TSV.
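To make the "own formatter" option concrete, here is a rough sketch of what a hand-written emitter would need at minimum: string escaping, and numbers written without quotes. It is not the JSON library the PR ended up using, and `jsonEscape` and `rankInfoJson` are hypothetical names.

```cpp
#include <sstream>
#include <string>

// Escape the characters that may not appear raw inside a JSON string.
std::string jsonEscape(const std::string& in) {
    static const char hex[] = "0123456789abcdef";
    std::ostringstream out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out << "\\\""; break;
            case '\\': out << "\\\\"; break;
            case '\n': out << "\\n";  break;
            case '\t': out << "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    out << "\\u00" << hex[c >> 4] << hex[c & 0xF];
                } else {
                    out << static_cast<char>(c);
                }
        }
    }
    return out.str();
}

// Compact object for one rank; total_threads is written as a bare number.
std::string rankInfoJson(int rank, const std::string& nodeName, int totalThreads) {
    std::ostringstream out;
    out << "{\"" << rank << "\": {"
        << "\"node_name\": \"" << jsonEscape(nodeName) << "\","
        << "\"total_threads\": " << totalThreads
        << "}}";
    return out.str();
}
```

Even this small version shows how much formatting code ends up inside the plugin, which is the argument for either a library or a flat CSV/TSV format.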


amartyads · 2026-01-07T11:45:45Z

I do not know if the SysMon plugin could handle this but I assume you had a look at it and it was not possible.

Here are some quick comments about the code from my side.

The sysmon plugin does have a lot of hardware data, but the focus of this plugin was more to print the MPI and openMP info, with the node names, NUMA domains, and pinning info, to double check that whatever pinning you've done has persisted, or for node failures, see if you used a failed node. I don't believe sysmon does that. Also, sysmon doesn't seem to print any CPU information. I think the only major overlap is that I plan to print the max available ram per node. I could remove that, remove the CPU info, and rename the plugin to PinningPrint.cpp or something, to make it clearer.
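The thread-level part of that idea can be sketched as follows. This is a simplified illustration with hypothetical names (`ThreadPin`, `collectThreadPinning`), not the plugin's actual code; the MPI side (rank and node name) is left out because it needs mpi.h. Without OpenMP the function reports a single thread, and without glibc the CPU stays at -1.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>  // sched_getcpu() on glibc
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical record: which CPU a given OpenMP thread is running on.
struct ThreadPin {
    int threadId;
    int cpu;
};

std::vector<ThreadPin> collectThreadPinning() {
#ifdef _OPENMP
    const int n = omp_get_max_threads();
#else
    const int n = 1;  // serial build: one entry for the main thread
#endif
    std::vector<ThreadPin> pins(n, ThreadPin{-1, -1});
#ifdef _OPENMP
#pragma omp parallel num_threads(n)
#endif
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        int cpu = -1;
#ifdef __GLIBC__
        cpu = sched_getcpu();
#endif
        pins[tid] = ThreadPin{tid, cpu};  // each thread writes only its own slot
    }
    return pins;
}
```

Comparing the reported CPUs against the intended pinning (for example from OMP_PLACES) is then a matter of reading the output file.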


amartyads · 2026-01-13T14:11:48Z

I've added json library support, everything works fine, and the time taken by the plugin with 16 nodes, 4 MPI, 18 openMP is in the order of 10^-5 seconds.

Points that are still open:

Do I keep the library support, or change to CSV/TSV and write directly to file instead of using this library?
Do I rename the plugin to PinningInfo?

SamNewcome · 2026-01-16T08:25:26Z

I've added json library support, everything works fine, and the time taken by the plugin with 16 nodes, 4 MPI, 18 openMP is in the order of 10^-5 seconds.

Points that are still open:
1. Do I keep the library support, or change to CSV/TSV and write directly to file instead of using this library?

IMHO, I think a JSON library is fine but I also have no issues with CSV.

2. Do I rename the plugin to PinningInfo?

Yes, or something like HardwareAndPinningInfo: the main intent really seems to be the pinning, but hardware info is also included.

amartyads added 17 commits December 16, 2025 14:53

New HardwareInfo plugin, mpi and serial working

c1c8a99

openmp added, only printing one thread, to be fixed

3970746

openmp working

87756e2

rank 0 writing to json file

49dd29a

remove threadwise, add hierarchial output

c2d2305

removed unneccesary struct fields

11ebbc4

all ranks writing to file, json parseable

ff25240

numa info only available glibc > 2.29

313a8cd

additional def for glibc

e389c17

move thread info inside rank, add guard for sched, fix serial mode

708a2ca

chnages to includeguards, documentation

7fe92b8

const issue on icpx

a9024bf

second attempt to fix const issue on icpx

8756677

final fix to const issue on icpx, removed useless const, added relevant const

e99e8ce

clarifying comments, more warnings

9af7c77

changed name of variable

b94732d

added hardware info line in summary

943c9f1

amartyads requested review from SamNewcome, cniethammer and rubenhorn December 31, 2025 14:56

EOF newline

54ed1b8

rubenhorn reviewed Jan 4, 2026


rubenhorn reviewed Jan 5, 2026


amartyads added 4 commits January 5, 2026 13:32

now writing -1 instead of skipping fields, to make schema uniform

9af7d3a

version check fix

f670ec4

ram information working, cpu to be added

75cfb50

basic cpu info

040037a

cniethammer requested changes Jan 7, 2026


amartyads and others added 9 commits January 7, 2026 13:24

Apply suggestions from code review

bb77b67

Co-authored-by: Christoph Niethammer <cniethammer@users.noreply.github.com>

changes to threaddata for omp routines, more suggestions from review

a743f4d

fix to unsigned

4be7c34

further unsigned bugfix

00c3ecc

cpu info added

23d3024

documentation, removal of unistd to use sysinfo instead

da89a25

thread data bugfix

6782bce

fix to non openmp mode

74fdd57

added json library support

eda323b

amartyads marked this pull request as ready for review January 13, 2026 14:11

amartyads requested review from cniethammer and rubenhorn January 13, 2026 14:12

newline eof

43e4b13

amartyads added 3 commits January 16, 2026 13:14

renamed plugin

b96dc9c

alphabetical order in cmakelists

3fdbfba

Merge branch 'master' into hw-info-plugin

4ad54d3
