-
Notifications
You must be signed in to change notification settings - Fork 17
Hardware info plugin #391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Hardware info plugin #391
Conversation
rubenhorn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far.
I've added a few comments about possible Windows core library features that could be used.
In general, I would be OK with having POSIX/Linux exclusive features, since that's what most HPC systems (all?) use, anyway.
Therefore, /proc/* could be used, as the Linux file system is already required for the EnergyRAPL plugin.
I would advise against parsing command execution output and consistency across linux systems (Intel, AMD, ARM) is mandatory.
You could add almost all functionality for Windows using windows.h and by reading the Registry, but I'm also not sure if it is worth the additional code clutter.
(Memory in bytes is provided by GlobalMemoryStatusEx.ullTotalPhys.)
Okay, I guess I'll use the header implementation for now, test it, and if there are issues, move onto parsing /proc/*. The windows stuff; you're right, we have no Windows HPC systems to test on, so if I add functionality for windows, we'll have untested code. I'll skip windows for now then, and in the future, if someone needs windows, they can easily implement the code that you've written. |
cniethammer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know if the SysMon plugin could handle this but I assume you had a look at it and it was not possible.
Here are some quick comments about the code from my side.
| #ifdef __GLIBC__ | ||
| #include <sched.h> // sched_getcpu(), getcpu(int*, int*) | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #ifdef __GLIBC__ | |
| #include <sched.h> // sched_getcpu(), getcpu(int*, int*) | |
| #endif | |
| #define _GNU_SOURCE | |
| #include <sched.h> // sched_getcpu(), getcpu(int*, int*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not too well versed in this; is it not possible that we use a compiler without glibc? For example, LLVM is supposed to come out with its own library, and it might have different functionality. On other OSs it's possible to use a different library.
Also, the extensions provided by _GNU_SOURCE aren't used anywhere else, is it necessary to have it?
I genuinely don't know what the standard is, and I can't seem to find any comprehensive documentation.
src/plugins/HardwareInfo.cpp
Outdated
| rankInfo << "\n\t\t\"" << _rank << "\": {\n"; | ||
| rankInfo << "\t\t\t\"node_name\": \"" << _nodeName << "\",\n"; | ||
| rankInfo << "\t\t\t\"total_threads\": \"" << _threadData[0].totalThreads << "\",\n"; | ||
| rankInfo << "\t\t\t\"thread_data\": {\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON output formating should be done by tools, not manually.
Also number values in JSON should not be quoted, e.g., totalTreads
| rankInfo << "\n\t\t\"" << _rank << "\": {\n"; | |
| rankInfo << "\t\t\t\"node_name\": \"" << _nodeName << "\",\n"; | |
| rankInfo << "\t\t\t\"total_threads\": \"" << _threadData[0].totalThreads << "\",\n"; | |
| rankInfo << "\t\t\t\"thread_data\": {\n"; | |
| rankInfo << "\"" << _rank << "\": {"; | |
| rankInfo << "\"node_name\": \"" << _nodeName << "\","; | |
| rankInfo << "\"total_threads\": " << _threadData[0].totalThreads << ","; | |
| rankInfo << "\"thread_data\": {"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it desirable to have an external tool just for a plugin?
I could write my own formatter inside the plugin, but that's code that's not strictly related to the plugin's purpose.
Or I could pivot back over to CSV or TSV.
The sysmon plugin does have a lot of hardware data, but the focus of this plugin was more to print the MPI and openMP info, with the node names, NUMA domains, and pinning info, to double check that whatever pinning you've done has persisted, or for node failures, see if you used a failed node. I don't believe sysmon does that. Also, sysmon doesn't seem to print any CPU information. I think the only major overlap is that I plan to print the max available ram per node. I could remove that, remove the CPU info, and rename the plugin to PinningPrint.cpp or something, to make it clearer. |
Co-authored-by: Christoph Niethammer <cniethammer@users.noreply.github.com>
|
I've added json library support, everything works fine, and the time taken by the plugin with 16 nodes, 4 MPI, 18 openMP is in the order of 10^-5 seconds. Points that are still open:
|
IMHO, I think a JSON library is fine but I also have no issues with CSV.
Yes. Or something like |
Description
I've created the plugin I was talking about in #348, to show hardware info once a simulation has started. This helps double check whether the distribution of resources was correct, whether pinning was successful, whether a heterogeneous hardware run was launched properly etc. For higher versions of glibc, this also shows NUMA information.
Resolved Issues
How Has This Been Tested?
Tested on
Documentation
(Only relevant if this PR introduces new features)
all-options.xmldocuments how to use the feature.readXML()documents how to use the feature.Discussion points
A few points I'd like everyone's input on, hence initially making this PR draft.
cpuid.honly works for linux systems, and does not have RAM info; to get max available RAM, you need to dosysconf(_SC_PAGESIZE) * sysconf(_SC_PHYS_PAGES)fromunistd.h. I'm already using unistd (despite only existing on posix systems), so that's not a big deal, but it's not the most elegant solution. This does work on intel, ARM, AMD, with both gcc and clang-based compilers. Adding headers for windows systems would add a lot more clutter./proc/cpuinfoand/proc/meminfoto get the information, but cpuinfo doesn't give clean values on ARM, it gives a code that you need to then look up in the documentation to find the CPU model. So this works only on linux, non ARM (intel and AMD work fine).lscpuwithpopen()on linux and then parse the output, but do we want to do that?From all these options, 2 seems the best to me, but I want to know what everyone else thinks. 2,3 and 4 are all linux specific.
#ifdefchecks. But I've made it so that unavailable values are -1, and are not written if unavailable. I didn't put all the variables and writing code in#ifdefs because I thought it would be too cluttered. Do I do it anyway, or do I stick to -1?I would also really appreciate it if someone with a different hardware setup tested out this plugin to make sure it's working. I couldn't do it on ARM and AMD yet because the cmake version on the minicluster is too low; I can try and run it there but it would be more convenient if someone has a set up ready to go.