Description
Platform: OLCF Summit
Versions: STAT from spack: spack install stat%gcc@10.2.0 cxxflags=--std=c++14
==> 1 installed package
-- linux-rhel8-power9le / gcc@10.2.0 ----------------------------
wn2frxd stat@4.1.0%gcc cxxflags="--std=c++14" ~dysect~examples~fgfs~gui
3ra646m boost@1.77.0%gcc cxxflags="--std=c++14" +atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=98 patches=93f4aad8f88d1437e50d95a2d066390ef3753b99ef5de24f7a46bc083bd6df06 visibility=hidden
4ucshfz dyninst@10.1.0%gcc cxxflags="--std=c++14" ~ipo+openmp~stat_dysect~static build_type=RelWithDebInfo
bp7lk52 elfutils@0.172%gcc cxxflags="--std=c++14" ~bzip2~debuginfod+nls~xz
5kqtqyt intel-tbb@2020.3%gcc cxxflags="--std=c++14" ~ipo+shared+tm build_type=RelWithDebInfo cxxstd=default patches=62ba015ebd1819c45bef47411540b789b493e31ca668c4ff4cb2afcbc306b476,ce1fb16fb932ce86a82ca87cf0431d1a8c83652af9f552b264213b2ff2945d73,d62cb666de4010998c339cde6f41c7623a07e9fc69e498f2e149821c0c2c6dd0
qizwje7 libiberty@2.33.1%gcc cxxflags="--std=c++14" +pic
7lrjx2k graphlib@3.0.0%gcc cxxflags="--std=c++14" ~ipo build_type=RelWithDebInfo
j56c46j graphviz@2.49.0%gcc cxxflags="--std=c++14" ~doc~expat~ghostscript~gtkplus~gts~java~libgd~pangocairo~poppler~qt~quartz~x
7zttv3a zlib@1.2.11%gcc cxxflags="--std=c++14" +optimize+pic+shared
42awyk6 launchmon@master%gcc cxxflags="--std=c++14"
ehifwhj libgcrypt@1.9.3%gcc cxxflags="--std=c++14"
nfkm5sn libgpg-error@1.42%gcc cxxflags="--std=c++14"
xkkejlv mrnet@5.0.1-3%gcc cxxflags="--std=c++14" ~lwthreads
cc2ohrr python@3.6.13%gcc cxxflags="--std=c++14" +bz2+ctypes+dbm~debug+libxml2+lzma+nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib
p4xaimr swig@4.0.2%gcc cxxflags="--std=c++14"
kjydyg7 pcre@8.44%gcc cxxflags="--std=c++14" ~jit+multibyte+utf
I was trying to collect/compare backtraces for ten core files with a command like this:
stat-core-merger -x =bedrock -F stdout -c /gpfs/alpine/csc332/scratch/${USER}/quintain-cores/
After fixing up Python's string/bytes challenges (maybe I goofed that!), the command hangs. Running with -L debug shows me
115 core_file_merger:589 VERBOSE (MainThread) Processing started at 2022-02-17 09:43:54.919282
merging 10 trace files 000%115 core_file_merger:352 INFO (MainThread) Connecting gdb to the core file (/gpfs/alpine/csc332/scratch/robl/quintain-cores//core.2)
1226 core_file_merger:379 DEBUG (MainThread) Checking for gdb errors
1601 core_file_merger:427 DEBUG (MainThread) Find a value for the current rank
When I check with ps I see STAT is trying to do this:
gdb -ex set pagination 0 -ex cd /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex path /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex directory /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex set filename-display absolute --core=/gpfs/alpine/csc332/scratch/robl/quintain-cores//core.2 /autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-bedrock-main-ibxscgvcko74xoyb6sv4lphuiv3deryo/bin/bedrock
and when I run that command myself, gdb suggests it did not process the command line arguments as expected:
% gdb -ex set pagination 0 -ex cd /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex path /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex directory /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex set filename-display absolute --core=/gpfs/alpine/csc332/scratch/robl/quintain-cores//core.2 /autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-bedrock-main-ibxscgvcko74xoyb6sv4lphuiv3deryo/bin/bedrock
Excess command line arguments ignored. (0 ...)
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
pagination: No such file or directory.
[New LWP 150624]
Core was generated by `bedrock '.
Program terminated with signal SIGINT, Interrupt.
#0 0x0000200000b76118 in ?? ()
Argument required (expression to compute).
Working directory /ccs/home/robl
(canonically /autofs/nccs-svm1_home1/robl).
Executable and object file path: /sw/summit/xalt/1.2.1/bin:/sw/sources/lsf-tools/2.0/summit/bin:/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/gdb-10.2-zl2qphcj4naoqsp6thilh4w5kkcf7n2u/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/swig-4.0.2-p4xaimrohrzqshwsefj7heh6f3df7bya/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/pcre-8.44-kjydyg7oxoimrh47ooejkj2jtv3uke3f/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/mrnet-5.0.1-3-xkkejlv2lt7xcsb65ga4thqntzrmoz3b/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/launchmon-master-42awyk6qtdhwgsen7k3bqldrdzc2es2o/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/libgcrypt-1.9.3-ehifwhjdwrb7tmapmkylstbqvp47gu62/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/libgpg-error-1.42-nfkm5snffx46qwffiwfngffnwsql2y6u/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/graphviz-2.49.0-j56c46j34im324olozfvvcmoslfphibq/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/graphlib-3.0.0-7lrjx2kdz5rg4e5g6t33gkzko7wfbm7n/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/dyninst-10.1.0-4ucshfzv5b574jurzctlbt7w3qxmgf2i/bin:/sw/summit/gcc/10.2.0-2/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-quintain-main-nkuuhxcrvm3irrqrxctkfysukzyb2xue/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-bedrock-main-ibxscgvcko74xoyb6sv4lphuiv3deryo/bin:/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-11.1.0/spectrum-mpi-10.4.0.3-20210112-6kg6anupjriji6pnvijebfn7ha5vsqp2/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-margo-main-bt67pbipf3q56ijgm2ij7nzjnlbvhruo/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/libfabric-1.13.2-hsk4mn4hjtnv7bnfptpzwhno4kjsqhvw/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-abt-io-0.5.1-ir7rmxlx4ebamktb7xtwo5iqyyzuum4d/bin:/sw/sources/hpss/bin:/autofs/nccs-svm1_home1/robl/src/spack/bin:/opt/ibm/csm/bin:/opt/ibm/spectrumcomputing/lsf/10.1.0.11/linux3.10-glibc2.17-ppc64le-csm/etc:/opt/ibm/spectrumcomputing/lsf/10.1.0.11/linux3.10-glibc2.17-ppc64le-csm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibm/flightlog/bin:/opt/ibm/jsm/bin:/sw/sources/cgroup_tool/bin:/opt/puppetlabs/bin:/usr/lpp/mmfs/bin
Reinitialize source path to empty? (y or n)
In particular, note the messages "pagination: No such file or directory" and "Excess command line arguments ignored".
If I re-run that command with each -ex argument quoted, gdb gives me the (gdb) prompt that the Python script expects.
Hacking up scripts/core_file_merger.py to add those quotes gave me the command line I expected; however, it still hangs at "Find a value for the current rank".
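To illustrate the quoting issue, here is a minimal sketch of how each gdb command can be kept as a single argv element and how that list can be rendered as a copy-pasteable shell line. The helper names gdb_argv and shell_form are hypothetical and are not taken from core_file_merger.py; the sketch only assumes the standard shlex module. Note that ps prints argv elements joined by spaces, so quoting is not visible in its output.

```python
import shlex


def gdb_argv(core, exe, commands):
    """Build a gdb argv list (hypothetical helper, not STAT's code).

    Each gdb command stays one list element, so "set pagination 0"
    reaches gdb as a single -ex argument when no shell is involved.
    """
    argv = ["gdb"]
    for cmd in commands:
        argv += ["-ex", cmd]
    argv += ["--core=" + core, exe]
    return argv


def shell_form(argv):
    """Render argv as a shell command line with quoting applied."""
    # shlex.join would also work, but it needs Python 3.8 or newer.
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    argv = gdb_argv("core.2", "bedrock", ["set pagination 0", "cd /tmp"])
    print(shell_form(argv))
```

Passing the list form directly to subprocess.Popen (without shell=True) avoids shell word splitting altogether; the quoted string form is only needed when the command goes through a shell or is pasted into a terminal.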
When I Ctrl-C the process, the Python backtrace shows it is stuck reading gdb's response to "info threads":
Traceback (most recent call last):
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/STATmain.py", line 134, in <module>
STATmerge_main(sys.argv[1:])
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 655, in STATmerge_main
ret = merger.run()
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/stat_merge_base.py", line 314, in run
trace_object = self.trace_type(filename, self.options)
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/stat_merge_base.py", line 49, in __init__
self.traces = self.get_traces()
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 535, in get_traces
core_file.process_core()
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 428, in process_core
rank_value = self.get_function_value(gdb, 'MPI_Comm_rank', 1)
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 216, in get_function_value
lines = gdb.communicate("info threads")
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 147, in communicate
return self.readlines()
File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 128, in readlines
ch = self.subprocess.stdout.read(1).decode('utf-8')
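Since the traceback ends in a blocking one-byte read, one way to narrow this down might be a read loop with a timeout that reports whatever gdb printed before it stalled. The following is only a sketch under assumptions: read_until_prompt is a hypothetical function, not part of STAT, and it assumes gdb's output ends with the literal "(gdb) " prompt and that the pipe's file descriptor is available (for example via subprocess.stdout.fileno()).

```python
import os
import select
import time


def read_until_prompt(fd, prompt=b"(gdb) ", timeout=5.0):
    """Read from file descriptor fd until the output ends with prompt.

    Raises TimeoutError, including the bytes read so far, if the
    prompt does not appear within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while not buf.endswith(prompt):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no prompt after %.1fs; got %r" % (timeout, buf))
        # Wait until data is available or the remaining time runs out.
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue  # the deadline check above raises on the next pass
        chunk = os.read(fd, 4096)
        if not chunk:
            raise EOFError("stream closed; got %r" % buf)
        buf += chunk
    return buf
```

If the timeout fires, the partial output in the exception message should show whether gdb printed something other than the expected prompt, such as a confirmation question waiting for input.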
Any suggestions for next steps?
Thanks