stat-core-merger stuck communicating with gdb #35

@roblatham00

Platform: OLCF Summit
Versions: STAT from spack: spack install stat%gcc@10.2.0 cxxflags=--std=c++14

==> 1 installed package
-- linux-rhel8-power9le / gcc@10.2.0 ----------------------------
wn2frxd stat@4.1.0%gcc  cxxflags="--std=c++14" ~dysect~examples~fgfs~gui
3ra646m     boost@1.77.0%gcc  cxxflags="--std=c++14" +atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=98 patches=93f4aad8f88d1437e50d95a2d066390ef3753b99ef5de24f7a46bc083bd6df06 visibility=hidden
4ucshfz     dyninst@10.1.0%gcc  cxxflags="--std=c++14" ~ipo+openmp~stat_dysect~static build_type=RelWithDebInfo
bp7lk52         elfutils@0.172%gcc  cxxflags="--std=c++14" ~bzip2~debuginfod+nls~xz
5kqtqyt         intel-tbb@2020.3%gcc  cxxflags="--std=c++14" ~ipo+shared+tm build_type=RelWithDebInfo cxxstd=default patches=62ba015ebd1819c45bef47411540b789b493e31ca668c4ff4cb2afcbc306b476,ce1fb16fb932ce86a82ca87cf0431d1a8c83652af9f552b264213b2ff2945d73,d62cb666de4010998c339cde6f41c7623a07e9fc69e498f2e149821c0c2c6dd0
qizwje7         libiberty@2.33.1%gcc  cxxflags="--std=c++14" +pic
7lrjx2k     graphlib@3.0.0%gcc  cxxflags="--std=c++14" ~ipo build_type=RelWithDebInfo
j56c46j     graphviz@2.49.0%gcc  cxxflags="--std=c++14" ~doc~expat~ghostscript~gtkplus~gts~java~libgd~pangocairo~poppler~qt~quartz~x
7zttv3a         zlib@1.2.11%gcc  cxxflags="--std=c++14" +optimize+pic+shared
42awyk6     launchmon@master%gcc  cxxflags="--std=c++14"
ehifwhj         libgcrypt@1.9.3%gcc  cxxflags="--std=c++14"
nfkm5sn             libgpg-error@1.42%gcc  cxxflags="--std=c++14"
xkkejlv     mrnet@5.0.1-3%gcc  cxxflags="--std=c++14" ~lwthreads
cc2ohrr     python@3.6.13%gcc  cxxflags="--std=c++14" +bz2+ctypes+dbm~debug+libxml2+lzma+nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib
p4xaimr     swig@4.0.2%gcc  cxxflags="--std=c++14"
kjydyg7         pcre@8.44%gcc  cxxflags="--std=c++14" ~jit+multibyte+utf

I was trying to collect/compare backtraces for ten core files with a command like this:

stat-core-merger -x =bedrock -F stdout -c /gpfs/alpine/csc332/scratch/${USER}/quintain-cores/

After fixing up Python's string/bytes challenges (maybe I goofed that!), the command hangs. Running with -L debug shows me:

115      core_file_merger:589   VERBOSE  (MainThread) Processing started at 2022-02-17 09:43:54.919282
merging 10 trace files 000%
115      core_file_merger:352   INFO     (MainThread) Connecting gdb to the core file (/gpfs/alpine/csc332/scratch/robl/quintain-cores//core.2)
1226     core_file_merger:379   DEBUG    (MainThread) Checking for gdb errors
1601     core_file_merger:427   DEBUG    (MainThread) Find a value for the current rank

When I check with ps I see STAT is trying to do this:

 gdb -ex set pagination 0 -ex cd /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex path /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex directory /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex set filename-display absolute --core=/gpfs/alpine/csc332/scratch/robl/quintain-cores//core.2 /autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-bedrock-main-ibxscgvcko74xoyb6sv4lphuiv3deryo/bin/bedrock

and when I run that command myself, gdb suggests it did not process the command line arguments as expected:

%  gdb -ex set pagination 0 -ex cd /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex path /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex directory /autofs/nccs-svm1_home1/robl/src/mochi-quintain/tests -ex set filename-display absolute --core=/gpfs/alpine/csc332/scratch/robl/quintain-cores//core.2 /autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-bedrock-main-ibxscgvcko74xoyb6sv4lphuiv3deryo/bin/bedrock
Excess command line arguments ignored. (0 ...)
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
pagination: No such file or directory.
[New LWP 150624]
Core was generated by `bedrock '.
Program terminated with signal SIGINT, Interrupt.
#0  0x0000200000b76118 in ?? ()
Argument required (expression to compute).
Working directory /ccs/home/robl
 (canonically /autofs/nccs-svm1_home1/robl).
Executable and object file path: /sw/summit/xalt/1.2.1/bin:/sw/sources/lsf-tools/2.0/summit/bin:/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/gdb-10.2-zl2qphcj4naoqsp6thilh4w5kkcf7n2u/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/swig-4.0.2-p4xaimrohrzqshwsefj7heh6f3df7bya/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/pcre-8.44-kjydyg7oxoimrh47ooejkj2jtv3uke3f/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/mrnet-5.0.1-3-xkkejlv2lt7xcsb65ga4thqntzrmoz3b/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/launchmon-master-42awyk6qtdhwgsen7k3bqldrdzc2es2o/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/libgcrypt-1.9.3-ehifwhjdwrb7tmapmkylstbqvp47gu62/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/libgpg-error-1.42-nfkm5snffx46qwffiwfngffnwsql2y6u/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/graphviz-2.49.0-j56c46j34im324olozfvvcmoslfphibq/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/graphlib-3.0.0-7lrjx2kdz5rg4e5g6t33gkzko7wfbm7n/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/dyninst-10.1.0-4ucshfzv5b574jurzctlbt7w3qxmgf2i/bin:/sw/summit/gcc/10.2.0-2/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-quintain-main-nkuuhxcrvm3irrqrxctkfysukzyb2xue/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-bedrock-main-ibxscgvcko74xoyb6sv4lphuiv3deryo/bin:/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-11.1.0/spectrum-mpi-10.4.0.3-20210112-6kg6anupjriji6pnvijebfn7ha5vsqp2/bin:/autofs/nccs-svm1_home1/robl/src/spack/op
t/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-margo-main-bt67pbipf3q56ijgm2ij7nzjnlbvhruo/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/libfabric-1.13.2-hsk4mn4hjtnv7bnfptpzwhno4kjsqhvw/bin:/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-11.1.0/mochi-abt-io-0.5.1-ir7rmxlx4ebamktb7xtwo5iqyyzuum4d/bin:/sw/sources/hpss/bin:/autofs/nccs-svm1_home1/robl/src/spack/bin:/opt/ibm/csm/bin:/opt/ibm/spectrumcomputing/lsf/10.1.0.11/linux3.10-glibc2.17-ppc64le-csm/etc:/opt/ibm/spectrumcomputing/lsf/10.1.0.11/linux3.10-glibc2.17-ppc64le-csm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibm/flightlog/bin:/opt/ibm/jsm/bin:/sw/sources/cgroup_tool/bin:/opt/puppetlabs/bin:/usr/lpp/mmfs/bin
Reinitialize source path to empty? (y or n)

In particular, note the messages "pagination: No such file or directory" and "Excess command line arguments ignored".

If I re-run that command with all the -ex arguments quoted, gdb gives me the (gdb) prompt that the Python script expects.
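To illustrate why the quoting matters, here is a minimal Python sketch. gdb's -ex option takes exactly one argument, so "set pagination 0" has to reach gdb as a single argv entry. One caveat: ps prints argv entries joined by spaces, so the ps listing alone cannot show whether that string was passed as one argument or three. The shortened paths and variable names below are placeholders for illustration, not taken from core_file_merger.py.

```python
import shlex

# Hypothetical, shortened paths standing in for the real ones in the report.
SRC_DIR = "/path/to/tests"
CORE = "/path/to/core.2"
EXE = "/path/to/bedrock"

# One list element per gdb argument: each -ex is followed by exactly one
# string that holds the whole gdb command.
argv = [
    "gdb",
    "-ex", "set pagination 0",
    "-ex", "cd " + SRC_DIR,
    "-ex", "path " + SRC_DIR,
    "-ex", "directory " + SRC_DIR,
    "-ex", "set filename-display absolute",
    "--core=" + CORE,
    EXE,
]

# What a ps-style listing shows: elements joined by spaces, grouping lost.
ps_view = " ".join(argv)

# What a shell makes of that text when it is pasted back in unquoted.
pasted = shlex.split(ps_view)

# A copy-pasteable form that keeps each gdb command as one argument.
quoted = " ".join(shlex.quote(a) for a in argv)

print(pasted[1:5])                  # ['-ex', 'set', 'pagination', '0']
print(shlex.split(quoted) == argv)  # True
```

Passing a list like argv straight to subprocess.Popen (without shell=True) avoids the shell entirely, so no quoting is needed on that path; the quoted string is only useful for reproducing the command by hand.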

Hacking up scripts/core_file_merger.py to add those quotes gave me the command line I expected. However, it still hangs at "Find a value for the current rank".

When I Ctrl-C the process, the Python backtrace tells me it's stuck reading gdb's reply to info threads:

Traceback (most recent call last):
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/STATmain.py", line 134, in <module>
    STATmerge_main(sys.argv[1:])
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 655, in STATmerge_main
    ret = merger.run()
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/stat_merge_base.py", line 314, in run
    trace_object = self.trace_type(filename, self.options)
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/stat_merge_base.py", line 49, in __init__
    self.traces = self.get_traces()
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 535, in get_traces
    core_file.process_core()
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 428, in process_core
    rank_value = self.get_function_value(gdb, 'MPI_Comm_rank', 1)
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 216, in get_function_value
    lines = gdb.communicate("info threads")
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 147, in communicate
    return self.readlines()
  File "/autofs/nccs-svm1_home1/robl/src/spack/opt/spack/linux-rhel8-power9le/gcc-10.2.0/stat-4.1.0-wn2frxd57sysvqvapa65yd5sqflvi3sr/lib/python3.6/site-packages/core_file_merger.py", line 128, in readlines
    ch = self.subprocess.stdout.read(1).decode('utf-8')
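The last frame is a blocking one-byte read, which waits forever if gdb never prints what the reader is looking for (for example, if gdb is itself waiting on a question such as the "(y or n)" seen earlier). I do not know that this is the cause here. As a debugging aid, a read with a deadline at least reports what gdb printed before going quiet. The sketch below is an assumption-laden illustration: read_until_prompt, its prompt default, and the timeout are hypothetical names, not part of STAT, and select on pipes is assumed to be available (Linux).

```python
import os
import select
import time


def read_until_prompt(fd, prompt=b"(gdb) ", timeout=5.0):
    """Read from file descriptor fd until the output ends with `prompt`.

    Returns everything read, prompt included. Raises TimeoutError carrying
    the partial output if the prompt does not arrive within `timeout`
    seconds, and EOFError if the other side closes the pipe first.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while not buf.endswith(prompt):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no prompt; output so far: %r" % buf)
        # Wait until data is readable or the remaining time runs out.
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue  # the loop re-checks the deadline
        chunk = os.read(fd, 4096)
        if not chunk:
            raise EOFError("stream closed; output so far: %r" % buf)
        buf += chunk
    return buf


if __name__ == "__main__":
    # Stand-in for gdb's stdout: a pipe that holds a question, not a prompt.
    r, w = os.pipe()
    os.write(w, b"Reinitialize source path to empty? (y or n) ")
    try:
        read_until_prompt(r, timeout=0.5)
    except TimeoutError as err:
        print(err)
```

With a real subprocess.Popen object this would be called as read_until_prompt(proc.stdout.fileno()); mixing it with proc.stdout.read() on a buffered pipe is best avoided, since the two would compete for the same bytes.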

Any suggestions for next steps?
Thanks
