Discovered while looking at PIConGPU. Runtime instrumentation with:
omnitrace -v 3 -- ./bin/picongpu
pulls in modules from libc, Boost, OMPI, UCX, HIP, HSA, and so on: roughly 46k functions across 270 modules.
In contrast, a binary rewrite appears to default to only the symbols defined in the main binary (in this case, ~4k functions in 1 module):
omnitrace -v 3 -o picongpu -- ./bin/picongpu
Given how fragile Dyninst can sometimes be, I think it would be safer for both modes to default to the binary rewrite's current behaviour and let the user expand the instrumentation afterwards as desired.
For PIConGPU specifically, runtime instrumentation pulls in symbols from Boost, which causes Dyninst to segfault.