DWARF debug support #6

Open
scpmw wants to merge 5086 commits into master from profiling-import

Conversation


@scpmw scpmw commented Mar 13, 2014

Currently under consideration as the basis for implementing stack traces. See ticket for further discussion:

http://hackage.haskell.org/trac/ghc/ticket/3693

tibbe and others added 30 commits July 23, 2014 21:03
Duplicate record fields would not be detected when given a type
with multiple data constructors, and the first data constructor
had a record field r1 and any consecutive data constructors
had multiple fields named r1.

This fixes #9156 and was reviewed in https://phabricator.haskell.org/D87
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
This patch was provoked by Trac #5610, which I finally got a moment to look at.

In the end I added a new data type ErrUtils.Validity,

  data Validity
    = IsValid            -- Everything is fine
    | NotValid MsgDoc    -- A problem, and some indication of why

with some suitable combinators, and used it where appropriate (which touches
quite a few modules).  The main payoff is that error messages improve for
FFI type validation.
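A minimal, self-contained sketch of the Validity type and the kind of combinator it enables. The combinator names `andValid` and `allValid` are illustrative stand-ins, not necessarily the exact set added to ErrUtils, and `MsgDoc` is simplified to a plain string here.

```haskell
type MsgDoc = String   -- stand-in for GHC's pretty-printed documents

data Validity
  = IsValid           -- everything is fine
  | NotValid MsgDoc   -- a problem, and some indication of why

-- Succeed only if both checks succeed, reporting the first failure.
andValid :: Validity -> Validity -> Validity
andValid IsValid v   = v
andValid notValid _  = notValid

-- Collapse a list of checks into a single result.
allValid :: [Validity] -> Validity
allValid = foldr andValid IsValid
```

With combinators like these, a validity check for (say) an FFI type can be built from smaller checks and still report the first specific reason for failure.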
after changes in 92587bf.

This problem was noticed on ghcspeed (although only by accident,
unfortunately, as a change from 0 to 1 is not reported in the summary).
The general approach is to add a new field to the package database,
reexported-modules, which is considered by the module finder as a source
of possible module declarations.  Unlike declaring stub module files,
multiple reexports of the same physical package under the same name do
not result in an ambiguous import.

Has submodule updates for Cabal and haddock.

NB: When a reexport renames a module, that renaming is *not* accessible
from inside the package.  This is not so much a deliberate design choice
as for implementation expediency (reexport resolution happens only when
a package is in the package database.)

TODO: Error handling when there are duplicate reexports/etc is not very
well tested.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Conflicts:
	compiler/main/HscTypes.lhs
	testsuite/.gitignore
	utils/haddock
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
This also removes the short-lived NO_OVERLAP pragma, and renames
OVERLAP to OVERLAPS.

An instance may be annotated with one of four pragmas, to control its
interaction with other overlapping instances:

  * OVERLAPPABLE:
    this instance is ignored if a more specific candidate exists

  * OVERLAPPING:
    this instance is preferred over more general candidates

  * OVERLAPS:
    both OVERLAPPING and OVERLAPPABLE (i.e., the previous GHC behavior).
    When compiling with -XOverlappingInstances, all instances are OVERLAPS.

  * INCOHERENT:
    same as before (see manual for details).
    When compiling with -XIncoherentInstances, all instances are INCOHERENT.
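The per-instance pragmas above can be used like this. This is a hedged, minimal sketch: the `Describe` class and instances are invented for illustration, but the pragma placement matches the syntax this patch introduces (GHC 7.10 and later).

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- A hypothetical class with one general and one specific instance.
class Describe a where
  describe :: a -> String

-- General candidate: ignored whenever a more specific instance exists.
instance {-# OVERLAPPABLE #-} Describe a where
  describe _ = "something"

-- More specific candidate: preferred over the general one.
instance {-# OVERLAPPING #-} Describe Int where
  describe n = "an Int: " ++ show n

main :: IO ()
main = do
  putStrLn (describe (3 :: Int))  -- resolves to the OVERLAPPING instance
  putStrLn (describe True)        -- falls back to the OVERLAPPABLE instance
```

Annotating individual instances this way replaces the old module-wide -XOverlappingInstances switch, which turned the behaviour on for every instance at once.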
Summary:
Today's hardware is much faster, so it makes sense to report timings
with more precision, and possibly help reduce rounding-induced
fluctuations in the nofib statistics.

This commit increases the precision of all timings previously reported
with a granularity of 10ms to 1ms. For instance, the `+RTS -S` output is
now rendered as:

    Alloc    Copied     Live     GC     GC      TOT      TOT  Page Flts
    bytes     bytes     bytes   user   elap     user     elap
   641936     59944    158120  0.000  0.000    0.013    0.001    0    0  (Gen:  0)
   517672     60840    158464  0.000  0.000    0.013    0.002    0    0  (Gen:  0)
   517256     58800    156424  0.005  0.005    0.019    0.007    0    0  (Gen:  1)
   670208      9520    158728  0.000  0.000    0.019    0.008    0    0  (Gen:  0)

  ...

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        24 colls,     0 par    0.002s   0.002s     0.0001s    0.0002s
  Gen  1         3 colls,     0 par    0.011s   0.011s     0.0038s    0.0055s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.005s  (  0.006s elapsed)
  GC      time    0.014s  (  0.014s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time    0.032s  (  0.020s elapsed)

Note that this change also requires associated changes in the nofib
submodule.

Test Plan: tested with modified nofib

Reviewers: simonmar, nomeata, austin

Subscribers: simonmar, relrod, carter

Differential Revision: https://phabricator.haskell.org/D97
Summary: Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: hvr, simonmar, austin

Subscribers: simonmar, relrod, carter

Differential Revision: https://phabricator.haskell.org/D98
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
hvr and others added 28 commits August 17, 2014 13:09
On Linux/i386 the 64-bit `__builtin_ctzll()` intrinsic doesn't get
inlined by GCC; rather, a call to the short `__ctzdi2` runtime function
is inserted when needed into compiled object files.

This causes failures for the four test-cases

  TEST="T8639_api T8628 dynCompileExpr T5313"

with error messages of the kind

  dynCompileExpr: .../libraries/ghc-prim/dist-install/build/libHSghcpr_BE58KUgBe9ELCsPXiJ1Q2r.a: unknown symbol `__ctzdi2'
  dynCompileExpr: dynCompileExpr: unable to load package `ghc-prim'

This workaround forces GCC on 32-bit x86 to express `hs_ctz64` in
terms of the 32-bit `__builtin_ctz()` (this is no loss, as there's no
64-bit BSF instruction on i686 anyway) and thus avoids the problematic
out-of-line runtime function.

Note: `__builtin_ctzll()` is used since
      e0c1767 (re #9340)
This became dead with 1e87c0a
and was probably just missed.

I plan to re-use the freed up `mkPreludeTyConUnique 23` slot soon
for a new `bigNatTyConKey` (as part of the #9281 effort)
Dead code
Not too interesting, just trying to get it out of the diff.
Doesn't make much of a difference, but keeping unused variables around
seems like a bit of a waste?
This patch introduces "SourceNote" tickishs that carry a reference to the
original source code. They are meant to be passed along the compilation
pipeline with as little disturbance to optimization processes as possible.

Generation is triggered by command line parameter -g. It's free and
fits with the intended end result (generation of DWARF). Internally we
say that we compile with "debugging", which is probably at least
slightly confusing given the plethora of other debugging options we have.

Note that this pass creates *lots* of tick nodes. We take care to
remove duplicated and overlapping source ticks, which gets rid of most
of them. Possible optimization could be to make Tick carry a list of
Tickishs instead of one at a time.

Keeping ticks from getting into the way of Core transformations is
tricky, but doable. The changes in this patch produce identical Core
in all cases I tested (nofib). We should probably look for a way to
make a test-case out of this.

Fix CoreLint problem

Caused by yet another instance of failing to look through ticks
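The tick-deduplication step mentioned above (removing duplicated and overlapping source ticks) can be modelled roughly as follows. `Span`, `covers` and `dedupTicks` are invented names for this sketch; the real pass operates on GHC's RealSrcSpan values attached to Core.

```haskell
-- A source span, simplified to a pair of character offsets.
data Span = Span { spanStart :: Int, spanEnd :: Int }
  deriving (Eq, Show)

-- One span covers another if it fully contains it.
covers :: Span -> Span -> Bool
covers (Span a b) (Span c d) = a <= c && d <= b

-- Keep a tick only if no already-kept tick covers its span.
dedupTicks :: [Span] -> [Span]
dedupTicks = foldl keep []
  where
    keep acc s
      | any (`covers` s) acc = acc         -- duplicated or overlapping: drop
      | otherwise            = acc ++ [s]  -- genuinely new: keep
```

Even a simple containment rule like this gets rid of most of the redundant ticks, since desugaring tends to produce many notes pointing at nested sub-spans of the same expression.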
This allows having, say, HPC ticks, automatic cost centres and source
notes active at the same time.
This is basically just about continuing to maintain source notes after
the Core stage. Unfortunately, this is more involved than it might seem,
as there are more restrictions on where ticks are allowed to show up.

Design decisions:

* We replace the StgTick / StgSCC constructors with a unified StgTick
  that can carry any tickish.

* For handling constructor or lambda applications, we generally float
  ticks out.

* Note that thanks to the NonLam placement, we know that source notes
  can never appear on lambdas. This means that as long as we are careful
  to always use mkTick, we will never violate CorePrep invariants.

* Where CorePrep floats out lets, we make sure to wrap them in the same
  spirit as FloatOut.

* Detecting selector thunks becomes a bit more involved, as we can run
  into ticks at multiple points.
This patch allows source notes to refer to --ddump-to-file Core dumps, so
we can have debugging data refer directly to places in the Core. The
implementation is slightly tricky, as we couldn't find a way to get Pretty
to generate line number information for us. Instead, we now generate
"annotations" into the dump that get stripped out later, yielding line
numbers as we go along.
These tickishs are meant to carry the (simplified and prepared) Core
through the later compilation stages.

Notes:

* Core notes are only useful in certain scenarios (mostly profiling),
  and will end up taking up significant space in object files. We therefore
  use another GHC flag (-fsave-core) to decide whether we annotate them or
  not.

* Annotations happen after CorePrep. This is slightly tricky, as CoreToStg
  moves ticks around even after this point. We have to be careful to ensure
  ticks end up where we intend them to be.

* We take the easy route and just "point" into the Core code directly.
  This is slightly awkward given that Core is normally a more
  straightforward data structure. We have to short-circuit Eq/Ord, for
  example.

* We only annotate the interesting control flow points, which are
  either top-level or let binding bodies as well as case branches.

* In order to establish an identity and later perform sub-expression
  checks, we save a binder (of the binding or case) and the case
  constructor (if applicable, otherwise __DEFAULT).
This patch adds CmmTick nodes to Cmm code. On their own these ticks are
not useful yet, as there will be many blocks that lack annotation - and
we have no way of deriving them.

Notes:

* We use this design over, say, putting ticks into the entry node of all
  blocks, as it seems to work better alongside existing optimisations.
  Now granted, the reason for this is that currently GHC's main Cmm
  optimisations seem to mainly reorganize and merge code, so this might
  change in the future.

* We have the Cmm parser generate a few source notes as well. This is
  relatively easy to do - worst thing is that it blows up the CmmParse
  implementation a bit.
This patch solves the scoping problem of CmmTick nodes: If we just put
CmmTicks into blocks we have no idea what exactly they are meant to cover.
Here we introduce nested scopes, represented as lists of uniques. The
"nesting" relation is given by the subset relation. For example a tick
declared in a block with, say, scope [b,a] now scopes over all blocks
that have at least a tick scope of [b,a], so for example also [c,b,a].

Notes:

* This makes it easy to express most optimisations: it is easy to
  generate new blocks that share all ticks with existing blocks, and it
  is even possible to merge blocks to have combined contexts, simply
  by merging the scope lists. If this happens, we actually end up with
  an (acyclic) scope graph instead.

* Given that the code often passes Cmm around "head-less", we have to
  make sure that its intended scope does not get lost. To keep the amount
  of passing-around to a minimum we define a CmmAGraphScoped type synonym
  here that just bundles the scope with a portion of Cmm to be assembled
  later.

* We introduce new scopes at somewhat random places, aligning with
  getCode calls. This works surprisingly well, but we might have to
  add new scopes into the mix later on if we find things to be too
  coarse-grained.
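The scoping scheme described above can be sketched like this. The type and function names are assumptions made for the example; the real definitions live in GHC's Cmm code, and uniques are of course not plain `Int`s there.

```haskell
import Data.List (union)

type Unique    = Int
type TickScope = [Unique]   -- e.g. [b, a] for a scope nested inside a

-- A tick declared in scope s covers every block whose scope contains at
-- least the uniques of s: a tick in [b,a] also covers blocks in [c,b,a].
scopesOver :: TickScope -> TickScope -> Bool
scopesOver tickScope blockScope = all (`elem` blockScope) tickScope

-- Merging two blocks combines their contexts by merging scope lists.
-- This is the step that can turn the scope "tree" into an acyclic graph.
mergeScopes :: TickScope -> TickScope -> TickScope
mergeScopes = union
```

Because "nesting" is just the subset relation, a freshly generated block can share all ticks of an existing block simply by reusing its scope list, and merged blocks need no bookkeeping beyond the list union.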
This is meant as a tool for the debugger to determine past values of
registers, most critically the stack pointer Sp.

* We declare yet another new constructor for CmmNode - and this time
  there's actually little choice, as unwind information can and will
  change mid-block. We don't actually make use of these capabilities,
  and back-end support would be tricky (generate new labels?), but it
  feels like the right way to do it.

* Even though we only use it for Sp so far, we allow CmmUnwind to specify
  unwind information for any register. This is pretty cheap and could
  come in useful in future.

* We allow full CmmExpr expressions for specifying unwind values. The
  advantage here is that we don't have to make up new syntax, and can e.g.
  use the WDS macro directly. On the other hand, the back-end will now
  have to simplify the expression until it can sensibly be converted
  into DWARF byte code - a process which might fail, yielding NCG panics.
  On the other hand, when you're writing Cmm by hand you really ought to
  know what you're doing.
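A hedged sketch of what such an unwind node carries: a register together with an expression for recovering its caller-side value. All names here are illustrative simplifications of GHC's Cmm types, not the actual definitions.

```haskell
-- A few of the STG global registers, heavily simplified.
data GlobalReg = Sp | SpLim | Hp
  deriving (Eq, Show)

-- An unwind expression: how a debugger can recompute a register's
-- previous value from the current machine state.
data UnwindExpr
  = UwConst Int            -- a known constant value
  | UwReg GlobalReg Int    -- a register plus a byte offset
  deriving (Eq, Show)

-- An unwind node pairs a register with its recovery expression.
data CmmUnwind = CmmUnwind GlobalReg UnwindExpr
  deriving (Eq, Show)

-- "The caller's Sp is the current Sp plus two words" (64-bit target):
spUnwind :: CmmUnwind
spUnwind = CmmUnwind Sp (UwReg Sp 16)
```

In the patch the right-hand side is a full CmmExpr rather than this two-constructor type, which is exactly why the back-end may have to simplify it before it can be turned into DWARF byte code.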
The purpose of the Debug module is to collect all required information
to generate debug information (DWARF etc.) in the back-ends. Our main
data structure is the "debug block", which carries all information we have
about a block of code that is going to get produced.

Notes:

* Debug blocks are arranged into a tree according to tick scopes. This
  makes it easier to reason about inheritance rules. Note however that
  tick scopes are not guaranteed to form a tree, in which case we end
  up discarding some information here. This is however not too relevant
  in realistic scenarios, I feel.

* This is also where we decide what source location we regard as
  representing a code block the "best". The heuristic is basically that
  we want the most specific source reference that comes from the same file
  we are currently compiling. This seems to be the most useful choice in
  my experience.

* We are careful to not be too lazy so we don't end up breaking streaming.
  Debug data will be kept alive until the end of codegen, after all.

* We change native assembler dumps to happen right away for every Cmm group.
  This simplifies the code somewhat and is consistent with how pretty much
  all of GHC handles dumps with respect to streamed code.
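The "best source location" heuristic described above, sketched as a standalone function. `SrcRef` and its fields are invented for the example; in GHC the candidates would be source notes carrying RealSrcSpans.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- A simplified source reference: originating file plus span size.
data SrcRef = SrcRef { refFile :: String, refWidth :: Int }
  deriving (Eq, Show)

-- Prefer references from the file currently being compiled, and among
-- those pick the most specific (smallest) span.
bestSrcRef :: String -> [SrcRef] -> Maybe SrcRef
bestSrcRef _       []   = Nothing
bestSrcRef curFile refs = Just (minimumBy (comparing refWidth) candidates)
  where
    sameFile   = filter ((== curFile) . refFile) refs
    candidates = if null sameFile then refs else sameFile
```

Falling back to foreign-file references only when nothing local is available matches the intuition in the commit message: a specific span in the module being compiled is almost always the most useful thing to show a user.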
This generates DWARF, albeit indirectly using the assembler. This is
the easiest (and, apparently, quite standard) method of generating the
.debug_line DWARF section.

Notes:

* Note we have to make sure that .file directives appear correctly
  before the respective .loc. Right now we ppr them manually, which makes
  them absent from dumps. Fixing this would require .file to become a
  native instruction.

* We have to pass a lot of things around the native code generator. I
  know Ian did quite a bit of refactoring already, but having one common
  monad could *really* simplify things here...

* To support SplitObjs, we need to emit/reset all DWARF data at every
  split. We use the occasion to move split marker generation to
  cmmNativeGenStream as well, so debug data extraction doesn't have to
  choke on it.
This is where we actually make GHC emit DWARF code. The info section
contains all the general meta information bits as well as an entry for
every block of native code.

Notes:

* We need quite a few new labels in order to properly address starts
  and ends of blocks.

* There is no DWARF language ID for Haskell, so we arbitrarily choose a
  number derived from dW_LANG_lo_user and 'hs'. This feels like the right
  thing to do, even though sometimes DWARF tools get confused by any
  unknown value in this field.

* Thanks to Nathan Howell for taking the initiative to get our own Haskell
  language ID for DWARF!

Mac OS port
This is telling debuggers such as GDB how to "unwind" a program state,
which allows them to walk the stack up.

Notes:

* The code is quite general, perhaps unnecessarily so. Unless we get more
  unwind information, only the first case of pprSetUnwind will get used -
  and pprUnwindExpr and pprUndefUnwind will never be called. It just so
  happens that this is a point where we can get a lot of features
  cheaply, even if we don't use them.

* When determining what location to show for a return address, most
  debuggers check the map for "rip-1", assuming that's where the "call"
  instruction is. For tables-next-to-code, that happens to always
  be the end of an info table. We therefore cheat a bit here by shifting
  .debug_frame information so it covers the end of the info table, as
  well as generating a .loc directive for the info table data.

  Debuggers will still show the wrong label for the return address, though.
  Haven't found a way around that one yet.
The conversion to DWARF is always lossy, so we put all the extra
bits of information into an extra object file section (.debug-ghc).

Notes:

* We use the eventlog format. This might seem like a slightly arbitrary
  choice, but makes it easy to copy debug data into eventlogs later in
  order to do profiling. In the meantime, it's well-defined and extensible,
  so until we run out of record IDs there's no strong reason against it
  either.

* Core notes now cause the complete Core to be copied. We are reasonably
  smart about this: We never emit a piece of Core twice, and use a compact
  binary representation for most Core constructors.

  On the other hand, we just pretty-print types as well as names and emit
  them as strings. This can sometimes lead to packets becoming too large
  for the eventlog format to handle (we had types break the 20k loc mark).
  In order to not run into these kinds of problems, we just omit packets
  that are longer than a certain threshold.

* The amount of data generated here is significant. We therefore use fairly
  low-level generation code using memory buffers. Furthermore, we include
  the data as a string, escaped using another well-optimized low-level
  routine. All this might make it hard to read debug data in the assembly,
  but is absolutely required for debugging not to become a significant
  resource hog.

* The eventlog IDs used here were chosen primarily to avoid collisions.
  If this code gets merged they should be adjusted appropriately.
This is an example of how to set more complicated unwind rules - here
for returnToSched & co. Note that we have to work around a few issues here:
The unwind declaration needs to be the first node in the block, so we
move SAVE_THREAD_STATE accordingly. That's ugly - a system that handles
unwind rules in the middle of a block would be better.
This sets up the infrastructure for sample-based profiling. Namely, we
now read the debug information from .debug_ghc and associate them with
the (relocated) IP ranges. Furthermore, we generate stub debug data for
symbol tables, which allows debugging tools to identify e.g. procedures
from linked C code.

This patch also sets up everything needed for actually emitting samples.
We try to be very general here - samples *could* for example become
cost-centres if we want to support cost-centre based profiling at some
point down the line.
This is one of the cheapest possible ways to get profiling data: The
nursery grows block-wise, with each block getting requested separately
after the last block filled up. We simply fill an array with the sources
of these requests, and get a nice overview of allocation hot spots out
of it.
This casts heap profiling as a source of IP samples. This works because
the closure header pointers are code pointers at the same time - so by
identifying them we get a good view of memory residency.
This is a bit of an experiment - we can theoretically re-use the heap
allocation profiling facilities for identification of instances where
we allocate a lot of stack space (enough to warrant requesting new blocks!).
This is however less useful due to the fact that the allocation of new
stack blocks is, in fact, quite rare. We might have to decrease stack
chunk size (-kc?) to get anything useful out of it.

Also the implementation is pretty hacky...
Now it actually works for multi-threaded programs and pauses correctly for
garbage collections.

The way the code is distributed between rts/Timer.c and rts/posix/Itimer.c
is a bit awkward, might need more work.
Simon Marlow doesn't like this approach, but at this point I am getting
quite fed up with having to set LD_LIBRARY_PATH for every single Haskell
program...
@scpmw scpmw force-pushed the profiling-import branch from 3dc907a to 830f6e7 Compare August 21, 2014 19:13
@Tarrasch

Maybe we should close this in favor of the differential patch?

https://phabricator.haskell.org/D169
