-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy pathPROF-NOTES
More file actions
423 lines (275 loc) · 13 KB
/
PROF-NOTES
File metadata and controls
423 lines (275 loc) · 13 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
TOPICS to think about:
- profile example: ~/scratch/ants.hs
http://www.reddit.com/r/haskell/comments/klr63/haskell_ants_speed_differences/
- I think we fixed #1531 - close it (test is result001)
- First: make sure HPC still works (and that the code isn't a lot
worse)
- Get the entry counts right for pattern-bound variables (need to
generate ticks in mkSelectorBinds)
- Restore the CCS in the update code
- Check that PAPs work right
- Check performance of profiled code
- with/without -prof-auto-all,
- with/without -fno-count-entries
- against GHC 7.2.1
- Find out why we can't seem to get a timer resolution better than 10ms
- Make sure that we can inline values inside _scc_. Right now in
OccurAnal we annotate everything occurring inside an _scc_ with
NoOccInfo.
- We are inlining and floating non-HNFs, which can change the shape
of the profile. e.g. inlining a CAF at its single usage site
will change the stack from <CAF,c> to <MAIN,...,c>. Example:
scc002, but put manual SCCs on yan and the lambda expression, because
-prof-auto-all adds an SCC to main which prevents the inlining.
TODO: add a test/ticket, or fix.
- how to represent recursive and mutually recursive calls. Truncate
the stack on a recursive call, as now, or use the "..." notation
from Allwood's paper?
- getting useful costs for CAFs. Also, keeping track of where a CAF
was entered from, so that we can get accurate information if a CAF
raises an exception. (foo = error "bar").
- profiling flags:
- another flag to annotate all *bindings*, not just all *functions*?
- we added -fno-count-entries (document it)
- we added -prof-auto-all, -prof-auto-top, -prof-auto-exported, -prof-no-auto
(document them)
- deprecate the old flags (fix documentation)
- Tests:
- test scc002 with various combinations of -prof-* flags and
-fprof-no-count-entries
- profiling with multiple Capabilities
- do we need to keep the XML output format?
- look into code size of profiling
- reinstate PushCC so we can compile let x = scc "foo" (C a b)
properly (attribute the constructor to foo:CCCS; don't allocate
a thunk).
- bugs to fix: #4414 (entry counts on pat binds etc.)
-----------------------------------------------------------------------------
-- Differences between evaluation and lexical scoping
f = g . h
f will not appear in the call stack where it might be expected
(see profile output for clausify for some examples of this)
Transformations:
(scc s (C x1..xn)) => C x1 .. xn (both)
Constant changes in cost attribution, no change in stack shape:
((scc s e) x) => scc s (e x) (lexical scoping only)
(scc s \x . e) => \x . scc s e (lexical scoping only)
(scc s \x . e) => \x . e (eval scoping only)
scc c x => x (eval scoping, and lex scoping iff x is not
top-level)
scc c (let x = e in e')
=> let x = scc c e in scc c e' (both)
inlining a lambda (eval scoping only, lex scoping can inline
top-level lambdas)
floating a lambda (eval scoping only)
floating non-lambda??
inlining non-lambda (neither)
-----------------------------------------------------------------------------
IMPORTANT PRINCIPLE: The call stack/graph that we see should be
independent of the evaluation order. Adding strictness does not
change the call stack/graph.
Why? Because the compiler changes strictness and evaluation order
all the time, we don't want these transformations to affect the
call stacks, which are a property of the source code.
This is reasonably straightforward to implement: in a thunk we save
the current stack, and restore it when the thunk is evaluated.
Dealing with function calls is where all the action is.
-----------------------------------------------------------------------------
BASIC IDEA
CALL ccs_app ccs_lam = ccs_app
ie. we don't change the cost centre at all when calling a function.
However, the function is expected to push a CC (or several CCs) on the
stack. A preprocessing phase copies sccs inside each lambda:
\x . e ---> \x . scc "c1" ... scc "cn" . e
where "c1" .. "cn" are the stack of sccs lexically enclosing the
lambda IN THE ORIGINAL SOURCE CODE.
We could add them on every lambda, but when there are chains of
lambdas together it's not really necessary:
\x . scc "f" \y . scc "f" . e
will behave exactly like
\x . \y . scc "f" . e
because the scc "f" between the two lambdas has no effect.
----
Advantages
- boxing is not required, and indeed is a no-op with respect to the
call graph
- Entry counts are always accurate (the only operation is pushing an
scc).
- simple implementation: no special behaviour at a function call
- many compiler transformations are unconditionally valid
see Note [transformations}
- can wrap an expression in an SCC to find its costs (not possible with
lexical scoping alone)
- can distinguish between lexically scoped SCCs and evaluation scoped SCCs
(see Note [user sccs])
----
Note [auto-all]
-auto-all can do something much more useful. e.g.
f xs = if b then g xs else h xs
where
g ys = ..
h zs = ..
could translate to
f = \xs . scc "f"
let
g = \ys . scc "f.g" ...
h = \zs . scc "f.h" ...
in
...
Since we have to put an scc on every function, we might as well make
it a useful one ("f.g" instead of just "f").
QUESTION: should this be an option?
-prof no auto sccs added (everything subsumed into callers)
-prof -scc-exports exported fns only
-prof -scc-top top-level fns only (copied into nested functions)
-prof -scc-all top-level and nested fns get different CCs
------
Note [user sccs]
Maybe there should be two kinds of scc annotation:
ESCC: evaluation cost centre; the cost of evaluating the
expression to normal form (to the extent that it is demanded)
LSCC: lexical cost centre: the cost of evaluating the expression
to normal form, and the cost of evaluating any nested functions,
each time they are called (an LSCC would be copied into each
nested function).
-----
Note [transformations]
- inlining a lambda is valid
- inlining a non-lambda is only valid as long as the expression does
not move inside any sccs (changes cost attibution, possibly dramatically)
- floating out lambdas is valid
- floating out non-lambdas should collect sccs on the way out
- boxing is a no-op: it no longer matters which SCC is attached to a lambda
- scc "x" \y . e ==> \y . e
(NB. only if scc "x" is a "dupd" cost centre, that is we aren't
counting entries)
(not preserving the cost attribution, but we only change attribution
for the allocation of the lambda, so don't care)
- scc "x" C a b c ==> C a b c (same as above)
- what about eta-expansion/eta-reduction??
-----------------------------------------------------------------------------
To think about: getting useful costs for CAFs.
We should attribute the costs for a CAF to the site that is evaluating
it. This just adds more information to the profile for free.
Note that the cost centres attached to lambdas in a CAF should still
just be [CAF,...] because we don't want the evaluation site of the CAF
showing up when we call functions from it.
-----------
To think about: whether we can do partial profiling.
- consistent representation of closures. There has to be some way
to know for a thunk whether it is profiled or not: if profiled
code enters an unprofiled thunk, it sets the CCCS to "unprofiled",
but if unprofiled code enters a profiled thunk, the code for the
thunk will have to set CCCS itself (a thunk knows where it saved
its CCCS).
-----------
To think about: whether we can keep track of call stacks in
GHCi. (related to partial profiling above).
If we abstract the RTS functionality that deals with cost centres so
that GHCi can use it, this should be feasible.
GHCi already has ticks on every nested function - so no need for sccs,
we just use ticks.
** Tick behaves exactly like scc (should they be the same??!) **
Interpreted thunks must save/restore the CCS. Store the CCS in every
thunk; save and restore the current CCS during thunk eval.
So the difficulty comes when interacting with compiled code.
- entering a compiled function: do not change the current CCS
(do nothing)
- entering a compiled thunk: save the current CCS on the stack,
set the CCS to "<unknown>", eval the thunk.
So this will lose information whenever we have a compiled thunk being
evaluated from interpreted code. e.g. suppose we call map and then
evaluate the resulting list:
g x = error "foo"
f xs = map g xs
main = case head (f xs) of ...
The backtrace could give information like:
main
evaluated: <unknown>
g
by traversing the real stack (after <unkonwn> we show the CCS that was
saved on the stack during evaluation of main).
So we lose the information that the call to g was from [f,map,g], but
we get the point in the source code that evaluated the thunk.
-----------------------------------------------------------------------------
[ticks and sccs]
Ticks and sccs are very similar:
- They are both "side effecting", in that they both count entries.
Compiler transformations must preserve the entry counts.
- sccs differ in that if we float some computation outside an scc we
need to annotate it with that scc (but a special,
non-entry-counting, copy of the scc).
Predicates:
count-entries :: Bool
-- count entries accurately; True for ticks/breakpoints
-- True for ordinary cost centres
-- False for "duplicated cost centres"
cost-attrib :: Bool
-- maintain cost attribution; False for ticks/breakpoints
-- True for ordinary cost centres
-- True for "duplicated cost centres"
For call stacks, we want cost-attrib=True, count-entries maybe
coverage, we want cost-attrib=False, count-entries=True
Breakpoints have the same requirements as coverage, except that we
probably want to do call stacks with breakpoints too.
Transformations:
Three core forms:
scc c E (count-entries=False, cost-attrib=True)
tick c E (count-entries=True, cost-attrib=False)
scctick c E (count-entries=True, cost-attrib=True)
------- Variables & Literals
scc c x => x -- a variable has no cost
scc c lit => lit
scctick c x => tick x -- a variable has no cost
scctick c lit => tick lit
------- Values
scc c (\x . e) => \x . e
scc c (C x1..xn) => C x1..xn
scctick c (\x . e) => tick c (\x . e)
scctick c (C x1..xn) => tick c (C x1..xn)
------- App
(tick e) e' => tick (e e') -- app-of-tick
------- Let Floating
tick c (let x = e in e')
=> let x = e in tick c e'
scc c (let x = e in e')
=> let x = scc c e in scc c e'
scctick c (let x = e in e')
=> let x = scc c e in scctick c e'
------- Case
case (tick c e) of alts -- case-of-tick
=> tick c (case e of alts)
case (scc c (case e of p -> e')) of alts
=> case scc c e of p -> case scc c e' of alts
-- case-of-scc-of-case
case (scctick c (case e of p -> e')) of alts
=> case scctick c e of p -> case scc c e' of alts
-- case-of-scctick-of-case
------- Inlining
let x = e in e'
=> e' [ e/x ] -- iff e is a value, or e does not
-- move inside an scc
Representation:
- tick boxes contain an Int (mapping from Int->SrcLoc is stored
separately)
- cost centres contain a (Module,String) (String may be derived from
the SrcLoc when we're doing auto-sccs, otherwise it is
user-supplied).
- breakpoints need to hold a bunch of free variables as arguments
OTOH, it will be hard to use case for cost centres, because we have to
make the simplifier respect the scoping rules. But we want the
scoping rules for breakpoints too, so we can track call stacks in
GHCi.
We could make a new Id for each scc, like mkTickBoxId. But:
- how to change an scctick into a tick or an scc? We can't just
flip a flag inside the IdInfo, because then the scctick Id will
still have the same unique and therefore compare equal to the
tick/scc Id. We could make a *new* unique for the scc/tick Id,
but then it won't compare equal to other scc/tick Ids generated
from the same scctick.
- we could pass an Int# argument to say whether to tick, scc, or
scctick. A bit ugly :( What happens if the arg is not a manifest
constant when we need to compile it? (very unlikely to happen,
but still).
- we have to generate the scc/tick Ids once and for all and put them
inside the scctick Id to begin with.