Summary
Since profilers distinguish between functions that compile down to the same bytecode by padding the bytecode with NOPs and then storing the line/code-block hashes, if the code object is later modified (say, by another profiler; see #347), the earlier profiler risks losing access to profiling data from the function.
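For context, here is a minimal, line_profiler-independent sketch (function names made up) showing why such functions collide in the first place: the differing constants live in `co_consts` and are only referenced by index, so the `co_code` byte strings are identical:

```python
def f():
    for _ in range(1):
        yield 1

def g():
    for _ in range(2):
        yield 2

# The constants 1 and 2 live in co_consts, not in the bytecode itself,
# so both generators compile down to identical co_code byte strings...
assert f.__code__.co_code == g.__code__.co_code
# ...and only differ in their constant tables
assert f.__code__.co_consts != g.__code__.co_consts
```

This is why a profiler that needs to tell such functions apart has to make their bytecode distinct, e.g. by NOP padding.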
Motivation
I've been playing around with pytest-autoprofile,[^1] trying to see whether I can use it to profile the line_profiler test suite itself and hopefully identify pinch points in the code. It mostly worked (as in "it didn't mess the test results up"), with the notable exception of causing tests using `kernprof.ContextualProfiler` to fail – which was, however, expected, because `cProfile` and line_profiler would be competing for the lock on `sys.monitoring` (see #327).
Still, one other test also mysteriously failed:[^2] `tests/test_explicit_profile.py::test_explicit_profiler_with_in_code_enable()`, which explicitly uses `profile.enable()` and `.disable()` to toggle which functions the inner `LineProfiler` of `@line_profiler.profile` actually sees. Digging deeper revealed that the bug only manifests when my "test profiler" (which sees more objects than `@line_profiler.profile`) decorates a callable already decorated by the latter, but not the other way round.
Example
Write the following file (bug_demo.py):
```python
from io import StringIO
from operator import attrgetter
from textwrap import indent

from line_profiler import LineProfiler
from line_profiler._line_profiler import NOP_BYTES

prof_all = LineProfiler()
prof_some = LineProfiler()


# Now we create 4 duplicate functions which compile down to the same
# bytecode (the constant is compiled away) and set up different
# profiling for them

@prof_all  # No dupes -> original bytecode preserved
def func1():
    for _ in range(1):
        yield 1


@prof_all  # 1st dupe -> bytecode padded with 2 NOPs
@prof_some  # No dupes -> original bytecode preserved
def func2():
    for _ in range(2):
        yield 2


@prof_all  # 2nd dupe -> bytecode padded with 3 NOPs
def func3():
    for _ in range(3):
        yield 3


@prof_all  # Since the bytecode is padded, it doesn't count as a dupe
@prof_some  # 1st dupe -> bytecode padded with 2 NOPs
def func4():
    for _ in range(4):
        yield 4


# Check that our intuition about the bytecode padding is correct
get_raw_bytecode = attrgetter('__line_profiler_id__.func.__code__.co_code')
assert not get_raw_bytecode(func1).endswith(NOP_BYTES)
assert get_raw_bytecode(func2).endswith(NOP_BYTES * 2)
assert get_raw_bytecode(func3).endswith(NOP_BYTES * 3)
assert get_raw_bytecode(func4).endswith(NOP_BYTES * 2)

# Call each of them once and show the results
assert list(func1()) == [1] * 1
assert list(func2()) == [2] * 2
assert list(func3()) == [3] * 3
assert list(func4()) == [4] * 4

with StringIO() as sio:  # func1, func2, func3, func4
    prof_all.print_stats(sio)
    print('@prof_all:', indent(sio.getvalue(), ' '), sep='\n\n')

with StringIO() as sio:
    # Expected: func2, func4 / Actual: only func4 has profiling data
    prof_some.print_stats(sio)
    print('@prof_some:', indent(sio.getvalue(), ' '), sep='\n\n')
```

Output of running it:
```
(3.13-multi-desync-fix) $ python bug_demo.py
@prof_all:

Timer unit: 1e-09 s

Total time: 2e-06 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func1 at line 18

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    18                                           @prof_all  # No dupes -> original bytecode preserved
    19                                           def func1():
    20         2       1000.0    500.0     50.0      for _ in range(1):
    21         1       1000.0   1000.0     50.0          yield 1

Total time: 0 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func2 at line 24

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           @prof_all  # 1st dupe -> bytecode padded with 2 NOPs
    25                                           @prof_some  # No dupes -> original bytecode preserved
    26                                           def func2():
    27         3          0.0      0.0               for _ in range(2):
    28         2          0.0      0.0                   yield 2

Total time: 0 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func3 at line 31

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    31                                           @prof_all  # 2nd dupe -> bytecode padded with 3 NOPs
    32                                           def func3():
    33         4          0.0      0.0               for _ in range(3):
    34         3          0.0      0.0                   yield 3

Total time: 1e-06 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func4 at line 37

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    37                                           @prof_all  # Since the bytecode is padded, it doesn't count as a dupe
    38                                           @prof_some  # 1st dupe -> bytecode padded with 2 NOPs
    39                                           def func4():
    40         5          0.0      0.0      0.0      for _ in range(4):
    41         4       1000.0    250.0    100.0          yield 4

@prof_some:

Timer unit: 1e-09 s

Total time: 0 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func2 at line 24

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           @prof_all  # 1st dupe -> bytecode padded with 2 NOPs
    25                                           @prof_some  # No dupes -> original bytecode preserved
    26                                           def func2():
    27                                               for _ in range(2):
    28                                                   yield 2

Total time: 3e-06 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func4 at line 37

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    37                                           @prof_all  # Since the bytecode is padded, it doesn't count as a dupe
    38                                           @prof_some  # 1st dupe -> bytecode padded with 2 NOPs
    39                                           def func4():
    40         5          0.0      0.0      0.0      for _ in range(4):
    41         4       3000.0    750.0    100.0          yield 4
```
As seen in the profiling output, although `@prof_some` has seen `func2()`, it did not receive any profiling data when `func2()` was run. Incidentally, the discrepancies between the numbers that `@prof_some` and `@prof_all` report for `func4()` also show that the way #347 is currently implemented (using a Python-level set of `LineProfiler` objects) may have more overhead than is expected/acceptable.
Analysis
As noted in the comments in bug_demo.py, the crux of the issue is that:
- `func1()` and `func2()` compile down to the same bytecode;
- `func1()` is decorated by `@prof_all`, which then associates the original bytecode with `func1()`;
- `func2()` is decorated by `@prof_some`, which then associates the original bytecode with `func2()`;
- `func2()` is however subsequently decorated by `@prof_all`, which sees a duplicate, and thus replaces `func2.__code__` with a padded version;
- since the padded bytecode (which `@prof_some` doesn't recognize) is the one that ends up being executed, only `@prof_all` receives profiling data when `func2()` is run, but not `@prof_some`.
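The desync can be reproduced without line_profiler at all. The following is a deliberately simplified sketch (line_profiler actually keys its data on per-line/code-block hashes rather than on `hash(co_code)`, and its `NOP_BYTES` constant may differ from the one built here) of how a registry keyed on bytecode loses track of a function once another party swaps in a padded code object:

```python
import dis

def func():
    return 1

# Illustrative stand-in for the NOP padding: one NOP instruction is an
# opcode byte plus a zero argument byte
NOP_BYTES = bytes([dis.opmap['NOP'], 0])

old_code = func.__code__
# What "prof_some" effectively does: register data under the bytecode it saw
seen = {hash(old_code.co_code): 'profiling data for func'}

# What "prof_all" does upon detecting a duplicate: replace the function's
# code object with a NOP-padded copy
func.__code__ = old_code.replace(co_code=old_code.co_code + NOP_BYTES * 2)

# The code object that is now attached to the function (and would be
# executed) is unknown to "prof_some"
assert hash(func.__code__.co_code) not in seen
assert hash(old_code.co_code) in seen
```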
Partial mitigation strategies – and why they don't really work
Change the order of application of profiler-decorators
This is what I ended up doing for pytest-autoprofile,[^1][^2] where I can count on having relatively clean environments (since it's a pytest plugin and is meant to augment test sessions) and essentially have perfect control over `LineProfiler` instances, imports, and more. Since I know for a fact that my "test profiler" either:

- doesn't interfere with `@line_profiler.profile`,
- profiles whatever `@line_profiler.profile._profile` sees, or
- profiles whatever `@line_profiler.profile` sees (i.e. it profiles the functions skipped by `@line_profiler.profile._profile` when `@line_profiler.profile` is not `.enabled`),

I can be sure that when the "test profiler" and `@line_profiler.profile._profile` need to interact, the former always sees more functions than the latter, and thus double decoration is safe as long as the former is applied before the latter. This corresponds to the `func4()` case in the Example, which both profilers successfully profile – because the profiler applied as the inner decorator has already padded the bytecode, and said padded bytecode has no duplicates as far as the outer profiler is concerned.[^3]
While this workaround works for this limited case, more general applications trying to juggle multiple `LineProfiler` instances are bound to run into problematic behaviors like the one shown in the Example. Such bugs are particularly nasty because they are subtle: they are only noticeable from the output, and only if one already knows the expected result and what to look for.
Use only one "real" LineProfiler and make the others dummy objects
When writing #347 I briefly toyed around with the idea of only keeping one line_profiler._line_profiler.LineProfiler object (i.e. the C(-ython)-level profiler): instead of being a subclass thereof, line_profiler.line_profiler.LineProfiler (i.e. the Python-level profiler) would become a wrapper class, and each instance would essentially "subscribe to" profiling data of individual functions, only retrieving and filtering profiling data from the C(-ython)-level singleton profiler when requested.
This approach would've had the advantage of reducing overhead (since the trace callback wouldn't have to loop over LineProfiler instances), but the drawback was glaring: since data was only gathered at one source and later post-processed, the Python-level profilers couldn't be individually enabled and disabled for line events, and thus different wrappers couldn't be used to profile separate calls:
```python
from line_profiler import LineProfiler


def func():
    ...


prof1 = LineProfiler()
prof2 = LineProfiler()
wrapper1 = prof1(func)
wrapper2 = prof2(func)
wrapper1()
wrapper2()

# Both profilers would report 2 calls,
# since there's only one underlying Cython `LineProfiler`
# which both take data from
prof1.print_stats()
prof2.print_stats()
```

Hence the quick abandonment of this approach and the complete rewriting of the PR.
Possible fix
No concrete ideas as of now – otherwise I would've written a PR directly.
One possible angle of attack may be to slightly alter the semantics of `LineProfiler.dupes_map` – a `dict[bytes, list[CodeType]]` mapping unpadded `code.co_code` byte strings to the corresponding unpadded code objects – which isn't really used for anything other than its length (so it might as well be replaced with a `dict[bytes, int]`). Say, if we were to:

- keep track of all `LineProfiler` instances somewhere,
- store the padded code objects (i.e. the ones actually written to the functions' `.__code__`) in `.dupes_map` instead of the unpadded ones, and
- do extra bookkeeping in `LineProfiler.add_function()` so as to sync the instances, populating their `._c_code_map`s, `.code_hash_map`s, and `.dupes_map`s with the updated hashes and code objects.

The change to `.dupes_map` will be necessary for retrieving the "affected" `LineProfiler` instances. Of course, there is always the option of using a new (maybe private) table for that, but since `.dupes_map` is already there and is almost what we need...
But this will probably incur some overhead and will need a ton of polishing and optimization to rein it in.
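As a rough illustration of the `dict[bytes, int]` simplification mentioned above (entirely hypothetical – the class and method names are made up, and the padding rule merely mirrors the comments in the Example, where the first dupe gets 2 NOPs, the second 3, and so on):

```python
from collections import defaultdict
from types import CodeType


class DupeTracker:
    """Hypothetical dupes_map replacement: count dupes per unpadded co_code."""

    def __init__(self):
        self.dupes_map: dict[bytes, int] = defaultdict(int)

    def pad_width(self, code: CodeType) -> int:
        """Return the number of NOPs to pad `code` with (0 = first sighting)."""
        ndupes = self.dupes_map[code.co_code]
        self.dupes_map[code.co_code] = ndupes + 1
        # The first sighting keeps the original bytecode; the n-th dupe is
        # padded with (n + 1) NOPs so that each version hashes differently
        return 0 if ndupes == 0 else ndupes + 1


def f():
    return None


tracker = DupeTracker()
assert tracker.pad_width(f.__code__) == 0  # original bytecode preserved
assert tracker.pad_width(f.__code__) == 2  # 1st dupe -> 2 NOPs
assert tracker.pad_width(f.__code__) == 3  # 2nd dupe -> 3 NOPs
```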
Related issues
Since the function object's .__code__ is subject to change, any given line can be associated with multiple line hashes. There are two solutions:
- Keep all versions of the code object in `.code_hash_map`, and combine the profiling data associated therewith in `LineProfiler.get_stats()`. This overlaps with issue #348 (Timing entries not combined between duplicate code objects) and will require the fix in #349 (FIX: aggregating profiling data from duplicate code objects), or something similar.
- In `LineProfiler.add_function()`, scrub replaced code objects from `.code_hash_map`, and manually migrate the profiling data (`._c_code_map`, `._c_last_time`) associated with its old line hashes over to the new line hashes.
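The second option could look roughly like the following sketch (the function name and the `{hash: {lineno: (nhits, total_time)}}` data shape are guesses for illustration, not line_profiler's actual internals):

```python
def migrate_hashes(c_code_map, old_hashes, new_hashes):
    """Re-key timing entries from a function's old block hashes to the
    new ones after its code object has been replaced."""
    for old, new in zip(old_hashes, new_hashes):
        if old in c_code_map:
            # Merge instead of overwrite, in case the new hash has
            # already accumulated data of its own
            target = c_code_map.setdefault(new, {})
            for lineno, timing in c_code_map.pop(old).items():
                target[lineno] = timing


# Guessed data shape: {block hash: {line number: (nhits, total time)}}
data = {0xAAA: {10: (3, 1500), 11: (3, 900)}}
migrate_hashes(data, [0xAAA], [0xBBB])
assert data == {0xBBB: {10: (3, 1500), 11: (3, 900)}}
```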
Footnotes

[^1]: Disclaimer: I wrote the tool. Just wanted to contextualize what is happening here – hopefully this doesn't sound like a plug.

[^2]: The bug has been resolved in the most recent release, but I encountered it while writing the tool. See however the attached Example, which shows the problem.

[^3]: This peculiarity also means that the bug would not have occurred if both profilers were always applied in lockstep to the same functions.