
BUG: dupe-code resolution on multiple profilers results in loss of profiling data #350

@TTsangSC

Description

Summary

Profilers distinguish between functions that compile down to the same bytecode by padding the bytecode with NOPs and then storing the resulting line/code-block hashes. If the code object is later modified (say, by another profiler; see #347), the earlier profiler risks losing access to the function's profiling data.

Motivation

I've been playing around with pytest-autoprofile,1 trying to see if I can use it to profile the line_profiler test suite itself and hopefully identify hot spots in the code. It mostly worked (as in "it didn't mess up the test results"), with the notable exception of causing tests using kernprof.ContextualProfiler to fail – which was expected, however, because cProfile and line_profiler would be competing for the lock on sys.monitoring (see #327).

Still, one other test also mysteriously failed:2 tests/test_explicit_profile.py::test_explicit_profiler_with_in_code_enable(), which explicitly uses profile.enable() and .disable() to toggle which functions the inner LineProfiler of @line_profiler.profile actually sees. Digging deeper revealed that the bug only manifests when my "test profiler" (which sees more objects than @line_profiler.profile) decorates a callable already decorated by the latter, but not the other way round.

Example

Write the following file (bug_demo.py):

from io import StringIO
from operator import attrgetter
from textwrap import indent

from line_profiler import LineProfiler
from line_profiler._line_profiler import NOP_BYTES


prof_all = LineProfiler()
prof_some = LineProfiler()


# Now we create 4 duplicate functions which compile down to the same
# bytecode (the constant is compiled away) and set up different
# profiling for them


@prof_all  # No dupes -> original bytecode preserved
def func1():
    for _ in range(1):
        yield 1


@prof_all  # 1st dupe -> bytecode padded with 2 NOPs
@prof_some  # No dupes -> original bytecode preserved
def func2():
    for _ in range(2):
        yield 2


@prof_all  # 2nd dupe -> bytecode padded with 3 NOPs
def func3():
    for _ in range(3):
        yield 3


@prof_all  # Since the bytecode is padded, it doesn't count as a dupe
@prof_some  # 1st dupe -> bytecode padded with 2 NOPs
def func4():
    for _ in range(4):
        yield 4


# Check that our intuition about the bytecode padding is correct

get_raw_bytecode = attrgetter('__line_profiler_id__.func.__code__.co_code')
assert not get_raw_bytecode(func1).endswith(NOP_BYTES)
assert get_raw_bytecode(func2).endswith(NOP_BYTES * 2)
assert get_raw_bytecode(func3).endswith(NOP_BYTES * 3)
assert get_raw_bytecode(func4).endswith(NOP_BYTES * 2)


# Call each of them once and show the results

assert list(func1()) == [1] * 1
assert list(func2()) == [2] * 2
assert list(func3()) == [3] * 3
assert list(func4()) == [4] * 4

with StringIO() as sio:  # func1, func2, func3, func4
    prof_all.print_stats(sio)
    print('@prof_all:', indent(sio.getvalue(), '  '), sep='\n\n')

with StringIO() as sio:
    # Expected: func2, func4 / Actual: only func4 has profiling data
    prof_some.print_stats(sio)
    print('@prof_some:', indent(sio.getvalue(), '  '), sep='\n\n')

Output of running it:

(3.13-multi-desync-fix)  $ python bug_demo.py 
@prof_all:

  Timer unit: 1e-09 s

  Total time: 2e-06 s
  File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
  Function: func1 at line 18

  Line #      Hits         Time  Per Hit   % Time  Line Contents
  ==============================================================
      18                                           @prof_all  # No dupes -> original bytecode preserved
      19                                           def func1():
      20         2       1000.0    500.0     50.0      for _ in range(1):
      21         1       1000.0   1000.0     50.0          yield 1

  Total time: 0 s
  File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
  Function: func2 at line 24

  Line #      Hits         Time  Per Hit   % Time  Line Contents
  ==============================================================
      24                                           @prof_all  # 1st dupe -> bytecode padded with 2 NOPs
      25                                           @prof_some  # No dupes -> original bytecode preserved
      26                                           def func2():
      27         3          0.0      0.0               for _ in range(2):
      28         2          0.0      0.0                   yield 2

  Total time: 0 s
  File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
  Function: func3 at line 31

  Line #      Hits         Time  Per Hit   % Time  Line Contents
  ==============================================================
      31                                           @prof_all  # 2nd dupe -> bytecode padded with 3 NOPs
      32                                           def func3():
      33         4          0.0      0.0               for _ in range(3):
      34         3          0.0      0.0                   yield 3

  Total time: 1e-06 s
  File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
  Function: func4 at line 37

  Line #      Hits         Time  Per Hit   % Time  Line Contents
  ==============================================================
      37                                           @prof_all  # Since the bytecode is padded, it doesn't count as a dupe
      38                                           @prof_some  # 1st dupe -> bytecode padded with 2 NOPs
      39                                           def func4():
      40         5          0.0      0.0      0.0      for _ in range(4):
      41         4       1000.0    250.0    100.0          yield 4


@prof_some:

  Timer unit: 1e-09 s

  Total time: 0 s
  File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
  Function: func2 at line 24

  Line #      Hits         Time  Per Hit   % Time  Line Contents
  ==============================================================
      24                                           @prof_all  # 1st dupe -> bytecode padded with 2 NOPs
      25                                           @prof_some  # No dupes -> original bytecode preserved
      26                                           def func2():
      27                                               for _ in range(2):
      28                                                   yield 2

  Total time: 3e-06 s
  File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
  Function: func4 at line 37

  Line #      Hits         Time  Per Hit   % Time  Line Contents
  ==============================================================
      37                                           @prof_all  # Since the bytecode is padded, it doesn't count as a dupe
      38                                           @prof_some  # 1st dupe -> bytecode padded with 2 NOPs
      39                                           def func4():
      40         5          0.0      0.0      0.0      for _ in range(4):
      41         4       3000.0    750.0    100.0          yield 4

As seen in the profiling output, although @prof_some has seen func2(), it did not receive any profiling data when func2() was run. Incidentally, the discrepancies between the numbers that @prof_some and @prof_all report for func4() also show that the current implementation of #347 (using a Python-level set of LineProfiler objects) may have more overhead than is expected/acceptable.

Analysis

As noted in the comments in bug_demo.py, the crux of the issue is that:

  • func1() and func2() compile down to the same bytecode
  • func1() is decorated by @prof_all, which then associates the original bytecode with func1()
  • func2() is decorated by @prof_some, which then associates the original bytecode with func2()
  • func2() is however subsequently decorated by @prof_all, which detects a duplicate and thus replaces func2.__code__ with a NOP-padded version
  • Since the padded bytecode (which @prof_some doesn't recognize) is the one that ends up being executed, only @prof_all – and not @prof_some – receives profiling data when func2() is run.
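
The first point is easy to verify directly: the differing constants live in co_consts, not in co_code, so the two function bodies share their raw bytecode. A minimal standard-library-only check:

```python
# Demonstrate that the two bodies compile to identical raw bytecode:
# the differing integer constants live in co_consts, not in co_code.
def func1():
    for _ in range(1):
        yield 1

def func2():
    for _ in range(2):
        yield 2

# Same instruction stream (the LOAD_CONST operands are co_consts *indices*)...
assert func1.__code__.co_code == func2.__code__.co_code
# ...but different constant tables
assert func1.__code__.co_consts != func2.__code__.co_consts
```

This is exactly why profilers that key their bookkeeping on (hashes of) the raw bytecode need the NOP-padding trick in the first place.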

Partial mitigation strategies – and why they don't really work

Change the order of application of profiler-decorators

This is what I ended up doing for pytest-autoprofile,1,2 where I can count on having relatively clean environments (since it's a pytest plugin meant to augment test sessions) and essentially have perfect control over LineProfiler instances, imports, and more. Since I know for a fact that my "test profiler" either:

  • Doesn't interfere with @line_profiler.profile,
  • Profiles whatever @line_profiler.profile._profile sees, or
  • Profiles whatever @line_profiler.profile sees (i.e. it also profiles functions skipped by @line_profiler.profile._profile when @line_profiler.profile is not .enabled),

I can be sure that whenever the "test profiler" and @line_profiler.profile._profile need to interact, the former always sees more functions than the latter, and thus double-decoration is safe as long as the former is applied before the latter. This corresponds to the func4() case in the Example, which both profilers successfully profile – because the profiler applied as the inner decorator has already padded the bytecode, and said padded bytecode has no duplicates as far as the outer-decorator profiler is concerned.3

While this workaround works for this limited case, for more general applications trying to juggle multiple LineProfiler instances, one is bound to run into problematic behaviors like those shown in the Example. Such bugs are particularly nasty because they are subtle: they are only noticeable from the output, and only if one already knows the expected result and what to look for.

Use only one "real" LineProfiler and make the others dummy objects

When writing #347 I briefly toyed with the idea of keeping only one line_profiler._line_profiler.LineProfiler object (i.e. the C(-ython)-level profiler): instead of being a subclass thereof, line_profiler.line_profiler.LineProfiler (i.e. the Python-level profiler) would become a wrapper class, and each instance would essentially "subscribe to" the profiling data of individual functions, only retrieving and filtering data from the C(-ython)-level singleton profiler when requested.

This approach would've had the advantage of reducing overhead (since the trace callback wouldn't have to loop over LineProfiler instances), but the drawback was glaring: since data was only gathered at one source and later post-processed, the Python-level profilers couldn't be individually enabled and disabled for line events, and thus different wrappers couldn't be used to profile separate calls:

from line_profiler import LineProfiler
def func():
    ...

prof1 = LineProfiler()
prof2 = LineProfiler()

wrapper1 = prof1(func)
wrapper2 = prof2(func)

wrapper1()
wrapper2()

# Both profilers would report 2 calls,
# since there's only one underlying Cython `LineProfiler`
# which both take data from
prof1.print_stats()
prof2.print_stats()

Hence the quick abandonment of this approach and the complete rewriting of the PR.
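
For illustration, the abandoned design – and its glaring drawback – can be modeled with a toy sketch (all names here are made up for the illustration, not line_profiler's actual API):

```python
# Toy model of the abandoned "wrapper" design: one shared backend collects
# all data; each profiler "view" merely subscribes to and filters it.
class BackendProfiler:
    """Stand-in for the C(-ython)-level singleton; counts calls per name."""
    def __init__(self):
        self.stats = {}

    def record(self, name):
        self.stats[name] = self.stats.get(name, 0) + 1

class ProfilerView:
    """Stand-in for the Python-level wrapper class."""
    def __init__(self, backend):
        self.backend = backend
        self.subscribed = set()

    def __call__(self, func):
        self.subscribed.add(func.__name__)
        def wrapper(*args, **kwargs):
            self.backend.record(func.__name__)
            return func(*args, **kwargs)
        return wrapper

    def get_stats(self):
        # Filtering happens only at retrieval time -- hence the bug below
        return {name: count for name, count in self.backend.stats.items()
                if name in self.subscribed}

backend = BackendProfiler()
view1, view2 = ProfilerView(backend), ProfilerView(backend)

def func():
    pass

view1(func)()  # one call through view1's wrapper
view2(func)()  # one call through view2's wrapper

# Both views report 2 calls, since the single backend cannot tell them apart
assert view1.get_stats() == {'func': 2}
assert view2.get_stats() == {'func': 2}
```

Since the views cannot be individually enabled/disabled at the data-gathering stage, per-call attribution is lost – the same failure mode as in the snippet above.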

Possible fix

No concrete ideas as of now – otherwise I would've written a PR directly.

One possible angle of attack may be to slightly alter the semantics of LineProfiler.dupes_map (a dict[bytes, list[CodeType]] mapping unpadded code.co_code to the corresponding unpadded code objects), which isn't really used for anything other than its length (so it might as well be replaced with a dict[bytes, int]). Say, if we were to:

  • Keep track of all LineProfiler instances somewhere,
  • Instead of storing the unpadded code objects in .dupes_map, store the padded ones (i.e. the ones actually written to the functions' .__code__),
  • Do extra bookkeeping in LineProfiler.add_function() so as to sync the instances, populating their ._c_code_maps, .code_hash_maps, and .dupes_maps with updated hashes and code objects.

The change to .dupes_map will be necessary for the retrieval of "affected" LineProfiler instances. Of course there is always the option of using a new (maybe private) table for that, but since it's already here and is almost what we need...

But this will probably incur some overhead and will need a ton of polishing and optimization to rein it in.
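
As a rough illustration of the proposed bookkeeping – purely hypothetical, since the real implementation lives in Cython and deals with actual code objects and line hashes – here is a toy model where "bytecode" is just a bytes blob:

```python
import weakref

# Hypothetical global registry of live profiler instances
_instances = weakref.WeakSet()

class ToyProfiler:
    """Toy model: 'bytecode' is a plain bytes blob; padding appends b'NOP'."""
    def __init__(self):
        self.dupes_map = {}   # unpadded bytecode -> number of dupes seen
        self.registered = {}  # bytecode actually executed -> function name
        _instances.add(self)

    def add_function(self, name, bytecode):
        ndupes = self.dupes_map.get(bytecode, 0)
        self.dupes_map[bytecode] = ndupes + 1
        # Pad duplicates so that they hash apart
        padded = bytecode + b'NOP' * ndupes
        self.registered[padded] = name
        if ndupes:
            # Proposed extra bookkeeping: notify every other profiler that
            # already tracks the unpadded bytecode of the padded variant
            # which will now actually be executed
            for other in _instances:
                if other is not self and bytecode in other.registered:
                    other.registered[padded] = other.registered.pop(bytecode)
        return padded

# Mimic the func2() scenario from the Example:
prof_all, prof_some = ToyProfiler(), ToyProfiler()
prof_some.add_function('func2', b'CODE')            # no dupes -> unpadded
prof_all.add_function('func1', b'CODE')             # no dupes -> unpadded
executed = prof_all.add_function('func2', b'CODE')  # 1st dupe -> padded

# Thanks to the sync step, prof_some now recognizes the padded bytecode too
assert executed == b'CODENOP'
assert executed in prof_some.registered
```

The sketch glosses over everything that makes this expensive in practice (hash recomputation, C-level maps, thread safety), but it shows the shape of the cross-instance sync that add_function() would have to perform.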

Related issues

Since the function object's .__code__ is subject to change, any given line can be associated with multiple line hashes. There are two solutions:

Footnotes

  1. Disclaimer: I wrote the tool. Just wanted to contextualize what is happening here – hopefully this doesn't sound like a plug.

  2. The bug has been resolved in the most recent release, but I encountered it while writing it. The attached Example nonetheless demonstrates the problem.

  3. This peculiarity also means that the bug would not have occurred if both profilers were always applied in lockstep to the same functions.
