Summary
Since profilers distinguish between functions that compile down to the same bytecode by padding the bytecode with NOPs and then storing the line/code-block hashes, if the code object is later modified (say, by another profiler; see #347), the earlier profiler risks losing access to profiling data from the function.
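For context, here is a minimal, line_profiler-independent sketch (function names made up) showing why such functions collide in the first place: the differing constants live in `co_consts` and are only referenced by index, so the `co_code` byte strings are identical:

```python
def f():
    for _ in range(1):
        yield 1

def g():
    for _ in range(2):
        yield 2

# The constants 1 and 2 live in co_consts, not in the bytecode itself,
# so both generators compile down to identical co_code byte strings...
assert f.__code__.co_code == g.__code__.co_code
# ...and only differ in their constant tables
assert f.__code__.co_consts != g.__code__.co_consts
```

This is why a profiler that needs to tell such functions apart has to make their bytecode distinct, e.g. by NOP padding.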
Motivation
I've been playing around with pytest-autoprofile,[^1] trying to see whether I can use it to profile the line_profiler test suite itself and hopefully identify pinch points in the code. It mostly worked (as in "it didn't mess the test results up"), with the notable exception of causing tests using `kernprof.ContextualProfiler` to fail – which was, however, expected, because `cProfile` and line_profiler would be competing for the lock on `sys.monitoring` (see #327).
Still, one other test also mysteriously failed:[^2] `tests/test_explicit_profile.py::test_explicit_profiler_with_in_code_enable()`, which explicitly uses `profile.enable()` and `.disable()` to toggle which functions the inner `LineProfiler` of `@line_profiler.profile` actually sees. Digging deeper revealed that the bug only manifests when my "test profiler" (which sees more objects than `@line_profiler.profile`) decorates a callable already decorated by the latter, but not the other way round.
Example
Write the following file (bug_demo.py):
```python
from io import StringIO
from operator import attrgetter
from textwrap import indent

from line_profiler import LineProfiler
from line_profiler._line_profiler import NOP_BYTES

prof_all = LineProfiler()
prof_some = LineProfiler()


# Now we create 4 duplicate functions which compile down to the same
# bytecode (the constant is compiled away) and set up different
# profiling for them

@prof_all  # No dupes -> original bytecode preserved
def func1():
    for _ in range(1):
        yield 1


@prof_all  # 1st dupe -> bytecode padded with 2 NOPs
@prof_some  # No dupes -> original bytecode preserved
def func2():
    for _ in range(2):
        yield 2


@prof_all  # 2nd dupe -> bytecode padded with 3 NOPs
def func3():
    for _ in range(3):
        yield 3


@prof_all  # Since the bytecode is padded, it doesn't count as a dupe
@prof_some  # 1st dupe -> bytecode padded with 2 NOPs
def func4():
    for _ in range(4):
        yield 4


# Check that our intuition about the bytecode padding is correct
get_raw_bytecode = attrgetter('__line_profiler_id__.func.__code__.co_code')
assert not get_raw_bytecode(func1).endswith(NOP_BYTES)
assert get_raw_bytecode(func2).endswith(NOP_BYTES * 2)
assert get_raw_bytecode(func3).endswith(NOP_BYTES * 3)
assert get_raw_bytecode(func4).endswith(NOP_BYTES * 2)

# Call each of them once and show the results
assert list(func1()) == [1] * 1
assert list(func2()) == [2] * 2
assert list(func3()) == [3] * 3
assert list(func4()) == [4] * 4

with StringIO() as sio:  # func1, func2, func3, func4
    prof_all.print_stats(sio)
    print('@prof_all:', indent(sio.getvalue(), ' '), sep='\n\n')

with StringIO() as sio:
    # Expected: func2, func4 / Actual: only func4 has profiling data
    prof_some.print_stats(sio)
    print('@prof_some:', indent(sio.getvalue(), ' '), sep='\n\n')
```

Output of running it:
```
(3.13-multi-desync-fix) $ python bug_demo.py
@prof_all:

Timer unit: 1e-09 s

Total time: 2e-06 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func1 at line 18

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    18                                           @prof_all  # No dupes -> original bytecode preserved
    19                                           def func1():
    20         2       1000.0    500.0     50.0      for _ in range(1):
    21         1       1000.0   1000.0     50.0          yield 1

Total time: 0 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func2 at line 24

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           @prof_all  # 1st dupe -> bytecode padded with 2 NOPs
    25                                           @prof_some  # No dupes -> original bytecode preserved
    26                                           def func2():
    27         3          0.0      0.0               for _ in range(2):
    28         2          0.0      0.0                   yield 2

Total time: 0 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func3 at line 31

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    31                                           @prof_all  # 2nd dupe -> bytecode padded with 3 NOPs
    32                                           def func3():
    33         4          0.0      0.0               for _ in range(3):
    34         3          0.0      0.0                   yield 3

Total time: 1e-06 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func4 at line 37

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    37                                           @prof_all  # Since the bytecode is padded, it doesn't count as a dupe
    38                                           @prof_some  # 1st dupe -> bytecode padded with 2 NOPs
    39                                           def func4():
    40         5          0.0      0.0      0.0      for _ in range(4):
    41         4       1000.0    250.0    100.0          yield 4

@prof_some:

Timer unit: 1e-09 s

Total time: 0 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func2 at line 24

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           @prof_all  # 1st dupe -> bytecode padded with 2 NOPs
    25                                           @prof_some  # No dupes -> original bytecode preserved
    26                                           def func2():
    27                                               for _ in range(2):
    28                                                   yield 2

Total time: 3e-06 s
File: /Users/terence/python_env/line_profiler_dev/multi-desync-fix/bug_demo.py
Function: func4 at line 37

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    37                                           @prof_all  # Since the bytecode is padded, it doesn't count as a dupe
    38                                           @prof_some  # 1st dupe -> bytecode padded with 2 NOPs
    39                                           def func4():
    40         5          0.0      0.0      0.0      for _ in range(4):
    41         4       3000.0    750.0    100.0          yield 4
```
As seen in the profiling output, although `@prof_some` has seen `func2()`, it did not receive any profiling data when `func2()` was run. Incidentally, the discrepancies between the numbers that `@prof_some` and `@prof_all` report for `func4()` also show that the way #347 is currently implemented (using a Python-level set of `LineProfiler` objects) may have more overhead than is expected/acceptable.
Analysis
As noted in the comments in bug_demo.py, the crux of the issue is that:
- `func1()` and `func2()` compile down to the same bytecode;
- `func1()` is decorated by `@prof_all`, which then associates the original bytecode with `func1()`;
- `func2()` is decorated by `@prof_some`, which then associates the original bytecode with `func2()`;
- `func2()` is however subsequently decorated by `@prof_all`, which sees a duplicate, and thus replaces `func2.__code__` with a padded version;
- since the padded bytecode (which `@prof_some` doesn't recognize) is the one that ends up being executed, only `@prof_all` receives profiling data when `func2()` is run, but not `@prof_some`.
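The desync can be reproduced without line_profiler at all. The following is a deliberately simplified sketch (line_profiler actually keys its data on per-line/code-block hashes rather than on `hash(co_code)`, and its `NOP_BYTES` constant may differ from the one built here) of how a registry keyed on bytecode loses track of a function once another party swaps in a padded code object:

```python
import dis

def func():
    return 1

# Illustrative stand-in for the NOP padding: one NOP instruction is an
# opcode byte plus a zero argument byte
NOP_BYTES = bytes([dis.opmap['NOP'], 0])

old_code = func.__code__
# What "prof_some" effectively does: register data under the bytecode it saw
seen = {hash(old_code.co_code): 'profiling data for func'}

# What "prof_all" does upon detecting a duplicate: replace the function's
# code object with a NOP-padded copy
func.__code__ = old_code.replace(co_code=old_code.co_code + NOP_BYTES * 2)

# The code object that is now attached to the function (and would be
# executed) is unknown to "prof_some"
assert hash(func.__code__.co_code) not in seen
assert hash(old_code.co_code) in seen
```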
Partial mitigation strategies – and why they don't really work
Change the order of application of profiler-decorators
This is what I ended up doing for pytest-autoprofile,[^1][^2] where I can count on having relatively clean environments (since it's a pytest plugin and is meant to augment test sessions) and essentially have perfect control over `LineProfiler` instances, imports, and more. Since I know for a fact that my "test profiler" either:

- doesn't interfere with `@line_profiler.profile`,
- profiles whatever `@line_profiler.profile._profile` sees, or
- profiles whatever `@line_profiler.profile` sees (i.e. it profiles the functions skipped by `@line_profiler.profile._profile` when `@line_profiler.profile` is not `.enabled`),

I can be sure that when the "test profiler" and `@line_profiler.profile._profile` need to interact, the former always sees more functions than the latter, and thus double decoration is safe as long as the former is applied before the latter. This corresponds to the `func4()` case in the Example, which both profilers successfully profile – because the profiler applied as the inner decorator has already padded the bytecode, and said padded bytecode has no duplicates as far as the outer profiler is concerned.[^3]
While this workaround works for this limited case, more general applications trying to juggle multiple `LineProfiler` instances are bound to run into problematic behaviors like the one shown in the Example. Such bugs are particularly nasty because they are subtle: they are only noticeable from the output, and only if one already knows the expected result and what to look for.
Use only one "real" LineProfiler and make the others dummy objects
When writing #347 I briefly toyed around with the idea of only keeping one line_profiler._line_profiler.LineProfiler object (i.e. the C(-ython)-level profiler): instead of being a subclass thereof, line_profiler.line_profiler.LineProfiler (i.e. the Python-level profiler) would become a wrapper class, and each instance would essentially "subscribe to" profiling data of individual functions, only retrieving and filtering profiling data from the C(-ython)-level singleton profiler when requested.
This approach would've had the advantage of reducing overhead (since the trace callback wouldn't have to loop over LineProfiler instances), but the drawback was glaring: since data was only gathered at one source and later post-processed, the Python-level profilers couldn't be individually enabled and disabled for line events, and thus different wrappers couldn't be used to profile separate calls:
```python
from line_profiler import LineProfiler


def func():
    ...


prof1 = LineProfiler()
prof2 = LineProfiler()
wrapper1 = prof1(func)
wrapper2 = prof2(func)
wrapper1()
wrapper2()

# Both profilers would report 2 calls,
# since there's only one underlying Cython `LineProfiler`
# which both take data from
prof1.print_stats()
prof2.print_stats()
```

Hence the quick abandonment of this approach and the complete rewriting of the PR.
Possible fix
No concrete ideas as of now – otherwise I would've written a PR directly.
One possible angle of attack may be to slightly alter the semantics of `LineProfiler.dupes_map` – a `dict[bytes, list[CodeType]]` mapping unpadded `code.co_code` byte strings to the corresponding unpadded code objects – which isn't really used for anything other than its length (so it might as well be replaced with a `dict[bytes, int]`). Say, if we were to:

- keep track of all `LineProfiler` instances somewhere,
- store the padded code objects (i.e. the ones actually written to the functions' `.__code__`) in `.dupes_map` instead of the unpadded ones, and
- do extra bookkeeping in `LineProfiler.add_function()` so as to sync the instances, populating their `._c_code_map`s, `.code_hash_map`s, and `.dupes_map`s with the updated hashes and code objects.

The change to `.dupes_map` will be necessary for retrieving the "affected" `LineProfiler` instances. Of course, there is always the option of using a new (maybe private) table for that, but since `.dupes_map` is already there and is almost what we need...
But this will probably incur some overhead and will need a ton of polishing and optimization to rein it in.
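As a rough illustration of the `dict[bytes, int]` simplification mentioned above (entirely hypothetical – the class and method names are made up, and the padding rule merely mirrors the comments in the Example, where the first dupe gets 2 NOPs, the second 3, and so on):

```python
from collections import defaultdict
from types import CodeType


class DupeTracker:
    """Hypothetical dupes_map replacement: count dupes per unpadded co_code."""

    def __init__(self):
        self.dupes_map: dict[bytes, int] = defaultdict(int)

    def pad_width(self, code: CodeType) -> int:
        """Return the number of NOPs to pad `code` with (0 = first sighting)."""
        ndupes = self.dupes_map[code.co_code]
        self.dupes_map[code.co_code] = ndupes + 1
        # The first sighting keeps the original bytecode; the n-th dupe is
        # padded with (n + 1) NOPs so that each version hashes differently
        return 0 if ndupes == 0 else ndupes + 1


def f():
    return None


tracker = DupeTracker()
assert tracker.pad_width(f.__code__) == 0  # original bytecode preserved
assert tracker.pad_width(f.__code__) == 2  # 1st dupe -> 2 NOPs
assert tracker.pad_width(f.__code__) == 3  # 2nd dupe -> 3 NOPs
```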
Related issues
Since the function object's .__code__ is subject to change, any given line can be associated with multiple line hashes. There are two solutions:
- Keep all versions of the code object in `.code_hash_map`, and combine the profiling data associated therewith in `LineProfiler.get_stats()`. This overlaps with issue #348 (Timing entries not combined between duplicate code objects) and will require the fix in #349 (FIX: aggregating profiling data from duplicate code objects), or something similar.
- In `LineProfiler.add_function()`, scrub replaced code objects from `.code_hash_map`, and manually migrate the profiling data (`._c_code_map`, `._c_last_time`) associated with its old line hashes over to the new line hashes.
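The second option could look roughly like the following sketch (the function name and the `{hash: {lineno: (nhits, total_time)}}` data shape are guesses for illustration, not line_profiler's actual internals):

```python
def migrate_hashes(c_code_map, old_hashes, new_hashes):
    """Re-key timing entries from a function's old block hashes to the
    new ones after its code object has been replaced."""
    for old, new in zip(old_hashes, new_hashes):
        if old in c_code_map:
            # Merge instead of overwrite, in case the new hash has
            # already accumulated data of its own
            target = c_code_map.setdefault(new, {})
            for lineno, timing in c_code_map.pop(old).items():
                target[lineno] = timing


# Guessed data shape: {block hash: {line number: (nhits, total time)}}
data = {0xAAA: {10: (3, 1500), 11: (3, 900)}}
migrate_hashes(data, [0xAAA], [0xBBB])
assert data == {0xBBB: {10: (3, 1500), 11: (3, 900)}}
```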
Footnotes

[^1]: Disclaimer: I wrote the tool. Just wanted to contextualize what is happening here – hopefully this doesn't sound like a plug.

[^2]: The bug has been resolved in the most recent release, but I encountered it while writing the tool. See however the attached Example, which shows the problem.

[^3]: This peculiarity also means that the bug would not have occurred if both profilers were always applied in lockstep to the same functions.