Skip to content

Index node finder#64

Draft
15r10nk wants to merge 12 commits intoalexmojaki:masterfrom
15r10nk:index_node_finder
Draft

Index node finder#64
15r10nk wants to merge 12 commits intoalexmojaki:masterfrom
15r10nk:index_node_finder

Conversation

@15r10nk
Copy link
Collaborator

@15r10nk 15r10nk commented Dec 18, 2022

This is still work in progress.

The goal is an NodeFinder with similar capabilities like PositionNodeFinder, but for cpython <= 3.10.

see #42

todo:

@15r10nk 15r10nk force-pushed the index_node_finder branch 5 times, most recently from f148326 to 554504f Compare December 20, 2022 20:45
@15r10nk 15r10nk force-pushed the index_node_finder branch from 554504f to 57d8fb0 Compare January 3, 2023 20:42
@15r10nk 15r10nk self-assigned this Jan 26, 2023
@15r10nk 15r10nk force-pushed the index_node_finder branch 2 times, most recently from 28a88ea to 637521e Compare January 26, 2023 21:42
@15r10nk 15r10nk force-pushed the index_node_finder branch 2 times, most recently from 8ad604b to 0a22823 Compare February 14, 2023 22:42
@15r10nk
Copy link
Collaborator Author

15r10nk commented Feb 19, 2023

hi @alexmojaki. I have some good news for you. The new NodeFinder is near completion and works so far, but I need some more time to finish and to optimize it.

Here you can get some impression what it can do:

https://github.com/15r10nk/executing/blob/index_node_finder/Features.md

This list gets generated by pytest and probably still misses some features.

@15r10nk 15r10nk force-pushed the index_node_finder branch 10 times, most recently from af7ef00 to 65619b9 Compare February 27, 2023 20:54
@15r10nk 15r10nk force-pushed the index_node_finder branch 5 times, most recently from 335d5c3 to e5d175c Compare March 18, 2023 23:14
@15r10nk 15r10nk force-pushed the index_node_finder branch from 4a69438 to 20cef7d Compare March 29, 2023 17:35
@15r10nk 15r10nk force-pushed the index_node_finder branch from 20cef7d to d16de24 Compare March 31, 2023 20:38
@alexmojaki
Copy link
Owner

We're going to use executing in https://github.com/pydantic/logfire (btw you can see inline-snapshot being used there). But there are cases where logfire rewrites the AST and this prevents executing from working in Python<3.11. If this branch works better in the presence of pytest then I expect it to work better in our case as well.

It sounded like this was going well, can it be revived?

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

cool that inline-snapshot is used in logfire 😄

If this branch works better in the presence of pytest then I expect it to work better in our case as well.

kind of. This branch works not out of the box with pytest. I had to reapply the pytest changes on top of the original ast to make the mapping work. We would have to do the same with logfire / any other library which modifies the ast (like lazy-imports-lite for example).

It sounded like this was going well, can it be revived?

Yes, the concept worked, but I would merge it with the current main and add some comments to make it easier to understand before you review it.

I have also a general question to ast-rewriting with a custom MetaPathFinder.
I see the concept in several libraries (pytest/logfire/lazy-import-lite/ and probably many others). Is there any concept which would allow to combine multiple ast transformations? Like a library which installs one MetaPathFinder and calls multiple "plugins" which transform the ast before it gets converted into bytecode?

The auto_trace feature of logfire can currently not be used in combination with assertion rewriting of pytest for example. And I'm also not sure which rewriting would be used.

@alexmojaki
Copy link
Owner

kind of. This branch works not out of the box with pytest. I had to reapply the pytest changes on top of the original ast to make the mapping work.

Wouldn't that work equally well with the other node finders? It sounds like this branch doesn't actually help with pytest.

Would this branch work better when the statement being executed hasn't been modified, i.e. all the modifications are 'around' the executed statement?

I have also a general question to ast-rewriting with a custom MetaPathFinder.
I see the concept in several libraries (pytest/logfire/lazy-import-lite/ and probably many others). Is there any concept which would allow to combine multiple ast transformations? Like a library which installs one MetaPathFinder and calls multiple "plugins" which transform the ast before it gets converted into bytecode?

I have also thought such a thing would be nice. I imagine it doesn't exist because AST rewriting isn't common enough that people are bothered by them not cooperating.

Another library that does this is birdseye.

I could make executing work with auto-tracing if I did the AST transformation on the tree of an executing.Source. The problem is that @logfire.instrument also uses AST rewriting which depends on arguments, meaning that there is no longer a single Source associated with a file.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

Wouldn't that work equally well with the other node finders? It sounds like this branch doesn't actually help with pytest.

This branch has almost the same functionality like the PositionNodeFinder.
But I had to introduce a dependency to pytest to make it work.

https://github.com/15r10nk/executing/blob/6e2d947ae80960f5efc432bef358bf7b70526157/executing/_index_node_finder.py#L500

We would have to add something similar for logfire.

The problem is that @logfire.instrument also uses AST rewriting which depends on arguments, meaning that there is no longer a single Source associated with a file.

I hope that I'm correct here, but this should not be a problem. logfire could memorize every ast it has created for a source file and we could check each of them for a matching bytecode.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

Would this branch work better when the statement being executed hasn't been modified, i.e. all the modifications are 'around' the executed statement?

A modification at root level messes with the bytecode structure and the algo can not map the bytecodes (and child bytecodes) any more.
It could handle changes inside functions and be able to map expressions outside the changed functions, but pytest adds imports at the root level.

@alexmojaki
Copy link
Owner

Ah, after rereading #31 (comment) I see that this method requires the bytecode to match the source code exactly, my mistake for thinking otherwise.

I could store a Source for each modified code object produced by instrument, but at that point it's getting messy. I was hoping that there was a better solution here.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

I could store a Source for each modified code object produced by instrument, but at that point it's getting messy. I was hoping that there was a better solution here.

It might be enough to store the args. You should be able to generate the Source with this args and the original ast.

I was thinking about something like this:

@executing.register_ast_provider
def get_logfire_ast(path,original_ast):
   for args in all_args_for_file(path):
       yield transform(original_ast,args)

Could this also be used to make the SentinelNodeFinder work?

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

Can you tell me why you need the args to generate the code?
Are there cases where you have to generate different code for different args?

I read some source of logfire but I did not understood why instrument should generate different code for different args.

@alexmojaki
Copy link
Owner

Are there cases where you have to generate different code for different args?

https://github.com/pydantic/logfire/blob/8ce1cfe31208e5dc87f35221ef0d7eb2401348e4/logfire/_internal/instrument.py#L129

It might be possible to always generate the same code every time, but probably with some small performance cost.

I can consider various approaches to solving this problem if I get to it. For now I'm just going to only fully support this feature in 3.11+.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

It is just an idea and I probably don't know enough about logfire, but you could generate something like this:

@logfire.instrument
def some_function(arg_a,arg_b,__logfire_args__):
    if __logfire_args__.extract_args:
        __logfire_args__.method({"arg_a":arg_a,"arg_b":arg_b})
    ...

The instrument decorator should pass the __logfire_args__ at runtime.

This would also save some time if you have multiple instrumented functions, because you would transform the source only once.

But you have probably similar ideas.

It is ok for me if this branch does not get merged.
The new algo is nice but quiet complex. We can keep the current implementation if we don't need more features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants