gh-132380: Use unicode hash/compare for the mcache. #133669
nascheme wants to merge 5 commits into python:main from
Conversation
This allows the type lookup cache to work with non-interned strings.
(Just adding the skip news label so that you don't get pinged by the bot every time you push, saying "failed checks".)
Ensure we don't read the cache when the 'name' argument is not an exact unicode string. A str subclass can override `__eq__`, and using `_PyUnicode_Equal()` in that case would be wrong.
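A minimal Python sketch of why the exactness check matters: a `str` subclass can override `__eq__`, so a raw character comparison (which is what `_PyUnicode_Equal()` does at the C level) can disagree with the object's own notion of equality. The class name below is illustrative, not from the PR.

```python
class WeirdStr(str):
    """Hypothetical str subclass whose __eq__ disagrees with its contents."""
    def __eq__(self, other):
        # Claims inequality even for identical text.
        return False
    __hash__ = str.__hash__

s = WeirdStr("value")

# The subclass's __eq__ says the strings differ...
print(s == "value")             # False

# ...but a raw character comparison says they are equal.
# This is why the cache must not take the fast path for non-exact
# unicode keys: the two answers conflict.
print(str.__eq__(s, "value"))   # True
```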
I cherry-picked this PR onto the 3.14 branch and built CPython and LibCST. I had to combine Instagram/LibCST#1324 and Instagram/LibCST#1295, and also back out the Python-level caching I added in Instagram/LibCST#1295, since that's unnecessary with this PR. I see substantially improved multithreaded scaling, although there's still some contention. Looking at the profiles, it seems like the contention is coming from GC pauses. Here are the profiles I recorded on 3.14b1 and on the 3.14 branch with this PR applied, respectively: https://share.firefox.dev/43bHvhH https://share.firefox.dev/3GM0Xdu Here's the profile using multiprocessing:
(This is on a Mac, so I can't easily get Python-level profiles and line numbers in LibCST's Python code.)
Thanks for the testing. I tested LibCST on Linux and also see a performance improvement. Running the following command in the numpy source folder:
I get the elapsed run times: base 3.13, getattr(): 30.02 sec
@colesbury This uses unicode string hash/compare instead of using the string pointer value. Unlike your suggestion, this doesn't use a separate lookup loop for the non-interned case; it just always uses the string hash/compare. pyperformance results show a small slowdown (0.4 %?), which is less than I expected, since the non-interned case is so uncommon. I can try a separate loop if you think that's worth pursuing. The advantage of this approach is that it's fairly simple code-wise, and I think it would be a candidate to backport to 3.13 and 3.14. Perhaps for 3.15 we should try a per-thread cache.
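The difference between the two probe strategies can be sketched in Python. This is a hypothetical simplification of the C-level mcache; the function and variable names are illustrative, not from the PR.

```python
# A cache entry is (version_tag, name, value). The two functions below
# model the two ways of deciding whether a probe hits.

def probe_by_identity(entry, version, name):
    # Original approach: hits only when `name` is the *same object*
    # (i.e. an interned string), since C compares pointers.
    e_version, e_name, value = entry
    if e_version == version and e_name is name:
        return value
    return None

def probe_by_value(entry, version, name):
    # This PR's approach: hash/character comparison, so any exact str
    # with equal contents can hit the cache.
    e_version, e_name, value = entry
    if e_version == version and e_name == name:
        return value
    return None

interned = "method"
alias = "".join(["met", "hod"])   # equal contents, distinct object
entry = (42, interned, "cached-value")

print(probe_by_identity(entry, 42, alias))  # None: pointer mismatch
print(probe_by_value(entry, 42, alias))     # 'cached-value'
```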
Handle common cases early.
colesbury left a comment
I don't think we should do it this way. I don't think it's worth suffering even a small performance penalty for a rare case (non-interned lookup keys), when we can support that without any performance hit.
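The alternative being suggested can also be sketched: keep the pointer-compare fast path for the common interned-key case, and fall back to a value-compare probe only for the rare non-interned key, so the common case pays nothing extra. Again a hypothetical simplification; names are illustrative.

```python
# Hypothetical two-path lookup: identity check first, value comparison
# only as a fallback for non-interned keys.

def lookup(entry, version, name, is_interned):
    e_version, e_name, value = entry
    # Fast path: interned keys compare by identity (a pointer check in C).
    if e_version == version and e_name is name:
        return value
    if not is_interned:
        # Rare slow path: full value comparison for non-interned keys.
        if e_version == version and e_name == name:
            return value
    return None

name = "method"
entry = (7, name, "cached")
print(lookup(entry, 7, name, is_interned=True))    # fast-path hit

alias = "".join(["met", "hod"])                    # equal text, new object
print(lookup(entry, 7, alias, is_interned=False))  # slow-path hit
```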
@@ -0,0 +1,2 @@
For free-threaded build, allow non-interned strings to be cached in the type
I don't think this caches non-interned strings. It seems to me that it allows non-interned strings as the lookup key, but the cache still only contains interned strings.
Closing this since I agree with Sam that the performance hit is too much to pay for improving such a rare case.