gh-132380: Use unicode hash/compare for the mcache. #133669
nascheme wants to merge 5 commits into python:main from
Conversation
This allows the type lookup cache to work with non-interned strings.
(Just adding the skip news label so that you don't get pinged by the bot every time you push, saying "failed checks".)
Ensure we don't read the cache when the 'name' argument is not an exact unicode string. A str subclass can override `__eq__`, and using `_PyUnicode_Equal()` in that case would be wrong.
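A minimal Python sketch of why the exactness check matters: a `str` subclass can override `__eq__`, so a raw character comparison (which is what `_PyUnicode_Equal()` does at the C level) can disagree with the object's own notion of equality. The class name below is illustrative, not from the PR.

```python
class WeirdStr(str):
    """Hypothetical str subclass whose __eq__ disagrees with its contents."""
    def __eq__(self, other):
        # Claims inequality even for identical text.
        return False
    __hash__ = str.__hash__

s = WeirdStr("value")

# The subclass's __eq__ says the strings differ...
print(s == "value")             # False

# ...but a raw character comparison says they are equal.
# This is why the cache must not take the fast path for non-exact
# unicode keys: the two answers conflict.
print(str.__eq__(s, "value"))   # True
```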
I cherry-picked this PR onto the 3.14 branch and built CPython and LibCST. I had to combine Instagram/LibCST#1324 and Instagram/LibCST#1295, and also back out the Python-level caching I added in Instagram/LibCST#1295, since that's unnecessary with this PR. I see substantially improved multithreaded scaling, although there's still some contention. Looking at the profiles, it seems like the contention is coming from GC pauses. Here are the profiles I recorded on 3.14b1 and on the 3.14 branch with this PR applied, respectively: https://share.firefox.dev/43bHvhH https://share.firefox.dev/3GM0Xdu Here's the profile using multiprocessing:
(This is on a Mac, so I can't easily get Python-level profiles and line numbers in LibCST's Python code.)
Thanks for the testing. I tested LibCST on Linux and also see a performance improvement. Running the following command in the numpy source folder:
I get the elapsed run times: base 3.13, getattr(): 30.02 sec
@colesbury This uses unicode string hash/compare instead of using the string pointer value. Unlike your suggestion, this doesn't use a separate lookup loop for the non-interned case; it just always uses the string hash/compare. pyperformance results show a small slowdown (0.4 %?), which is less than I expected, since the non-interned case is so uncommon. I can try a separate loop if you think that's worth pursuing. The advantage of this approach is that it's fairly simple code-wise, and I think it would be a candidate to backport to 3.13 and 3.14. Perhaps for 3.15 we should try a per-thread cache.
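The difference between the two probe strategies can be sketched in Python. This is a hypothetical simplification of the C-level mcache; the function and variable names are illustrative, not from the PR.

```python
# A cache entry is (version_tag, name, value). The two functions below
# model the two ways of deciding whether a probe hits.

def probe_by_identity(entry, version, name):
    # Original approach: hits only when `name` is the *same object*
    # (i.e. an interned string), since C compares pointers.
    e_version, e_name, value = entry
    if e_version == version and e_name is name:
        return value
    return None

def probe_by_value(entry, version, name):
    # This PR's approach: hash/character comparison, so any exact str
    # with equal contents can hit the cache.
    e_version, e_name, value = entry
    if e_version == version and e_name == name:
        return value
    return None

interned = "method"
alias = "".join(["met", "hod"])   # equal contents, distinct object
entry = (42, interned, "cached-value")

print(probe_by_identity(entry, 42, alias))  # None: pointer mismatch
print(probe_by_value(entry, 42, alias))     # 'cached-value'
```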
Handle common cases early.
colesbury left a comment
I don't think we should do it this way. I don't think it's worth suffering even a small performance penalty for a rare case (non-interned lookup keys), when we can support that without any performance hit.
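The alternative being suggested can also be sketched: keep the pointer-compare fast path for the common interned-key case, and fall back to a value-compare probe only for the rare non-interned key, so the common case pays nothing extra. Again a hypothetical simplification; names are illustrative.

```python
# Hypothetical two-path lookup: identity check first, value comparison
# only as a fallback for non-interned keys.

def lookup(entry, version, name, is_interned):
    e_version, e_name, value = entry
    # Fast path: interned keys compare by identity (a pointer check in C).
    if e_version == version and e_name is name:
        return value
    if not is_interned:
        # Rare slow path: full value comparison for non-interned keys.
        if e_version == version and e_name == name:
            return value
    return None

name = "method"
entry = (7, name, "cached")
print(lookup(entry, 7, name, is_interned=True))    # fast-path hit

alias = "".join(["met", "hod"])                    # equal text, new object
print(lookup(entry, 7, alias, is_interned=False))  # slow-path hit
```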
@@ -0,0 +1,2 @@
For free-threaded build, allow non-interned strings to be cached in the type
I don't think this caches non-interned strings. It seems to me that it allows non-interned strings as the lookup key, but the cache still only contains interned strings.
Closing this since I agree with Sam that the performance hit is too much to pay for improving such a rare case.