Optimize StringIntegerMap construction: counting sort + skip debug-only duplicate checks by redmercury · Pull Request #172 · meta-pytorch/tokenizers

redmercury · 2026-02-11T19:40:49Z

Summary:
Replace O(n log n) std::sort with O(n) counting sort for bucket distribution,
skip duplicate-detection sorts in release builds, and replace unordered_map
cross-reference with direct index array.
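
The three changes above can be illustrated with a minimal sketch. This is not the code from the diff: `BucketedEntries`, `distribute_into_buckets`, `has_duplicates`, and the use of `std::hash` are hypothetical names and choices made for illustration, and the `position` array is only one guess at what a "direct index array" cross-reference could look like.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type: entry indices grouped by bucket.
struct BucketedEntries {
  std::vector<std::size_t> offsets;     // bucket b spans [offsets[b], offsets[b + 1])
  std::vector<std::uint32_t> order;     // entry indices, grouped by bucket
  std::vector<std::uint32_t> position;  // position[i] = slot of entry i in `order`
};

// Duplicate detection needs an O(n log n) sort, so it is only ever
// invoked from inside assert() and disappears when NDEBUG is defined.
bool has_duplicates(std::vector<std::string> keys) {
  std::sort(keys.begin(), keys.end());
  return std::adjacent_find(keys.begin(), keys.end()) != keys.end();
}

BucketedEntries distribute_into_buckets(const std::vector<std::string>& keys,
                                        std::size_t num_buckets) {
  assert(num_buckets > 0);
  assert(!has_duplicates(keys));  // compiled out in release builds

  const std::size_t n = keys.size();
  std::vector<std::size_t> bucket_of(n);
  BucketedEntries out;
  out.offsets.assign(num_buckets + 1, 0);

  // Pass 1: hash each key once and count how many entries land in each bucket.
  for (std::size_t i = 0; i < n; ++i) {
    bucket_of[i] = std::hash<std::string>{}(keys[i]) % num_buckets;
    ++out.offsets[bucket_of[i] + 1];
  }

  // Pass 2: prefix sum turns the counts into bucket start offsets.
  for (std::size_t b = 0; b < num_buckets; ++b) {
    out.offsets[b + 1] += out.offsets[b];
  }

  // Pass 3: scatter each entry into its bucket's next free slot.
  // `position` is a plain array lookup standing in for an unordered_map.
  std::vector<std::size_t> cursor(out.offsets.begin(), out.offsets.end() - 1);
  out.order.resize(n);
  out.position.resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t slot = cursor[bucket_of[i]]++;
    out.order[slot] = static_cast<std::uint32_t>(i);
    out.position[i] = static_cast<std::uint32_t>(slot);
  }
  return out;
}
```

Each pass is linear in the number of entries plus the number of buckets, which is where the O(n) bound comes from. The scatter pass is also stable: entries within a bucket keep their input order.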

Simpleperf profiling of assistant startup on Supernova (hammerhead) showed
tokenizers::Tiktoken::load() consuming 14% of all CPU cycles, dominated by
StringIntegerMap construction sorting and hashing ~200K token entries.

Construction benchmark (opt build, x86_64):

Baseline: 81.3 ms -> Optimized: 25.2 ms (3.2x faster)

On-device boot metrics (50 runs each, Supernova/hammerhead):

Mean assistant ready time: 15.625s -> 15.341s (-285ms)
P50 assistant ready time: 15.810s -> 15.369s (-441ms)
Welch t-test: t=-2.587, p<0.05, 95% CI [-500, -69]ms
Boots completing under 15s: 20% -> 48%
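
For readers who want to reproduce this kind of comparison on their own boot timings, here is a minimal sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom. The raw per-run timings are not part of this PR, so `WelchResult` and `welch_t_test` are hypothetical names and the function takes arbitrary sample vectors (each with at least two values).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for the sketch.
struct WelchResult {
  double mean_diff;  // mean(optimized) - mean(baseline)
  double std_err;    // standard error of the difference
  double t;          // Welch's t statistic
  double df;         // Welch-Satterthwaite degrees of freedom
};

static double sample_mean(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += v;
  return sum / static_cast<double>(x.size());
}

// Unbiased sample variance (divides by n - 1).
static double sample_variance(const std::vector<double>& x, double mean) {
  double sum_sq = 0.0;
  for (double v : x) sum_sq += (v - mean) * (v - mean);
  return sum_sq / static_cast<double>(x.size() - 1);
}

// Welch's unequal-variance t-test; a negative t means `optimized` is lower.
WelchResult welch_t_test(const std::vector<double>& baseline,
                         const std::vector<double>& optimized) {
  const double na = static_cast<double>(baseline.size());
  const double nb = static_cast<double>(optimized.size());
  const double ma = sample_mean(baseline);
  const double mb = sample_mean(optimized);
  const double sa = sample_variance(baseline, ma) / na;   // variance of mean
  const double sb = sample_variance(optimized, mb) / nb;  // variance of mean

  WelchResult r;
  r.mean_diff = mb - ma;
  r.std_err = std::sqrt(sa + sb);
  r.t = r.mean_diff / r.std_err;
  r.df = (sa + sb) * (sa + sb) /
         (sa * sa / (na - 1.0) + sb * sb / (nb - 1.0));
  return r;
}
```

As a rough consistency check on the numbers above, a mean difference of about -285 ms with t = -2.587 implies a standard error of roughly 110 ms.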

Reviewed By: larryliu0820

Differential Revision: D92973393

meta-codesync · 2026-02-11T19:41:03Z

@redmercury has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92973393.

larryliu0820

Review automatically exported from Phabricator review in Meta.


meta-cla bot added the CLA Signed label Feb 11, 2026

meta-codesync bot added fb-exported meta-exported labels Feb 11, 2026

larryliu0820 approved these changes Feb 11, 2026


redmercury force-pushed the export-D92973393 branch 2 times, most recently from af24532 to 6a70413 on February 12, 2026 10:06

redmercury force-pushed the export-D92973393 branch from 6a70413 to 655d6e9 on February 12, 2026 10:41

meta-codesync bot merged commit 1c43247 into meta-pytorch:main Feb 12, 2026
5 of 7 checks passed
