Optimize StringIntegerMap construction: counting sort + skip debug-only duplicate checks #172

Merged
meta-codesync[bot] merged 1 commit into meta-pytorch:main from redmercury:export-D92973393
Feb 12, 2026

@redmercury (Contributor)

Summary:
Replace O(n log n) std::sort with O(n) counting sort for bucket distribution,
skip duplicate-detection sorts in release builds, and replace unordered_map
cross-reference with direct index array.
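The counting-sort step can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Entry` struct, the `distribute_by_bucket` function, and the assumption that each entry already carries a precomputed bucket index are all hypothetical. It shows how grouping entries by bucket needs only a histogram, a prefix sum, and a scatter pass, which is O(n + buckets) instead of the O(n log n) of a comparison sort.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry type: a value plus the bucket its key hashes to.
struct Entry {
  std::uint32_t bucket;  // assumed precomputed, e.g. hash(key) % num_buckets
  std::uint64_t value;
};

// Groups entries by bucket with a stable counting sort.
// On return, bucket_start holds num_buckets + 1 offsets, and bucket b
// occupies the range [bucket_start[b], bucket_start[b + 1]) of the result.
std::vector<Entry> distribute_by_bucket(const std::vector<Entry>& entries,
                                        std::size_t num_buckets,
                                        std::vector<std::size_t>& bucket_start) {
  assert(num_buckets > 0);

  // Pass 1: histogram, shifted by one slot so that the prefix sum below
  // turns counts directly into start offsets.
  bucket_start.assign(num_buckets + 1, 0);
  for (const Entry& e : entries) {
    assert(e.bucket < num_buckets);
    ++bucket_start[e.bucket + 1];
  }

  // Pass 2: prefix sum over the counts.
  for (std::size_t b = 0; b < num_buckets; ++b) {
    bucket_start[b + 1] += bucket_start[b];
  }

  // Pass 3: scatter each entry to the next free slot of its bucket.
  std::vector<std::size_t> cursor(bucket_start.begin(), bucket_start.end() - 1);
  std::vector<Entry> out(entries.size());
  for (const Entry& e : entries) {
    out[cursor[e.bucket]++] = e;
  }
  return out;
}
```

Because the scatter pass walks the input in order, entries within a bucket keep their original relative order, which is usually what a bucketed hash table layout wants.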

Simpleperf profiling of assistant startup on Supernova (hammerhead) showed
tokenizers::Tiktoken::load() consuming 14% of all CPU cycles, dominated by
StringIntegerMap construction sorting and hashing ~200K token entries.
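The other two changes named in the summary, debug-only duplicate checks and a direct index array in place of an `unordered_map`, might look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the `kMissing` sentinel, the use of `NDEBUG` as the release-build switch, and the premise that integer ids fall in a small dense range `[0, id_limit)`. The actual StringIntegerMap code may differ on every point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if any key appears more than once. The check sorts a copy,
// so it costs O(n log n) and is worth skipping on hot startup paths.
bool has_duplicate_keys(std::vector<std::string> keys) {
  std::sort(keys.begin(), keys.end());
  return std::adjacent_find(keys.begin(), keys.end()) != keys.end();
}

// Hypothetical validation hook: does nothing when NDEBUG is defined,
// so release builds never pay for the sort.
void validate_keys(const std::vector<std::string>& keys) {
#ifndef NDEBUG
  assert(!has_duplicate_keys(keys) && "duplicate token string");
#else
  (void)keys;
#endif
}

// Sentinel for ids that do not occur in the input (hypothetical).
constexpr std::size_t kMissing = static_cast<std::size_t>(-1);

// Direct index array: index[id] is the position of id in `ids`.
// Assumes every id is smaller than id_limit, so a flat vector can stand in
// for a hash map and each lookup is a single array access.
std::vector<std::size_t> build_position_index(
    const std::vector<std::uint64_t>& ids, std::size_t id_limit) {
  std::vector<std::size_t> index(id_limit, kMissing);
  for (std::size_t pos = 0; pos < ids.size(); ++pos) {
    assert(ids[pos] < id_limit);
    index[ids[pos]] = pos;
  }
  return index;
}
```

The flat array trades memory proportional to `id_limit` for the removal of hashing and node allocation, which is a reasonable trade only when ids are dense, as token ids typically are.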

Construction benchmark (opt build, x86_64):

  • Baseline: 81.3 ms -> Optimized: 25.2 ms (3.2x faster)

On-device boot metrics (50 runs each, Supernova/hammerhead):

  • Mean assistant ready time: 15.625s -> 15.341s (-285ms)
  • P50 assistant ready time: 15.810s -> 15.369s (-441ms)
  • Welch t-test: t=-2.587, p<0.05, 95% CI [-500, -69]ms
  • Boots completing under 15s: 20% -> 48%

Reviewed By: larryliu0820

Differential Revision: D92973393

meta-cla bot added the CLA Signed label on Feb 11, 2026
meta-codesync bot commented Feb 11, 2026

@redmercury has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92973393.

@larryliu0820 (Contributor) left a comment

Review automatically exported from Phabricator review in Meta.

@redmercury force-pushed the export-D92973393 branch 2 times, most recently from af24532 to 6a70413 on February 12, 2026 10:06
meta-codesync bot merged commit 1c43247 into meta-pytorch:main on Feb 12, 2026
5 of 7 checks passed
Labels

CLA Signed, fb-exported, meta-exported
