
perf: eliminate redundant tokenization and binary search allocation churn#29

Merged
dean0x merged 3 commits into main from perf/28-token-budget-truncation on Mar 15, 2026

Conversation

@dean0x
Owner

@dean0x dean0x commented Mar 14, 2026

Summary

Fixes the two performance issues deferred from #27 to tech debt (#28):

  • B4: truncate_to_token_budget now accepts known_token_count: Option<usize>, eliminating the redundant fast-path tokenization when the cascade loop already computed the count
  • B5: Binary search replaces per-iteration lines[..mid].join("\n") with a single pre-joined string and byte-offset index, reducing allocation work from O(N log N) to O(N)

Also adds 3 new unit tests for the known_token_count parameter and a dedicated criterion benchmark group (token_budget_truncation) at 100–5000 line scales.

Closes B4 and B5 from #28. Remaining items (B6, B8, S2, S5, S6, S13) are unchanged.
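The B5 change can be sketched as follows. This is an illustrative stand-in, not rskim-core's actual API: the function name, the simplified signature (no `Language` parameter, plain `String` return), and the whitespace tokenizer are all assumptions for the example.

```rust
// Sketch of the B5 idea: join the lines once, record the byte offset where
// each prefix of lines ends, then binary-search on prefix length. Each probe
// slices the pre-joined string instead of re-joining, so no O(mid) allocation
// happens per iteration.
fn truncate_by_lines(text: &str, budget: usize, count_tokens: impl Fn(&str) -> usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let joined = lines.join("\n");

    // offsets[k] is the byte length of the first k lines joined with '\n',
    // so joined[..offsets[k]] is always a valid prefix slice.
    let mut offsets = Vec::with_capacity(lines.len() + 1);
    offsets.push(0usize);
    let mut end = 0usize;
    for (i, line) in lines.iter().enumerate() {
        if i > 0 {
            end += 1; // '\n' separator between lines
        }
        end += line.len();
        offsets.push(end);
    }

    // Find the largest prefix of lines whose token count fits the budget.
    let (mut lo, mut hi) = (0, lines.len());
    while lo < hi {
        let mid = (lo + hi + 1) / 2;
        if count_tokens(&joined[..offsets[mid]]) <= budget {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    joined[..offsets[lo]].to_string()
}
```

Each iteration still tokenizes the candidate prefix; only the per-iteration string allocation is eliminated, which is exactly the O(N log N) → O(N) allocation claim above.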

Test plan

  • All 154 tests pass (cargo test — 151 existing + 3 new)
  • cargo clippy -- -D warnings clean
  • test_token_budget_known_count_none_behaves_like_before — property test: None and Some(actual) produce identical output for budgets 1..20
  • Issue Tech debt: token budget cascade follow-ups #28 updated with B4/B5 struck through
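The property test in the plan above can be illustrated with a minimal sketch. The signatures here are simplified for the example (no `Language` parameter, whitespace word counting, naive word truncation); only the property itself is taken from the test plan.

```rust
// Property: for every small budget, passing None and passing Some(actual)
// as the token-count hint must yield identical output.
fn truncate(text: &str, budget: usize, hint: Option<usize>, count: impl Fn(&str) -> usize) -> String {
    // Use the hint when provided, otherwise tokenize the full text.
    let total = hint.unwrap_or_else(|| count(text));
    if total <= budget {
        return text.to_string();
    }
    // Naive stand-in truncation for the sketch: keep the first `budget` words.
    text.split_whitespace().take(budget).collect::<Vec<_>>().join(" ")
}

fn check_property(text: &str) {
    let wc = |s: &str| s.split_whitespace().count();
    let actual = wc(text);
    for budget in 1..20 {
        assert_eq!(
            truncate(text, budget, None, wc),
            truncate(text, budget, Some(actual), wc),
            "None and Some(actual) diverged at budget {budget}"
        );
    }
}
```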

perf: eliminate redundant tokenization and binary search allocation churn (#28)

B4: Add `known_token_count` parameter to `truncate_to_token_budget` so
the cascade loop can pass its already-computed count, skipping the
redundant fast-path tokenization of the full text.

B5: Replace per-iteration `lines[..mid].join("\n")` in the binary search
with a single pre-joined string and byte-offset index. Each iteration
now slices from the joined string instead of allocating O(mid) work,
reducing total allocation from O(N log N) to O(N).

Also adds 3 new unit tests for the known_token_count parameter and a
dedicated criterion benchmark group for truncation at 100-5000 lines.
language: Language,
token_budget: usize,
count_tokens: F,
known_token_count: Option<usize>,
Owner Author

Issue: Public API signature change breaks backward compatibility

The function truncate_to_token_budget is exported publicly via pub use in lib.rs (line 41). Adding this known_token_count: Option<usize> parameter changes the signature from 4 to 5 arguments, which is a breaking change for external consumers of rskim-core on crates.io.

Recommended fix (choose one):

Option A - Backward-compatible wrapper (preferred):
Keep the 4-param public API and add a new 5-param variant truncate_to_token_budget_with_hint(); have the 4-param version call the 5-param variant with None.

Option B - Make it internal:
If external consumers aren't intended, reduce visibility to pub(crate) and remove the pub use from lib.rs:41.

Since rskim-core is published on crates.io, this needs to be addressed before merge.
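A minimal sketch of Option A. The signatures are simplified for illustration (the `Language` parameter is omitted, the return type is a plain `String` rather than the crate's actual `Result`, and the truncation body is elided), so this shows the API shape only, not rskim-core's implementation.

```rust
// New 5-parameter variant carrying the token-count hint.
pub fn truncate_to_token_budget_with_hint(
    text: &str,
    token_budget: usize,
    count_tokens: impl Fn(&str) -> usize,
    known_token_count: Option<usize>,
) -> String {
    // Use the caller-provided count when available; otherwise tokenize.
    let total = known_token_count.unwrap_or_else(|| count_tokens(text));
    if total <= token_budget {
        return text.to_string();
    }
    // ... truncation logic elided in this sketch ...
    String::new()
}

// The existing 4-parameter API keeps its signature and delegates with None,
// so external consumers on crates.io see no breaking change.
pub fn truncate_to_token_budget(
    text: &str,
    token_budget: usize,
    count_tokens: impl Fn(&str) -> usize,
) -> String {
    truncate_to_token_budget_with_hint(text, token_budget, count_tokens, None)
}
```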


Reviewed by: Architecture & Regression (HIGH, 90% confidence)

Language::TypeScript,
*budget,
word_count,
None,
Owner Author

Issue: Benchmark doesn't test the Some(count) optimization path (B4)

The B4 optimization (skip redundant tokenization) only executes when known_token_count is Some(count), but this benchmark always passes None. This means:

  • Benchmark only measures B5 (pre-join allocation savings), not B4
  • The B4 optimization has no regression guard
  • Future changes that break the Some fast-path won't be caught

Recommended fix: Add a second benchmark variant that passes Some(total) to measure the B4 improvement:

let total = word_count(&text);
// Add new benchmark group exercising Some(total) path:
group.bench_with_input(
    BenchmarkId::new("known/lines", num_lines),
    &(text, budget, total),
    |b, (input, budget, total)| {
        b.iter(|| {
            truncate_to_token_budget(
                black_box(input), Language::TypeScript,
                *budget, word_count, Some(*total),
            ).unwrap()
        })
    },
);

This provides complete regression coverage for both B4 and B5 optimizations.


Reviewed by: Tests & Performance (MEDIUM, 85% confidence)

}

#[test]
fn test_token_budget_known_count_returns_early_when_within_budget() {
Owner Author


Issue: Test doesn't verify fast-path execution

The test test_token_budget_known_count_returns_early_when_within_budget asserts the return value is correct, but it doesn't verify that count_tokens was actually skipped when known_token_count is Some(count). The test passes word_count as the counter, which would return the same result regardless of whether it was called.

This means if someone broke the B4 optimization (e.g., removed the unwrap_or_else short-circuit), this test would still pass.

Recommended fix: Use a "poison" counter that proves the fast-path was taken:

#[test]
fn test_token_budget_known_count_returns_early_when_within_budget() {
    let text = "line one\nline two\nline three\n";
    
    // Use a poison counter that panics if called on the full text
    let poison_counter = |s: &str| -> usize {
        if s == text {
            panic!("count_tokens should NOT be called when known_token_count is Some");
        }
        s.split_whitespace().count()
    };
    
    let result = truncate_to_token_budget(
        text, Language::TypeScript, 100, poison_counter, Some(5)
    ).unwrap();
    
    assert_eq!(result, text, "Fast-path should return text unchanged");
}

If the fast-path is working, the poison counter is never called. If it's broken, the test panics.


Reviewed by: Tests (MEDIUM, 85% confidence)

{
// Fast path: if text already fits, return unchanged
if count_tokens(text) <= token_budget {
// Fast path: if text already fits, return unchanged (B4: skip recount
Owner Author


Issue: Inline ticket references (B4, B5) deviate from codebase style

Comments like "// B4: skip recount" and similar ticket shorthand use codes that won't be meaningful to future contributors. The rest of the codebase uses descriptive prose without ticket codes (e.g., "// Fast path: if text already fits, return unchanged").

This also appears at line 376 ("// B5: Pre-join...") and in test headers ("// known_token_count tests (B4)").

Recommended fix: Replace with descriptive comments that explain the optimization without external context:

Line 353:

// Fast path: when caller provides a known token count from the cascade loop,
// skip redundant full-text tokenization and use the pre-computed count instead.

Line 376:

// Pre-compute line byte-offsets once to avoid allocating O(N log N) intermediate
// strings during binary search. Each iteration slices from the pre-joined string.

Test section header:

// Tests for known_token_count parameter optimization

This keeps the codebase consistent and makes the intent clear without requiring external knowledge of the optimization tickets.


Reviewed by: Consistency (MEDIUM, 80% confidence)

@dean0x
Owner Author

dean0x commented Mar 14, 2026

Code Review Summary: PR #29

Branch: perf/28-token-budget-truncation → main
Commit: 8025892 (perf: eliminate redundant tokenization and binary search allocation churn)
Review Date: 2026-03-14


Overview

This PR introduces two targeted performance optimizations to token budget truncation:

  • B4: Skip redundant tokenization via known_token_count parameter
  • B5: Pre-join string + byte-offset index to eliminate O(N log N) allocation churn

Inline Comments Posted: 4 high-confidence blocking issues (≥80% confidence)


Findings by Severity

BLOCKING ISSUES (Address Before Merge)

| Severity | Category | Finding | Reviewer(s) | Confidence |
|---|---|---|---|---|
| HIGH | Architecture | Public API signature change breaks backward compatibility for rskim-core consumers on crates.io. Adding a required known_token_count parameter violates semver. | Architecture, Regression | 90% |
| MEDIUM | Testing | Benchmark only tests the None path for known_token_count; the B4 optimization has no regression guard. | Tests, Performance | 85% |
| MEDIUM | Testing | Test doesn't verify fast-path execution; would pass even if the B4 optimization were broken. | Tests | 85% |
| MEDIUM | Style | Inline ticket references (B4, B5) in comments deviate from codebase conventions; use prose instead. | Consistency | 80% |

Action: All 4 issues have detailed inline comments with recommended fixes.


Lower-Confidence Findings (60-79%) → Suggestions

1. last_token_count Initialized to 0 (Rust, Architecture, Security, Performance - MEDIUM)

Multiple reviewers noted that initializing last_token_count to 0 is semantically misleading because 0 is a valid token count (for empty strings). If the guard at line 319 (last_output.is_empty()) is ever removed, Some(0) would bypass truncation.

Current Status: Safe due to upstream guard, but fragile.

Suggestion: Use Option<usize> instead:

let mut last_token_count: Option<usize> = None;
// ... assign Some(token_count) only when a mode produces output

Confidence: 65-75% (Rust reviewer marked as MEDIUM BLOCKING, others as defensive concern)


2. known_token_count Parameter Is Trusted Without Validation (Architecture, Security - MEDIUM)

The function accepts Some(count) without verifying it matches actual token count. If a caller passes Some(5) for text with 100 tokens, the fast-path returns the full text unchecked.

Current Status:

  • Low risk in practice (only internal call site, pre-computed from tokens::count_tokens)
  • But this is a public API on crates.io with no validation
  • Doc comment at lines 331-333 documents the contract; external callers bear responsibility

Suggestion: Add a debug assertion to catch mismatches during development (Architecture review provides code example).

Confidence: 70% (Architecture and Security mark as MEDIUM BLOCKING; defensive improvement)
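One possible shape for the suggested guard (the Architecture review's own code example is not reproduced in this thread, so the helper name and signature here are hypothetical):

```rust
// In debug builds, verify that a caller-provided count matches what the
// tokenizer actually reports; in release builds the check compiles out,
// preserving the B4 fast-path.
fn resolve_token_count(
    text: &str,
    known_token_count: Option<usize>,
    count_tokens: impl Fn(&str) -> usize,
) -> usize {
    match known_token_count {
        Some(count) => {
            // Note: this re-tokenizes in debug builds by design — the point
            // is to catch stale hints during development and testing.
            debug_assert_eq!(
                count,
                count_tokens(text),
                "known_token_count does not match actual token count"
            );
            count
        }
        None => count_tokens(text),
    }
}
```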


3. Binary Search Tokenization Cost Dominates (Performance - MEDIUM)

While B5 eliminates allocation churn from join(), the binary search still calls count_tokens(&candidate) at every iteration. For tiktoken (real BPE tokenizer), per-iteration tokenization is O(N) and dominates the O(log N) search cost overall.

Current Status: B5 optimization is still beneficial for word-counting (micro-benchmark), but production path is limited by tokenizer cost.

Suggestion: This is an inherent limitation of binary-search-on-token-count. A more radical future optimization would build a cumulative token-count array (one pass of tokenization, then prefix-sum queries), reducing total tokenization work from O(N log N) to O(N) tokens.

Confidence: 70% (Acknowledged by Performance reviewer as "not blocked" but future optimization)
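The suggested future direction can be sketched as follows. The function name is illustrative, and the sketch assumes per-line token counts sum to the whole-text count, which holds exactly for word counting but only approximately for a BPE tokenizer like tiktoken:

```rust
// Tokenize each line once, build a prefix-sum array of cumulative token
// counts, then find the cutoff with a single search over the sums —
// O(N) total tokenization instead of O(N log N).
fn max_lines_within_budget(
    lines: &[&str],
    budget: usize,
    count_tokens: impl Fn(&str) -> usize,
) -> usize {
    // prefix[k] = total tokens in the first k lines
    let mut prefix = Vec::with_capacity(lines.len() + 1);
    prefix.push(0usize);
    for line in lines {
        let last = *prefix.last().unwrap();
        prefix.push(last + count_tokens(line));
    }
    // Prefix sums are non-decreasing, so partition_point finds the first
    // entry exceeding the budget; the index before it is the answer.
    prefix.partition_point(|&t| t <= budget) - 1
}
```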


4. truncate_to_token_budget Function Length Approaching Threshold (Complexity - MEDIUM)

At ~50 lines (excluding comments), the function is at the upper boundary of acceptable complexity. It handles three distinct concerns: fast-path early return, pre-join index construction, and binary search with output assembly.

Suggestion: Extract byte-offset index builder into a small helper function (Complexity review provides code example). This would reduce main function to ~40 lines and make optimization independently testable.

Confidence: 65% (Complexity marks as MEDIUM "Should Fix", not blocking)
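One way the suggested extraction could look (the helper name is hypothetical; the Complexity review's own example is not reproduced in this thread):

```rust
// Build byte offsets such that joined[..offsets[k]] equals the first k
// lines joined with '\n', letting the binary search slice the pre-joined
// string instead of re-joining on every iteration.
fn line_prefix_offsets(lines: &[&str]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(lines.len() + 1);
    offsets.push(0usize);
    let mut end = 0usize;
    for (i, line) in lines.iter().enumerate() {
        if i > 0 {
            end += 1; // '\n' separator between lines
        }
        end += line.len();
        offsets.push(end);
    }
    offsets
}
```

Factoring this out would also let the offset invariant be unit-tested independently of the truncation logic.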


Summary Table: All Issues by Category

| Category | CRITICAL | HIGH | MEDIUM | LOW | Total |
|---|---|---|---|---|---|
| Blocking (≥80%) | 0 | 1 | 3 | 0 | 4 |
| Should Fix (60-79%) | 0 | 0 | 4 | 0 | 4 |
| Pre-existing | 0 | 0 | 2 | 4 | 6 |

Review Scores

| Reviewer | Score | Recommendation |
|---|---|---|
| Architecture | 7/10 | CHANGES_REQUESTED |
| Rust | 8/10 | APPROVED |
| Performance | 7/10 | APPROVED_WITH_CONDITIONS |
| Tests | 7/10 | APPROVED_WITH_CONDITIONS |
| Regression | 8/10 | APPROVED_WITH_CONDITIONS |
| Security | 9/10 | APPROVED_WITH_CONDITIONS |
| Complexity | 8/10 | APPROVED |
| Consistency | 8/10 | APPROVED_WITH_CONDITIONS |

Overall: Strong optimizations with good test coverage. Primary blocker is the public API breaking change, which has two straightforward solutions (backward-compatible wrapper or reduce visibility).


What This PR Does Well

✅ Well-motivated optimizations (skip redundant tokenization, eliminate allocation churn)
✅ All 103+ tests pass; 3 new tests for known_token_count parameter
✅ Property test verifies None and Some(actual) produce identical output
✅ Benchmark added to track B5 optimization (though missing B4 variant)
✅ Commit message accurately describes both optimizations (B4, B5 labels in code)
✅ All in-repo call sites updated correctly
✅ No unsafe Rust; no new dependencies; no security issues


Recommendation

Status: APPROVED_WITH_CHANGES

Conditions:

  1. Address the 4 blocking inline comments before merge
  2. Optionally implement 4 "should fix" suggestions (token counting guards, complexity reduction, etc.)

The optimizations are sound and well-tested. The blocking issues are straightforward to address and do not indicate deeper problems with the implementation.


Review consolidation by Claude Code | Deduplicated 8 review reports | 4 inline comments + summary

Dean Sharon added 2 commits March 14, 2026 23:18
  • Change last_token_count from usize to Option<usize> for type safety (#6)
  • Add debug_assert! to validate known_token_count matches actual (#2)
  • Replace B4/B5 ticket shorthand with descriptive comments (#5)
  • Strengthen fast-path test with call-counting poison counter (#3)
  • Add lines_known benchmark variant exercising Some(total) path (#4)
@dean0x dean0x merged commit 5031fc9 into main Mar 15, 2026
5 checks passed
@dean0x dean0x deleted the perf/28-token-budget-truncation branch March 15, 2026 07:16