Skip to content

feat: add --tokens N token budget cascade mode#27

Merged
dean0x merged 8 commits intomainfrom
feature/tokens-budget-mode
Mar 14, 2026
Merged

feat: add --tokens N token budget cascade mode#27
dean0x merged 8 commits intomainfrom
feature/tokens-budget-mode

Conversation

@dean0x
Copy link
Owner

@dean0x dean0x commented Mar 14, 2026

Summary

  • Adds --tokens N flag that progressively cascades through transformation modes (full -> minimal -> structure -> signatures -> types) until output fits within N tokens
  • Core library stays pure with dependency-injected token counting (no tiktoken in rskim-core)
  • Cascade orchestration, diagnostics, and caching live in the CLI crate

Changes

Core Library (rskim-core)

  • types.rs: Added Mode::aggressiveness() ordering (Full=0..Types=4) and Mode::cascade_from_here() that returns cascade sequence from any starting mode
  • truncate.rs: Added truncate_to_token_budget() with binary-search line truncation accepting Fn(&str) -> usize closure for token counting (dependency injection)
  • lib.rs/mod.rs: Made truncate module public and re-exported truncate_to_token_budget

CLI (rskim)

  • main.rs: Added --tokens N argument, cascade_for_token_budget() function, wired through all processing paths (single file, stdin, glob, directory)
  • cache.rs: Added token_budget to cache key hash (follows exact max_lines pattern)

Interactions

  • --mode M --tokens N: Cascade starts at mode M, never less aggressive
  • --tokens N --max-lines M: Both constraints applied independently
  • --tokens N --show-stats: Reports correct token counts for final output
  • Per-file budget for multi-file/glob operations
  • Diagnostic messages to stderr only when cascade escalates beyond starting mode
  • Final fallback: line-based truncation of most aggressive mode's output

Test plan

  • 16 new integration tests in cli_token_budget.rs covering:
    • Basic: flag accepted, zero rejected, output within budget invariant
    • Cascade: large budget no cascade, small budget cascades, very small budget line truncation
    • Interaction: explicit mode, max-lines, show-stats, stdin, glob
    • Edge cases: empty file, already within budget, Python/Rust files
  • All existing 151+ core tests continue to pass (345 total tests)
  • cargo clippy -- -D warnings clean
  • cargo fmt -- --check clean
  • Snyk security scan: zero issues
  • No tiktoken dependency added to rskim-core
  • Token budget invariant verified: output tokens <= budget in all test cases

Dean Sharon and others added 2 commits March 14, 2026 02:29
Add a --tokens N flag that progressively cascades through transformation
modes (full -> minimal -> structure -> signatures -> types) until the
output fits within the specified token budget. This is the core "smart
context fitting" feature for AI agents.

Architecture:
- Core library (rskim-core) stays pure: no tiktoken dependency
  - Mode::aggressiveness() and Mode::cascade_from_here() for ordering
  - truncate_to_token_budget() accepts Fn(&str)->usize closure (DI)
- Cascade orchestration lives in CLI crate (rskim)
- Token counting (tiktoken) stays in CLI crate only
- Per-file budget for multi-file/glob operations
- Diagnostic messages to stderr when cascade escalates
- Token budget added to cache key (follows max_lines pattern)

When --mode M and --tokens N are both specified, cascade starts at M.
Final fallback: line-based truncation of the most aggressive mode output.

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract make_marker/fmt_opt closures to DRY repeated format strings
- Unify cascade_for_token_budget and cascade_for_token_budget_stdin into
  a single generic function parameterized by a transform closure
- Rename include_original to compute_token_stats for clarity
- Remove unused ProcessResult::original field
- Replace constructor fns with struct literals

mod parser;
mod transform;
pub mod transform;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module visibility should remain private

The transform module was changed from mod transform (private) to pub mod transform. This exposes the entire module hierarchy as public API, including internal types and implementation details that were not designed for external consumption.

The stated goal is to re-export only truncate_to_token_budget, which is already done via pub use transform::truncate::truncate_to_token_budget on line 41. Making the entire module pub is unnecessary and creates an Interface Segregation Principle violation.

Fix: Revert to private module visibility:

pub use transform::truncate::truncate_to_token_budget;
mod transform;  // keep private

The pub use re-export works even with private modules. Revert transform/mod.rs:11 to pub(crate) mod truncate as well.

best = mid;
lo = mid + 1;
} else {
hi = mid - 1;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary search edge case with usize arithmetic

When mid = 0 and the candidate exceeds budget, the code reaches hi = mid - 1, where hi is usize(0). This creates an underflow risk. While the current code is safe due to the mid == 0 early break on line 377, this depends on a guard that is not immediately obvious to maintainers. A future refactor that removes the guard could introduce a panic.

Fix: Use a half-open interval pattern to avoid usize subtraction:

let mut lo: usize = 0;
let mut hi: usize = lines.len() + 1; // exclusive upper bound

while lo < hi {
    let mid = lo + (hi - lo) / 2;
    // ... build candidate ...
    if count_tokens(&candidate) <= token_budget {
        best = mid;
        lo = mid + 1;
    } else {
        hi = mid; // no subtraction needed
    }
}

This eliminates the subtraction and makes the loop bounds logic clearer.

);
}
}
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test assertions are too weak for the budget invariant

The core invariant of truncate_to_token_budget is: output tokens ≤ budget. However, the tests skip this assertion for small budgets (1-4 tokens) with a comment saying "the marker itself might exceed."

This is a known limitation, but the test should verify it explicitly by:

  1. Computing the exact marker token count
  2. Only testing budgets large enough to accommodate the marker
  3. Enforcing the invariant strictly rather than using a 10x tolerance

Fix: Strengthen the test:

#[test]
fn test_token_budget_output_invariant() {
    let text = "word1 word2 word3\nword4 word5 word6\nword7 word8 word9\n";
    // Compute the minimum viable budget (marker token count)
    let min_viable_budget = word_count("// ... (3 lines truncated)");
    
    // Test budgets starting from minimum viable
    for budget in min_viable_budget..20 {
        let result = truncate_to_token_budget(text, Language::TypeScript, budget, word_count).unwrap();
        let token_count = word_count(&result);
        assert!(
            token_count <= budget,
            "Budget {}: output has {} tokens, expected ≤ {}",
            budget, token_count, budget
        );
    }
}

continue;
};

let token_count = tokens::count_tokens(&output).unwrap_or(usize::MAX);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent failure on token counting errors

When tokens::count_tokens(&output) fails, the code silently treats it as "infinite tokens" with .unwrap_or(usize::MAX). This means if tiktoken initialization fails (corrupted data, memory issue), every mode in the cascade will appear to exceed the budget, leading to aggressive line truncation without any warning to the user.

The stderr diagnostic "all modes exceeded budget, applying line truncation" would be misleading when the real problem is a tokenizer failure.

Fix: Add a diagnostic warning on first failure:

let token_count = match tokens::count_tokens(&output) {
    Ok(count) => count,
    Err(e) => {
        eprintln!("[skim] warning: token counting failed: {}", e);
        usize::MAX
    }
};

This applies to both the cascade loop and the fallback closure.

@dean0x
Copy link
Owner Author

dean0x commented Mar 14, 2026

Summary: Additional Review Findings (60-79% Confidence)

These items were flagged by reviewers but fall below the 80% confidence threshold for inline comments. They represent architectural considerations and performance optimizations worth discussing.

Performance (Multiple reviewers - 75-80% confidence)

Redundant token counting in cascade + fallback boundary

  • In the cascade loop, token counts are computed for each mode's output, but when the fallback truncate_to_token_budget is invoked, it re-counts the same text on the first fast-path check.
  • Fix: Pass the already-computed token count into the fallback to skip the redundant first count call.

String allocation in binary search

  • The binary search builds candidate strings via lines[..mid].join("\n") on every iteration. For large files this creates many intermediate allocations.
  • Optimization: Precompute byte offsets and use slice references instead of joins (would eliminate O(log N) allocations). This is a micro-optimization not blocking.

Cascade re-parses source with tree-sitter for each mode

  • Each mode in the cascade calls transform_fn, which re-parses the source with tree-sitter. For a full cascade (up to 5 modes), this means 5 separate parse operations.
  • Optimization: Parse tree-sitter AST once and reuse it across mode transformations. Requires exposing a "parse-once, transform-many" API from rskim-core.

Architecture (70-75% confidence)

Cache key stores starting mode, not final cascade mode

  • When cascade escalates (e.g., user requests structure but cascade produces signatures), the cache stores starting_mode not final_mode. The metadata is semantically inconsistent though functionally correct.
  • Consideration: Document this behavior or store actual mode used.

_mode_used return value discarded at call sites

  • The cascade function returns mode_used but both call sites discard it with _mode_used. This information should be stored in ProcessResult for cache and stats reporting.
  • Suggestion: Add mode_used: Mode to ProcessResult to propagate through the system.

Complexity (65% confidence - non-blocking refactoring)

process_file (120 lines) and main (178 lines) exceed complexity thresholds

  • Both functions interleave multiple responsibilities (cache lookup, file I/O, transformation dispatch, token counting, cache writing).
  • Refactoring suggestion: Extract try_cached_result() and run_transform() helpers to bring functions under 50-line threshold. Can be done in follow-up PR.

Testing (60-65% confidence)

Fragile stderr parsing in test_tokens_budget_invariant_with_fixture

  • The test parses stderr to extract token counts. If stats format changes, the assertion silently skips.
  • Fix: Make the assertion unconditional, fail explicitly if parsing fails.

Missing direct unit tests for cascade_for_token_budget function

  • Tested only indirectly through CLI integration tests. Consider adding unit tests for cascade ordering, diagnostics, and fallback behavior.

Recommendation: The 4 inline comments address the highest-severity findings (>80% confidence). These summary items are architectural considerations and optimizations worth addressing in follow-up PRs, not blocking approval.

Dean Sharon and others added 6 commits March 14, 2026 12:44
Batch 1 review fixes:

- B1: Restrict transform module visibility to pub(crate) in lib.rs
  and transform/mod.rs. The public API re-export on lib.rs:41 already
  provides external access to truncate_to_token_budget.

- B2: Use saturating_sub for binary search hi-pointer in
  truncate_to_token_budget to prevent potential usize underflow if
  the mid==0 guard is ever refactored away.

- B3: Enforce token budget invariant for very small budgets. When the
  omission marker alone exceeds the budget, return empty string instead
  of violating the invariant. Updated doc comment with minimum effective
  budget caveat. Strengthened invariant test to assert for ALL budgets.

- S5: Return &'static [Mode] from Mode::cascade_from_here() instead
  of allocating a Vec<Mode> on every call. Updated caller in main.rs.

- S11 (false positive): #[must_use] on truncate_to_token_budget is
  redundant because Result already carries #[must_use]. Clippy rejects
  the double attribute. No change made.

Co-Authored-By: Claude <noreply@anthropic.com>
Replace silent-pass patterns in test_tokens_budget_invariant_with_fixture:
- Use unwrap_or_else(panic!) instead of if-let for stderr parsing
- Add explicit guard for empty token strings
- Parse failures now fail the test instead of silently passing
…st helper

- Binary search: start lo=1 (best=0 is the default), remove mid==0 special
  case and dead post-loop check, plain subtraction is now safe
- Stdin path: reuse report_token_stats helper instead of duplicating pattern
- Tests: extract parse_transformed_token_count helper, use Unicode-aware
  byte offset for arrow character
…mode

S7: 5 unit tests for cascade_for_token_budget (first mode fit, escalation,
    line truncation fallback, single-mode cascade, all-None error)
S8: CLI test for --tokens with --mode=types (single-mode cascade)
S9: Unit test for marker-only output (zero content lines, best=0)
S10: Cache round-trip test with token_budget parameter
- Simplify truncate_to_token_budget: remove redundant best==0 branch
  since lines[..0] is empty and produces the same result via the
  general path
- Fix doc comment on process_file to accurately describe return type
@dean0x dean0x merged commit 9d6ad6b into main Mar 14, 2026
9 checks passed
@dean0x dean0x deleted the feature/tokens-budget-mode branch March 14, 2026 18:46
dean0x added a commit that referenced this pull request Mar 15, 2026
…hurn (#29)

## Summary

Fixes the two performance issues deferred from #27 to tech debt (#28):

- **B4**: `truncate_to_token_budget` now accepts `known_token_count:
Option<usize>`, eliminating the redundant fast-path tokenization when
the cascade loop already computed the count
- **B5**: Binary search replaces per-iteration `lines[..mid].join("\n")`
with a single pre-joined string and byte-offset index, reducing
allocation work from O(N log N) to O(N)

Also adds 3 new unit tests for the `known_token_count` parameter and a
dedicated criterion benchmark group (`token_budget_truncation`) at
100–5000 line scales.

Closes B4 and B5 from #28. Remaining items (B6, B8, S2, S5, S6, S13) are
unchanged.

## Test plan

- [x] All 154 tests pass (`cargo test` — 151 existing + 3 new)
- [x] `cargo clippy -- -D warnings` clean
- [x] `test_token_budget_known_count_none_behaves_like_before` —
property test: `None` and `Some(actual)` produce identical output for
budgets 1..20
- [x] Issue #28 updated with B4/B5 struck through

---------

Co-authored-by: Dean Sharon <deanshrn@gmain.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant