feat: add --tokens N token budget cascade mode by dean0x · Pull Request #27 · dean0x/skim

dean0x · 2026-03-14T00:29:42Z

Summary

Adds --tokens N flag that progressively cascades through transformation modes (full -> minimal -> structure -> signatures -> types) until output fits within N tokens
Core library stays pure with dependency-injected token counting (no tiktoken in rskim-core)
Cascade orchestration, diagnostics, and caching live in the CLI crate

Changes

Core Library (rskim-core)

types.rs: Added Mode::aggressiveness() ordering (Full=0..Types=4) and Mode::cascade_from_here() that returns cascade sequence from any starting mode
truncate.rs: Added truncate_to_token_budget() with binary-search line truncation accepting Fn(&str) -> usize closure for token counting (dependency injection)
lib.rs/mod.rs: Made truncate module public and re-exported truncate_to_token_budget

CLI (rskim)

main.rs: Added --tokens N argument, cascade_for_token_budget() function, wired through all processing paths (single file, stdin, glob, directory)
cache.rs: Added token_budget to cache key hash (follows exact max_lines pattern)

Interactions

--mode M --tokens N: Cascade starts at mode M, never less aggressive
--tokens N --max-lines M: Both constraints applied independently
--tokens N --show-stats: Reports correct token counts for final output
Per-file budget for multi-file/glob operations
Diagnostic messages to stderr only when cascade escalates beyond starting mode
Final fallback: line-based truncation of most aggressive mode's output

Test plan

16 new integration tests in cli_token_budget.rs covering:
- Basic: flag accepted, zero rejected, output within budget invariant
- Cascade: large budget no cascade, small budget cascades, very small budget line truncation
- Interaction: explicit mode, max-lines, show-stats, stdin, glob
- Edge cases: empty file, already within budget, Python/Rust files
All existing 151+ core tests continue to pass (345 total tests)
cargo clippy -- -D warnings clean
cargo fmt -- --check clean
Snyk security scan: zero issues
No tiktoken dependency added to rskim-core
Token budget invariant verified: output tokens <= budget in all test cases

Add a --tokens N flag that progressively cascades through transformation modes (full -> minimal -> structure -> signatures -> types) until the output fits within the specified token budget. This is the core "smart context fitting" feature for AI agents. Architecture: - Core library (rskim-core) stays pure: no tiktoken dependency - Mode::aggressiveness() and Mode::cascade_from_here() for ordering - truncate_to_token_budget() accepts Fn(&str)->usize closure (DI) - Cascade orchestration lives in CLI crate (rskim) - Token counting (tiktoken) stays in CLI crate only - Per-file budget for multi-file/glob operations - Diagnostic messages to stderr when cascade escalates - Token budget added to cache key (follows max_lines pattern) When --mode M and --tokens N are both specified, cascade starts at M. Final fallback: line-based truncation of the most aggressive mode output. Co-Authored-By: Claude <noreply@anthropic.com>

- Extract make_marker/fmt_opt closures to DRY repeated format strings - Unify cascade_for_token_budget and cascade_for_token_budget_stdin into a single generic function parameterized by a transform closure - Rename include_original to compute_token_stats for clarity - Remove unused ProcessResult::original field - Replace constructor fns with struct literals

dean0x · 2026-03-14T10:29:18Z

crates/rskim-core/src/lib.rs

+
 mod parser;
-mod transform;
+pub mod transform;


Module visibility should remain private

The transform module was changed from mod transform (private) to pub mod transform. This exposes the entire module hierarchy as public API, including internal types and implementation details that were not designed for external consumption.

The stated goal is to re-export only truncate_to_token_budget, which is already done via pub use transform::truncate::truncate_to_token_budget on line 41. Making the entire module pub is unnecessary and creates an Interface Segregation Principle violation.

Fix: Revert to private module visibility:

pub use transform::truncate::truncate_to_token_budget; mod transform; // keep private

The pub use re-export works even with private modules. Revert transform/mod.rs:11 to pub(crate) mod truncate as well.

dean0x · 2026-03-14T10:29:27Z

crates/rskim-core/src/transform/truncate.rs

+            best = mid;
+            lo = mid + 1;
+        } else {
+            hi = mid - 1;


Binary search edge case with usize arithmetic

When mid = 0 and the candidate exceeds budget, the code reaches hi = mid - 1, where hi is usize(0). This creates an underflow risk. While the current code is safe due to the mid == 0 early break on line 377, this depends on a guard that is not immediately obvious to maintainers. A future refactor that removes the guard could introduce a panic.

Fix: Use a half-open interval pattern to avoid usize subtraction:

let mut lo: usize = 0; let mut hi: usize = lines.len() + 1; // exclusive upper bound while lo < hi { let mid = lo + (hi - lo) / 2; // ... build candidate ... if count_tokens(&candidate) <= token_budget { best = mid; lo = mid + 1; } else { hi = mid; // no subtraction needed } }

This eliminates the subtraction and makes the loop bounds logic clearer.

dean0x · 2026-03-14T10:29:36Z

crates/rskim-core/src/transform/truncate.rs

+                );
+            }
+        }
+    }


Test assertions are too weak for the budget invariant

The core invariant of truncate_to_token_budget is: output tokens ≤ budget. However, the tests skip this assertion for small budgets (1-4 tokens) with a comment saying "the marker itself might exceed."

This is a known limitation, but the test should verify it explicitly by:

Computing the exact marker token count

Only testing budgets large enough to accommodate the marker

Enforcing the invariant strictly rather than using a 10x tolerance

Fix: Strengthen the test:

#[test] fn test_token_budget_output_invariant() { let text = "word1 word2 word3\nword4 word5 word6\nword7 word8 word9\n"; // Compute the minimum viable budget (marker token count) let min_viable_budget = word_count("// ... (3 lines truncated)"); // Test budgets starting from minimum viable for budget in min_viable_budget..20 { let result = truncate_to_token_budget(text, Language::TypeScript, budget, word_count).unwrap(); let token_count = word_count(&result); assert!( token_count <= budget, "Budget {}: output has {} tokens, expected ≤ {}", budget, token_count, budget ); } }

dean0x · 2026-03-14T10:29:44Z

crates/rskim/src/main.rs

+            continue;
+        };
+
+        let token_count = tokens::count_tokens(&output).unwrap_or(usize::MAX);


Silent failure on token counting errors

When tokens::count_tokens(&output) fails, the code silently treats it as "infinite tokens" with .unwrap_or(usize::MAX). This means if tiktoken initialization fails (corrupted data, memory issue), every mode in the cascade will appear to exceed the budget, leading to aggressive line truncation without any warning to the user.

The stderr diagnostic "all modes exceeded budget, applying line truncation" would be misleading when the real problem is a tokenizer failure.

Fix: Add a diagnostic warning on first failure:

let token_count = match tokens::count_tokens(&output) { Ok(count) => count, Err(e) => { eprintln!("[skim] warning: token counting failed: {}", e); usize::MAX } };

This applies to both the cascade loop and the fallback closure.

dean0x · 2026-03-14T10:30:08Z

Summary: Additional Review Findings (60-79% Confidence)

These items were flagged by reviewers but fall below the 80% confidence threshold for inline comments. They represent architectural considerations and performance optimizations worth discussing.

Performance (Multiple reviewers - 75-80% confidence)

Redundant token counting in cascade + fallback boundary

In the cascade loop, token counts are computed for each mode's output, but when the fallback truncate_to_token_budget is invoked, it re-counts the same text on the first fast-path check.
Fix: Pass the already-computed token count into the fallback to skip the redundant first count call.

String allocation in binary search

The binary search builds candidate strings via lines[..mid].join("\n") on every iteration. For large files this creates many intermediate allocations.
Optimization: Precompute byte offsets and use slice references instead of joins (would eliminate O(log N) allocations). This is a micro-optimization not blocking.

Cascade re-parses source with tree-sitter for each mode

Each mode in the cascade calls transform_fn, which re-parses the source with tree-sitter. For a full cascade (up to 5 modes), this means 5 separate parse operations.
Optimization: Parse tree-sitter AST once and reuse it across mode transformations. Requires exposing a "parse-once, transform-many" API from rskim-core.

Architecture (70-75% confidence)

Cache key stores starting mode, not final cascade mode

When cascade escalates (e.g., user requests structure but cascade produces signatures), the cache stores starting_mode not final_mode. The metadata is semantically inconsistent though functionally correct.
Consideration: Document this behavior or store actual mode used.

_mode_used return value discarded at call sites

The cascade function returns mode_used but both call sites discard it with _mode_used. This information should be stored in ProcessResult for cache and stats reporting.
Suggestion: Add mode_used: Mode to ProcessResult to propagate through the system.

Complexity (65% confidence - non-blocking refactoring)

process_file (120 lines) and main (178 lines) exceed complexity thresholds

Both functions interleave multiple responsibilities (cache lookup, file I/O, transformation dispatch, token counting, cache writing).
Refactoring suggestion: Extract try_cached_result() and run_transform() helpers to bring functions under 50-line threshold. Can be done in follow-up PR.

Testing (60-65% confidence)

Fragile stderr parsing in test_tokens_budget_invariant_with_fixture

The test parses stderr to extract token counts. If stats format changes, the assertion silently skips.
Fix: Make the assertion unconditional, fail explicitly if parsing fails.

Missing direct unit tests for cascade_for_token_budget function

Tested only indirectly through CLI integration tests. Consider adding unit tests for cascade ordering, diagnostics, and fallback behavior.

Recommendation: The 4 inline comments address the highest-severity findings (>80% confidence). These summary items are architectural considerations and optimizations worth addressing in follow-up PRs, not blocking approval.

Batch 1 review fixes: - B1: Restrict transform module visibility to pub(crate) in lib.rs and transform/mod.rs. The public API re-export on lib.rs:41 already provides external access to truncate_to_token_budget. - B2: Use saturating_sub for binary search hi-pointer in truncate_to_token_budget to prevent potential usize underflow if the mid==0 guard is ever refactored away. - B3: Enforce token budget invariant for very small budgets. When the omission marker alone exceeds the budget, return empty string instead of violating the invariant. Updated doc comment with minimum effective budget caveat. Strengthened invariant test to assert for ALL budgets. - S5: Return &'static [Mode] from Mode::cascade_from_here() instead of allocating a Vec<Mode> on every call. Updated caller in main.rs. - S11 (false positive): #[must_use] on truncate_to_token_budget is redundant because Result already carries #[must_use]. Clippy rejects the double attribute. No change made. Co-Authored-By: Claude <noreply@anthropic.com>

Replace silent-pass patterns in test_tokens_budget_invariant_with_fixture: - Use unwrap_or_else(panic!) instead of if-let for stderr parsing - Add explicit guard for empty token strings - Parse failures now fail the test instead of silently passing

…st helper - Binary search: start lo=1 (best=0 is the default), remove mid==0 special case and dead post-loop check, plain subtraction is now safe - Stdin path: reuse report_token_stats helper instead of duplicating pattern - Tests: extract parse_transformed_token_count helper, use Unicode-aware byte offset for arrow character

…mode S7: 5 unit tests for cascade_for_token_budget (first mode fit, escalation, line truncation fallback, single-mode cascade, all-None error) S8: CLI test for --tokens with --mode=types (single-mode cascade) S9: Unit test for marker-only output (zero content lines, best=0) S10: Cache round-trip test with token_budget parameter

- Simplify truncate_to_token_budget: remove redundant best==0 branch since lines[..0] is empty and produces the same result via the general path - Fix doc comment on process_file to accurately describe return type

…hurn (#29) ## Summary Fixes the two performance issues deferred from #27 to tech debt (#28): - **B4**: `truncate_to_token_budget` now accepts `known_token_count: Option<usize>`, eliminating the redundant fast-path tokenization when the cascade loop already computed the count - **B5**: Binary search replaces per-iteration `lines[..mid].join("\n")` with a single pre-joined string and byte-offset index, reducing allocation work from O(N log N) to O(N) Also adds 3 new unit tests for the `known_token_count` parameter and a dedicated criterion benchmark group (`token_budget_truncation`) at 100–5000 line scales. Closes B4 and B5 from #28. Remaining items (B6, B8, S2, S5, S6, S13) are unchanged. ## Test plan - [x] All 154 tests pass (`cargo test` — 151 existing + 3 new) - [x] `cargo clippy -- -D warnings` clean - [x] `test_token_budget_known_count_none_behaves_like_before` — property test: `None` and `Some(actual)` produce identical output for budgets 1..20 - [x] Issue #28 updated with B4/B5 struck through --------- Co-authored-by: Dean Sharon <deanshrn@gmain.com>

Dean Sharon and others added 2 commits March 14, 2026 02:29

dean0x commented Mar 14, 2026

View reviewed changes

Dean Sharon and others added 6 commits March 14, 2026 12:44

fix: address self-review issues

8fec71a

- Simplify truncate_to_token_budget: remove redundant best==0 branch since lines[..0] is empty and produces the same result via the general path - Fix doc comment on process_file to accurately describe return type

style: apply rustfmt formatting

7cb71a0

dean0x merged commit 9d6ad6b into main Mar 14, 2026
9 checks passed

dean0x deleted the feature/tokens-budget-mode branch March 14, 2026 18:46

dean0x mentioned this pull request Mar 14, 2026

perf: eliminate redundant tokenization and binary search allocation churn #29

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add --tokens N token budget cascade mode#27

feat: add --tokens N token budget cascade mode#27
dean0x merged 8 commits intomainfrom
feature/tokens-budget-mode

dean0x commented Mar 14, 2026

Uh oh!

dean0x Mar 14, 2026

Uh oh!

dean0x Mar 14, 2026

Uh oh!

dean0x Mar 14, 2026

Uh oh!

dean0x Mar 14, 2026

Uh oh!

dean0x commented Mar 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dean0x commented Mar 14, 2026

Summary

Changes

Core Library (rskim-core)

CLI (rskim)

Interactions

Test plan

Uh oh!

dean0x Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x commented Mar 14, 2026

Summary: Additional Review Findings (60-79% Confidence)

Performance (Multiple reviewers - 75-80% confidence)

Architecture (70-75% confidence)

Complexity (65% confidence - non-blocking refactoring)

Testing (60-65% confidence)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant