feat: add --tokens N token budget cascade mode#27
Conversation
Add a --tokens N flag that progressively cascades through transformation modes (full -> minimal -> structure -> signatures -> types) until the output fits within the specified token budget. This is the core "smart context fitting" feature for AI agents. Architecture: - Core library (rskim-core) stays pure: no tiktoken dependency - Mode::aggressiveness() and Mode::cascade_from_here() for ordering - truncate_to_token_budget() accepts Fn(&str)->usize closure (DI) - Cascade orchestration lives in CLI crate (rskim) - Token counting (tiktoken) stays in CLI crate only - Per-file budget for multi-file/glob operations - Diagnostic messages to stderr when cascade escalates - Token budget added to cache key (follows max_lines pattern) When --mode M and --tokens N are both specified, cascade starts at M. Final fallback: line-based truncation of the most aggressive mode output. Co-Authored-By: Claude <noreply@anthropic.com>
- Extract make_marker/fmt_opt closures to DRY repeated format strings - Unify cascade_for_token_budget and cascade_for_token_budget_stdin into a single generic function parameterized by a transform closure - Rename include_original to compute_token_stats for clarity - Remove unused ProcessResult::original field - Replace constructor fns with struct literals
crates/rskim-core/src/lib.rs
Outdated
|
|
||
| mod parser; | ||
| mod transform; | ||
| pub mod transform; |
There was a problem hiding this comment.
Module visibility should remain private
The transform module was changed from mod transform (private) to pub mod transform. This exposes the entire module hierarchy as public API, including internal types and implementation details that were not designed for external consumption.
The stated goal is to re-export only truncate_to_token_budget, which is already done via pub use transform::truncate::truncate_to_token_budget on line 41. Making the entire module pub is unnecessary and creates an Interface Segregation Principle violation.
Fix: Revert to private module visibility:
pub use transform::truncate::truncate_to_token_budget;
mod transform; // keep privateThe pub use re-export works even with private modules. Revert transform/mod.rs:11 to pub(crate) mod truncate as well.
| best = mid; | ||
| lo = mid + 1; | ||
| } else { | ||
| hi = mid - 1; |
There was a problem hiding this comment.
Binary search edge case with usize arithmetic
When mid = 0 and the candidate exceeds budget, the code reaches hi = mid - 1, where hi is usize(0). This creates an underflow risk. While the current code is safe due to the mid == 0 early break on line 377, this depends on a guard that is not immediately obvious to maintainers. A future refactor that removes the guard could introduce a panic.
Fix: Use a half-open interval pattern to avoid usize subtraction:
let mut lo: usize = 0;
let mut hi: usize = lines.len() + 1; // exclusive upper bound
while lo < hi {
let mid = lo + (hi - lo) / 2;
// ... build candidate ...
if count_tokens(&candidate) <= token_budget {
best = mid;
lo = mid + 1;
} else {
hi = mid; // no subtraction needed
}
}This eliminates the subtraction and makes the loop bounds logic clearer.
| ); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Test assertions are too weak for the budget invariant
The core invariant of truncate_to_token_budget is: output tokens ≤ budget. However, the tests skip this assertion for small budgets (1-4 tokens) with a comment saying "the marker itself might exceed."
This is a known limitation, but the test should verify it explicitly by:
- Computing the exact marker token count
- Only testing budgets large enough to accommodate the marker
- Enforcing the invariant strictly rather than using a 10x tolerance
Fix: Strengthen the test:
#[test]
fn test_token_budget_output_invariant() {
let text = "word1 word2 word3\nword4 word5 word6\nword7 word8 word9\n";
// Compute the minimum viable budget (marker token count)
let min_viable_budget = word_count("// ... (3 lines truncated)");
// Test budgets starting from minimum viable
for budget in min_viable_budget..20 {
let result = truncate_to_token_budget(text, Language::TypeScript, budget, word_count).unwrap();
let token_count = word_count(&result);
assert!(
token_count <= budget,
"Budget {}: output has {} tokens, expected ≤ {}",
budget, token_count, budget
);
}
}
crates/rskim/src/main.rs
Outdated
| continue; | ||
| }; | ||
|
|
||
| let token_count = tokens::count_tokens(&output).unwrap_or(usize::MAX); |
There was a problem hiding this comment.
Silent failure on token counting errors
When tokens::count_tokens(&output) fails, the code silently treats it as "infinite tokens" with .unwrap_or(usize::MAX). This means if tiktoken initialization fails (corrupted data, memory issue), every mode in the cascade will appear to exceed the budget, leading to aggressive line truncation without any warning to the user.
The stderr diagnostic "all modes exceeded budget, applying line truncation" would be misleading when the real problem is a tokenizer failure.
Fix: Add a diagnostic warning on first failure:
let token_count = match tokens::count_tokens(&output) {
Ok(count) => count,
Err(e) => {
eprintln!("[skim] warning: token counting failed: {}", e);
usize::MAX
}
};This applies to both the cascade loop and the fallback closure.
Summary: Additional Review Findings (60-79% Confidence)These items were flagged by reviewers but fall below the 80% confidence threshold for inline comments. They represent architectural considerations and performance optimizations worth discussing. Performance (Multiple reviewers - 75-80% confidence)Redundant token counting in cascade + fallback boundary
String allocation in binary search
Cascade re-parses source with tree-sitter for each mode
Architecture (70-75% confidence)Cache key stores starting mode, not final cascade mode
Complexity (65% confidence - non-blocking refactoring)
Testing (60-65% confidence)Fragile stderr parsing in
Missing direct unit tests for
Recommendation: The 4 inline comments address the highest-severity findings (>80% confidence). These summary items are architectural considerations and optimizations worth addressing in follow-up PRs, not blocking approval. |
Batch 1 review fixes: - B1: Restrict transform module visibility to pub(crate) in lib.rs and transform/mod.rs. The public API re-export on lib.rs:41 already provides external access to truncate_to_token_budget. - B2: Use saturating_sub for binary search hi-pointer in truncate_to_token_budget to prevent potential usize underflow if the mid==0 guard is ever refactored away. - B3: Enforce token budget invariant for very small budgets. When the omission marker alone exceeds the budget, return empty string instead of violating the invariant. Updated doc comment with minimum effective budget caveat. Strengthened invariant test to assert for ALL budgets. - S5: Return &'static [Mode] from Mode::cascade_from_here() instead of allocating a Vec<Mode> on every call. Updated caller in main.rs. - S11 (false positive): #[must_use] on truncate_to_token_budget is redundant because Result already carries #[must_use]. Clippy rejects the double attribute. No change made. Co-Authored-By: Claude <noreply@anthropic.com>
Replace silent-pass patterns in test_tokens_budget_invariant_with_fixture: - Use unwrap_or_else(panic!) instead of if-let for stderr parsing - Add explicit guard for empty token strings - Parse failures now fail the test instead of silently passing
…st helper - Binary search: start lo=1 (best=0 is the default), remove mid==0 special case and dead post-loop check, plain subtraction is now safe - Stdin path: reuse report_token_stats helper instead of duplicating pattern - Tests: extract parse_transformed_token_count helper, use Unicode-aware byte offset for arrow character
…mode
S7: 5 unit tests for cascade_for_token_budget (first mode fit, escalation,
line truncation fallback, single-mode cascade, all-None error)
S8: CLI test for --tokens with --mode=types (single-mode cascade)
S9: Unit test for marker-only output (zero content lines, best=0)
S10: Cache round-trip test with token_budget parameter
- Simplify truncate_to_token_budget: remove redundant best==0 branch since lines[..0] is empty and produces the same result via the general path - Fix doc comment on process_file to accurately describe return type
…hurn (#29) ## Summary Fixes the two performance issues deferred from #27 to tech debt (#28): - **B4**: `truncate_to_token_budget` now accepts `known_token_count: Option<usize>`, eliminating the redundant fast-path tokenization when the cascade loop already computed the count - **B5**: Binary search replaces per-iteration `lines[..mid].join("\n")` with a single pre-joined string and byte-offset index, reducing allocation work from O(N log N) to O(N) Also adds 3 new unit tests for the `known_token_count` parameter and a dedicated criterion benchmark group (`token_budget_truncation`) at 100–5000 line scales. Closes B4 and B5 from #28. Remaining items (B6, B8, S2, S5, S6, S13) are unchanged. ## Test plan - [x] All 154 tests pass (`cargo test` — 151 existing + 3 new) - [x] `cargo clippy -- -D warnings` clean - [x] `test_token_budget_known_count_none_behaves_like_before` — property test: `None` and `Some(actual)` produce identical output for budgets 1..20 - [x] Issue #28 updated with B4/B5 struck through --------- Co-authored-by: Dean Sharon <deanshrn@gmain.com>
Summary
--tokens Nflag that progressively cascades through transformation modes (full -> minimal -> structure -> signatures -> types) until output fits within N tokensChanges
Core Library (rskim-core)
Mode::aggressiveness()ordering (Full=0..Types=4) andMode::cascade_from_here()that returns cascade sequence from any starting modetruncate_to_token_budget()with binary-search line truncation acceptingFn(&str) -> usizeclosure for token counting (dependency injection)truncate_to_token_budgetCLI (rskim)
--tokens Nargument,cascade_for_token_budget()function, wired through all processing paths (single file, stdin, glob, directory)token_budgetto cache key hash (follows exactmax_linespattern)Interactions
--mode M --tokens N: Cascade starts at mode M, never less aggressive--tokens N --max-lines M: Both constraints applied independently--tokens N --show-stats: Reports correct token counts for final outputTest plan
cli_token_budget.rscovering:cargo clippy -- -D warningscleancargo fmt -- --checkclean