Skip to content

Tech debt: token budget cascade follow-ups #28

@dean0x

Description

@dean0x

Performance

  • B4: Redundant tokenization at cascade/fallback boundary — cascade counts tokens, then truncate_to_token_budget re-counts the same string. Pass the already-computed count through. ✅ Resolved in perf: eliminate redundant tokenization and binary search allocation churn #29
  • B5: Binary search in truncate_to_token_budget builds candidate strings via join on every iteration. Use precomputed byte offsets for slice-based indexing instead. ✅ Resolved in perf: eliminate redundant tokenization and binary search allocation churn #29
  • B8: Cascade re-parses tree-sitter AST for each mode. Parse once and reuse across mode transformations (requires core API changes).
  • S5 (partial): cascade_from_here returns &'static [Mode] now (FIXED), but the cascade loop still creates a config per iteration.
  • B11: Per-iteration tokenization in binary search — count_tokens is called O(log N) times inside the binary search loop, each O(N). A prefix-sum approach could reduce total to O(N), with trade-off of cross-line token boundary accuracy.
  • B12: Fast-path text.to_string() allocation — when text fits within budget, the function copies the entire string. Could return Cow<str> to avoid this allocation (requires API change).
  • B13: Split-then-rejoin round-trip — text.lines().collect() then lines.join("\n") reconstructs the input. Could scan text directly for newline byte positions and use &str slices, eliminating both the Vec<&str> and the joined String.

Complexity

  • B6: process_file spans 120 lines with 5 responsibilities. Extract try_cached_result and run_transform helpers. main() spans 178 lines with cyclomatic complexity ~14. Extract validate_args() and unify dispatch.
  • S6: ProcessOptions (6 fields) and MultiFileOptions (8 fields) have 5 overlapping fields. Compose ProcessOptions inside MultiFileOptions.
  • S13: Three identical validation blocks for --jobs, --max-lines, --tokens should be a generic validate_bounded_arg helper.

Test Coverage ✅ Resolved

All test coverage gaps were addressed before merge in commit 646bd33:

  • S7: No direct unit tests for cascade_for_token_budget → 5 unit tests added in main.rs
  • S8: Missing test for --tokens with --mode=typestest_tokens_with_mode_types_single_mode_cascade
  • S9: No test for binary search zero-content-lines edge casetest_tokens_very_small_budget_fallback_truncation
  • S10: Cache round-trip test missing for token_budgettest_tokens_budget_invariant_with_fixture

Minor

  • S2: Cache metadata stores starting mode, not actual cascade mode. Misleading for debugging.
  • B10: Cache key format change orphans existing entries (acceptable, document in release notes).
  • S14: lib.rs:41 comment says "for CLI crate" but function is publicly exported. Clarify API stability intent in doc comment (relevant for v1.0.0 release).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions