feat: add --max-lines N AST-aware smart truncation (#18)#23
Conversation
Add priority-based truncation that respects AST node boundaries when limiting output to N lines. Types and interfaces are kept preferentially over functions, which are kept over imports, which are kept over bodies. Truncation engine: - NodeSpan maps transformed output line ranges to tree-sitter node kinds - Greedy span selection with marker budget reservation - Language-appropriate omission markers (// / # / <!-- -->) - Simple line truncation fallback for JSON/YAML Pipeline integration: - All transform modes (structure, signatures, types, minimal, full) produce NodeSpan metadata via _with_spans variants - TransformConfig gains max_lines: Option<usize> with builder method - Cache key includes max_lines for isolation - New public API: transform_auto_with_config() CLI: - --max-lines <N> flag with validation (>= 1) - Composable with all modes, glob, parallel, stdin, --show-stats - MultiFileOptions struct replaces loose parameters Tests: 22 new tests (14 integration + 8 CLI) covering priority ordering, omission markers, mode composition, per-language behavior, and edge cases. All 274 tests pass (184 core + 90 CLI).
YAML comments use # not //. The previous implementation used // which is not valid YAML syntax. JSON retains // as it has no comment syntax and // is the JSONC convention.
- Consolidate serde-based language handling (JSON/YAML) - Remove duplicate truncation logic in match branches - Apply truncation uniformly after transformation - Improve code readability with clearer flow No behavior change.
Code Review Summary - Comprehensive PR AnalysisThis PR introduces AST-aware truncation with solid overall design, but requires fixes for 7 blocking issues and several should-fix items before approval. Review reports synthesized from 8 independent analyses (architecture, complexity, consistency, performance, regression, Rust, security, tests). CRITICAL BLOCKING ISSUES (Must Fix)1. Markdown Header Extraction Duplication
|
Detailed Blocking Issue FixesIssue 1: Markdown Header DuplicationCurrent state (structure.rs):
Suggested fix - collapse to one function: pub(crate) fn extract_markdown_headers_with_spans(
source: &str,
tree: &Tree,
min_level: u32,
max_level: u32,
) -> Result<(String, Vec<(String, &'static str)>)> {
// ... existing implementation (lines 502-598) ...
}
pub(crate) fn extract_markdown_headers(
source: &str,
tree: &Tree,
min_level: u32,
max_level: u32,
) -> Result<String> {
let (text, _spans) = extract_markdown_headers_with_spans(source, tree, min_level, max_level)?;
Ok(text)
}This follows the pattern already used in Issue 2: Unify Node Kind TablesCurrent problem:
Suggested fix in utils.rs: struct NodeKindInfo {
static_kind: &'static str,
priority: u8,
}
fn node_kind_info(kind: &str) -> NodeKindInfo {
match kind {
"type_alias_declaration" => NodeKindInfo { static_kind: "type_alias_declaration", priority: 5 },
"interface_declaration" => NodeKindInfo { static_kind: "interface_declaration", priority: 5 },
"struct_item" => NodeKindInfo { static_kind: "struct_item", priority: 5 },
"class_declaration" => NodeKindInfo { static_kind: "class_declaration", priority: 5 },
"enum_declaration" => NodeKindInfo { static_kind: "enum_declaration", priority: 5 },
"function_declaration" => NodeKindInfo { static_kind: "function_declaration", priority: 4 },
"method_declaration" => NodeKindInfo { static_kind: "method_declaration", priority: 4 },
// ... complete coverage with one source of truth
_ => NodeKindInfo { static_kind: "unknown", priority: 1 },
}
}
pub(crate) fn score_node_kind(kind: &str) -> u8 {
node_kind_info(kind).priority
}
pub(crate) fn to_static_node_kind(kind: &str) -> &'static str {
node_kind_info(kind).static_kind
}Then remove the old Issue 3: Double lines().collect()Current state (truncate.rs lines 75, 211): pub(crate) fn truncate_to_lines(text: &str, spans: &[NodeSpan], language: Language, max_lines: usize) -> Result<String> {
let lines = text.lines().collect::<Vec<&str>>(); // Line 75 - FIRST COLLECT
// ... some early returns ...
// Line 84 or 94
simple_line_truncate(&text, language, max_lines) // This function collects again!
}
// Line 211 in simple_line_truncate
let lines = text.lines().collect::<Vec<&str>>(); // SECOND COLLECT - DUPLICATESuggested fix: Create a variant that accepts pre-collected lines: pub(crate) fn truncate_to_lines_from_lines(
lines: Vec<&str>,
text: &str,
spans: &[NodeSpan],
language: Language,
max_lines: usize,
) -> Result<String> {
if lines.len() <= max_lines {
return Ok(text.to_string());
}
// ... rest of logic using &lines instead of re-collecting
}
fn simple_line_truncate_from_lines(
lines: &[&str],
language: Language,
max_lines: usize,
trailing_newline: bool,
) -> Result<String> {
// ... uses &lines directly, no re-collection
}Issue 4: String Clones in Span BuildersCurrent state (signatures.rs line 93): let texts: Vec<String> = signatures
.iter() // Borrows each (String, &'static str)
.map(|(sig, kind)| {
let line_count = sig.lines().count().max(1);
spans.push(NodeSpan::new(current_line..current_line + line_count, kind));
current_line += line_count;
sig.clone() // ← UNNECESSARY CLONE
})
.collect();Suggested fix: let texts: Vec<String> = signatures
.into_iter() // Consume the Vec, take ownership
.map(|(sig, kind)| {
let line_count = sig.lines().count().max(1);
spans.push(NodeSpan::new(current_line..current_line + line_count, kind));
current_line += line_count;
sig // ← MOVED, NOT CLONED
})
.collect();Apply same fix to types.rs:183 and structure.rs:593. Issue 5: _with_spans Pattern ConsistencyCurrent inconsistency (mod.rs lines 82-88): Mode::Minimal => {
let text = minimal::transform_minimal(source, tree, language, config)?;
let line_count = text.lines().count();
let spans = vec![NodeSpan::new(0..line_count, "source_file")]; // ← INLINE SPAN GENERATION
Ok((text, spans))
}Option A - Extract dedicated function (recommended): Create // In minimal.rs
pub(crate) fn transform_minimal_with_spans(
source: &str,
tree: &Tree,
language: Language,
config: &TransformConfig,
) -> Result<(String, Vec<NodeSpan>)> {
let text = transform_minimal(source, tree, language, config)?;
let line_count = text.lines().count();
let spans = vec![NodeSpan::new(0..line_count, "source_file")];
Ok((text, spans))
}
// Then in mod.rs
Mode::Minimal => minimal::transform_minimal_with_spans(source, tree, language, config)Option B - Add architecture comment (if intentional): Mode::Minimal => {
// ARCHITECTURE: Minimal mode preserves all code, so span
// decomposition is not meaningful. Returns single source_file span.
let text = minimal::transform_minimal(source, tree, language, config)?;
let line_count = text.lines().count();
let spans = vec![NodeSpan::new(0..line_count, "source_file")];
Ok((text, spans))
}Issue 6 & 7: CLI Bounds and Span BoundariesIssue 6 - Add to main.rs after line 609: const MAX_MAX_LINES: usize = 1_000_000;
// After validation
if let Some(max_lines) = args.max_lines {
if max_lines == 0 {
anyhow::bail!("--max-lines must be at least 1\nUse --max-lines 1 to get a single line of output.");
}
if max_lines > MAX_MAX_LINES {
anyhow::bail!(
"--max-lines value too high: {} (maximum: {})",
max_lines,
MAX_MAX_LINES
);
}
}Issue 7 - Fix in structure.rs lines 316-317: // Current:
let output_end = source_to_output_byte(source_end).min(output.len());
// Fixed:
let output_start = source_to_output_byte(source_start).min(output.len());
let output_end = source_to_output_byte(source_end).min(output.len());Estimated time to fix all 7 issues: ~41 minutes Once complete, run: cargo test --all
cargo clippy -- -D warningsThen force-push with message: Generated by Claude Code - Multi-review synthesis |
…ry clones - extract_markdown_headers now delegates to extract_markdown_headers_with_spans and discards spans, matching the existing transform_structure pattern - Replace .iter() + .clone() with .into_iter() in span-building iterators across signatures.rs, types.rs, and structure.rs to avoid unnecessary String allocations (zero-copy design constraint) Co-Authored-By: Claude <noreply@anthropic.com>
…dge-case tests - Move empty-spans check before lines().collect() in truncate_to_lines to avoid redundant allocation when falling back to simple_line_truncate - Add boundary test documenting max_lines=0 behavior at core library level (CLI validates >= 1, but core silently clamps) - Add tests for overlapping and adjacent NodeSpan ranges confirming output never exceeds the max_lines budget Co-Authored-By: Claude <noreply@anthropic.com>
…en span logic - Add MAX_MAX_LINES (1,000,000) validation matching --jobs pattern (B8) - Restore // SECURITY: comments on depth guards in signatures/types (SF1) - Add ARCHITECTURE comment explaining Full/Minimal inline spans (SF2) - Gate dead wrapper functions with #[cfg(test)] instead of #[allow(dead_code)] (SF3) - Add .min(output.len()) clamp to output_start for consistency (SF4) - Add // SAFETY: comments documenting usize-to-i64 cast bounds (SF5) Co-Authored-By: Claude <noreply@anthropic.com>
- Promote Python class_definition to priority 5 (type system) so classes are kept over standalone functions under tight --max-lines budgets - Compute tokens inline on cache hit when cached without --show-stats, fixing silent stats gap on subsequent --show-stats runs - Fix clear_cache ENOTEMPTY race by removing files individually instead of remove_dir_all + create_dir_all during parallel test execution
Summary
--max-lines Nflag for AST-aware output truncation with priority-based node selection//,#,<!-- -->) inserted between gapsskim file.rs --mode=structure --max-lines 50Implementation
truncate.rsmodule withNodeSpan-based truncation engine_with_spansvariants for structure/signatures/types transformssimple_line_truncate()fallback for serde languages (JSON/YAML)transform_auto_with_config()public APITest plan
Closes #18 (Step 2)