Description
Background
PR #41 introduced unified tool operations tracking with file change metrics. Currently, the system derives line changes with a simple approach: it counts the lines in old_string and new_string of each Edit operation and subtracts one count from the other.
Current Implementation (src/tools/parsers/edit.rs:104-132):
    pub fn lines_before(&self) -> Option<i32> {
        self.old_string.as_ref().map(|s| {
            if s.is_empty() { return 0; }
            let newline_count = s.chars().filter(|&c| c == '\n').count();
            if s.ends_with('\n') { newline_count as i32 }
            else { (newline_count + 1) as i32 }
        })
    }
Current Calculation Logic (src/models/tool_operation.rs:145-169):
    pub fn with_line_metrics(mut self, lines_before: Option<i32>, lines_after: Option<i32>) -> Self {
        if let Some(meta) = &mut self.file_metadata {
            meta.lines_before = lines_before;
            meta.lines_after = lines_after;
            // Simple subtraction - not actual diff!
            if let (Some(before), Some(after)) = (lines_before, lines_after) {
                if after > before {
                    meta.lines_added = Some(after - before);
                    meta.lines_removed = Some(0);
                } else if before > after {
                    meta.lines_added = Some(0);
                    meta.lines_removed = Some(before - after);
                }
            }
        }
        self
    }
Problem
The current approach has several accuracy issues:
- Inaccurate Change Detection: When old_string has 10 lines and new_string has 15 lines, the system reports "5 lines added, 0 removed", but this ignores that some of those 10 original lines might have been modified or deleted
- No Context Awareness: Cannot distinguish between:
  - Adding 5 new lines to existing 10 lines (actual: +5, -0)
  - Replacing 10 lines with completely different 15 lines (actual: +15, -10)
  - Adding 5 lines while modifying some of the original 10 lines
- Misleading Metrics: The total_line_changes() and net_line_change() methods rely on these inaccurate numbers, leading to incorrect analysis of code modification scope
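To make the gap concrete, here is a minimal, self-contained sketch of the subtraction approach. `count_lines` and `naive_delta` are hypothetical standalone helpers that mirror the logic quoted above; they are not the project's actual functions.

```rust
/// Counts lines the way `lines_before()` does: an empty string has zero
/// lines, and a trailing newline does not start an extra line.
fn count_lines(s: &str) -> i32 {
    if s.is_empty() {
        return 0;
    }
    let newlines = s.chars().filter(|&c| c == '\n').count() as i32;
    if s.ends_with('\n') { newlines } else { newlines + 1 }
}

/// Hypothetical helper mirroring the subtraction in `with_line_metrics`.
/// Returns (lines_added, lines_removed) derived from the two counts alone.
/// Equal counts are reported as (0, 0) here for simplicity; the quoted code
/// leaves both fields untouched in that case.
fn naive_delta(old: &str, new: &str) -> (i32, i32) {
    let (before, after) = (count_lines(old), count_lines(new));
    if after > before {
        (after - before, 0)
    } else {
        (0, before - after)
    }
}
```

For example, `naive_delta("a\nb\n", "x\ny\n")` yields `(0, 0)` even though both lines were replaced, where a line diff would report 2 added and 2 removed.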
Proposed Solution
Implement a proper diff algorithm to calculate actual line additions and deletions:
Phase 1: Myers Diff Algorithm Integration
- Add dependency for diff calculation:
    [dependencies]
    similar = "2.3"  # or another diff library like `diff` or `dissimilar`
- Enhance EditData parser (src/tools/parsers/edit.rs):
    use similar::{ChangeTag, TextDiff};

    impl EditData {
        /// Calculate actual line changes using Myers diff algorithm
        pub fn calculate_diff_metrics(&self) -> Option<DiffMetrics> {
            if let (Some(old), Some(new)) = (&self.old_string, &self.new_string) {
                let diff = TextDiff::from_lines(old, new);
                let mut added = 0;
                let mut removed = 0;
                let mut unchanged = 0;
                for change in diff.iter_all_changes() {
                    match change.tag() {
                        ChangeTag::Insert => added += 1,
                        ChangeTag::Delete => removed += 1,
                        ChangeTag::Equal => unchanged += 1,
                    }
                }
                Some(DiffMetrics {
                    lines_added: added,
                    lines_removed: removed,
                    lines_unchanged: unchanged,
                })
            } else {
                None
            }
        }
    }

    pub struct DiffMetrics {
        pub lines_added: i32,
        pub lines_removed: i32,
        pub lines_unchanged: i32,
    }
- Update ToolOperation builder (src/models/tool_operation.rs):
    pub fn with_accurate_diff_metrics(mut self, metrics: DiffMetrics) -> Self {
        if let Some(meta) = &mut self.file_metadata {
            meta.lines_added = Some(metrics.lines_added);
            meta.lines_removed = Some(metrics.lines_removed);
            // Keep before/after for total line counts
            meta.lines_before = Some(metrics.lines_removed + metrics.lines_unchanged);
            meta.lines_after = Some(metrics.lines_added + metrics.lines_unchanged);
        }
        self
    }
- Update ImportService (src/services/import_service.rs:514-527):
    "Edit" => {
        let parser = EditParser;
        if let Ok(parsed) = parser.parse(tool_use) {
            if let ToolData::Edit(data) = parsed.data {
                operation = operation
                    .with_file_path(data.file_path.clone())
                    .with_file_type(data.is_code_file(), data.is_config_file())
                    .with_edit_flags(data.is_bulk_replacement(), data.is_refactoring());
                // Use accurate diff metrics instead of simple line counting
                if let Some(metrics) = data.calculate_diff_metrics() {
                    operation = operation.with_accurate_diff_metrics(metrics);
                }
            }
        }
    }
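For readers who want to see what the line diff is counting without pulling in a crate, the sketch below computes the same three totals with a plain longest-common-subsequence (LCS) table. This is a stand-in technique, not Myers and not the `similar` API; `lcs_diff_metrics` is a hypothetical name. A minimal diff keeps exactly an LCS worth of lines, so the totals are expected to agree with a Myers-based result, with one caveat: `str::lines()` strips line endings, so a missing trailing newline is not treated as a change here.

```rust
/// Mirrors the `DiffMetrics` struct proposed above (derives added for testing).
#[derive(Debug, PartialEq)]
pub struct DiffMetrics {
    pub lines_added: i32,
    pub lines_removed: i32,
    pub lines_unchanged: i32,
}

/// Hypothetical std-only helper: line diff totals from an LCS table.
/// Uses O(n * m) time and memory, which is acceptable for short edit
/// snippets but not for very large inputs.
pub fn lcs_diff_metrics(old: &str, new: &str) -> DiffMetrics {
    let a: Vec<&str> = old.lines().collect();
    let b: Vec<&str> = new.lines().collect();

    // table[i][j] holds the LCS length of a[i..] and b[j..].
    let mut table = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            table[i][j] = if a[i] == b[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }

    // Lines in the LCS are unchanged; everything else in `old` was removed
    // and everything else in `new` was added.
    let unchanged = table[0][0];
    DiffMetrics {
        lines_added: (b.len() - unchanged) as i32,
        lines_removed: (a.len() - unchanged) as i32,
        lines_unchanged: unchanged as i32,
    }
}
```

Note that `lines_removed + lines_unchanged` equals the old line count and `lines_added + lines_unchanged` equals the new one, which is the identity the builder step relies on to fill `lines_before` and `lines_after`.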
Phase 2: Enhanced Analytics
Once accurate diff metrics are available, add repository queries for:
- Code churn analysis (lines added + removed per file)
- Refactoring detection improvements (high churn but similar structure)
- Modification patterns (mostly additions vs. mostly deletions vs. balanced edits)
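As a rough illustration of the churn and modification-pattern ideas, the sketch below classifies a single edit in memory. The `EditPattern` enum, the function names, and the 75% threshold are all assumptions made for this example; the actual repository queries would likely aggregate per file in the database instead.

```rust
/// Hypothetical classification of an edit by where its churn comes from.
#[derive(Debug, PartialEq)]
pub enum EditPattern {
    MostlyAdditions,
    MostlyDeletions,
    Balanced,
    NoChange,
}

/// Code churn as described above: lines added plus lines removed.
pub fn churn(lines_added: i32, lines_removed: i32) -> i32 {
    lines_added + lines_removed
}

/// Classifies an edit. Assumed rule: one side must account for at least
/// 75% of the churn to count as "mostly"; anything else is balanced.
pub fn classify(lines_added: i32, lines_removed: i32) -> EditPattern {
    let total = churn(lines_added, lines_removed);
    if total == 0 {
        return EditPattern::NoChange;
    }
    // Integer form of `side / total >= 3 / 4`, avoiding float comparisons.
    if lines_added * 4 >= total * 3 {
        EditPattern::MostlyAdditions
    } else if lines_removed * 4 >= total * 3 {
        EditPattern::MostlyDeletions
    } else {
        EditPattern::Balanced
    }
}
```

Under this rule, the "+15, -10" complete replacement from the Problem section has a churn of 25 and is classified as balanced, while the subtraction approach would have presented it as a pure 5-line addition.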
Benefits
- Accurate Metrics: Precise tracking of actual code changes
- Better Analysis: Improved refactoring detection, code review insights, and productivity metrics
- Foundation for Future Features:
  - Time-series analysis of code evolution
  - Identification of frequently modified code sections
  - Better integration with retrospection analysis (see issue #TBD)
Testing Considerations
- Add unit tests with various edit scenarios:
  - Pure addition (append to file)
  - Pure deletion (remove lines)
  - Mixed edits (add some, remove some, keep some)
  - Complete replacement (no common lines)
  - Whitespace-only changes
- Verify backward compatibility with existing data (migration not needed, as we only improve future calculations)
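One possible shape for these tests is a small fixture table that any diff implementation can be run against. Everything here is hypothetical: the `Scenario` struct, the sample inputs, and the expected triples `(added, removed, unchanged)`. The whitespace-only expectation assumes a line-based diff that treats a re-indented line as one removal plus one addition.

```rust
/// Hypothetical fixture: one edit scenario plus the metrics a line diff
/// is expected to report, as (added, removed, unchanged).
pub struct Scenario {
    pub name: &'static str,
    pub old: &'static str,
    pub new: &'static str,
    pub expected: (i32, i32, i32),
}

pub const SCENARIOS: &[Scenario] = &[
    Scenario { name: "pure addition", old: "a\nb\n", new: "a\nb\nc\n", expected: (1, 0, 2) },
    Scenario { name: "pure deletion", old: "a\nb\nc\n", new: "a\nc\n", expected: (0, 1, 2) },
    Scenario { name: "mixed edit", old: "a\nb\nc\n", new: "a\nx\nc\nd\n", expected: (2, 1, 2) },
    Scenario { name: "complete replacement", old: "a\nb\n", new: "x\ny\nz\n", expected: (3, 2, 0) },
    Scenario { name: "whitespace-only", old: "a\n  b\n", new: "a\n    b\n", expected: (1, 1, 1) },
];

/// Runs every scenario through `diff_fn` and returns the names of the
/// scenarios whose result differs from the expected triple.
pub fn failing_scenarios<F>(diff_fn: F) -> Vec<&'static str>
where
    F: Fn(&str, &str) -> (i32, i32, i32),
{
    SCENARIOS
        .iter()
        .filter(|s| diff_fn(s.old, s.new) != s.expected)
        .map(|s| s.name)
        .collect()
}
```

A unit test could then wrap `calculate_diff_metrics` in a closure and assert that `failing_scenarios` returns an empty list, which keeps the scenario data separate from whichever diff library is chosen.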
Related
- PR #41: feat: Add unified tool operations tracking with file change metrics
- Part of the "Future Improvements" roadmap from PR #41
Priority
Medium - Improves data accuracy but system is functional with current approach