Description
I am not a Rust developer. I consulted an AI and it provided some solutions.
Problem Analysis
Error Message
Stats failed: From UTF-8 error: invalid utf-8 sequence of 1 bytes from index 81909
Root Cause
When executing the git-ai stats command, the code attempts to convert Git diff output to a UTF-8 string at the following location:
Location: src/git/repository.rs line 1773
let diff_output = String::from_utf8(output.stdout)?;
This error is triggered when the repository contains:
- Binary files (images, compiled files, archives, etc.)
- Non-UTF-8 encoded text files (such as GBK, Latin-1, etc.)
- Files containing invalid UTF-8 sequences
In your case, commit e022db36e2d16b63d8477439451b05599c3da117 likely contains:
- iOS project binary resource files (.png, .jpg, .xcassets, etc.)
- Build artifacts or dependency libraries
- Files with special encodings
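The failure mode can be reproduced in isolation. The following is a minimal sketch, not code from git-ai: `decode_strict` is a hypothetical helper that mirrors the strict conversion, and the byte string stands in for `output.stdout`.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper mirroring the strict conversion from the issue:
/// any byte sequence that is not valid UTF-8 turns into an error.
fn decode_strict(stdout: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(stdout)
}

/// Returns the byte offset at which strict decoding stopped, if it failed.
fn first_invalid_offset(stdout: Vec<u8>) -> Option<usize> {
    decode_strict(stdout)
        .err()
        .map(|err| err.utf8_error().valid_up_to())
}
```

For example, `first_invalid_offset(b"+caf\xE9\n".to_vec())` yields `Some(4)`: the lone `0xE9` byte ("é" in Latin-1) is not valid UTF-8, and the error's display text has the same "invalid utf-8 sequence of 1 bytes from index ..." shape as the message reported above.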
Why This Affects the Stats Command
The git-ai stats command execution flow:
- Get commit diff statistics
- Call diff_added_lines() to parse added line numbers (fails here: attempts to convert Git diff output to a UTF-8 string)
- Run git-ai blame on each added line to determine AI attribution
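To make the "parse added line numbers" step concrete, here is a rough sketch of how added lines could be read from `git diff -U0` hunk headers. `added_lines_from_diff` is a hypothetical function written for illustration; it is not git-ai's actual `parse_diff_added_lines`, whose implementation is not shown in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical parser: maps each file to the line numbers added in a
/// `git diff -U0` output, using only the `+++ b/<path>` and `@@ ... @@` lines.
/// Limitation: an added content line that itself starts with "++ b/" would be
/// misread as a file header; a real parser should track hunk lengths.
fn added_lines_from_diff(diff: &str) -> HashMap<String, Vec<u32>> {
    let mut result: HashMap<String, Vec<u32>> = HashMap::new();
    let mut current_file: Option<String> = None;

    for line in diff.lines() {
        if let Some(path) = line.strip_prefix("+++ b/") {
            current_file = Some(path.to_string());
        } else if line.starts_with("+++ ") {
            // For example "+++ /dev/null" when the file was deleted.
            current_file = None;
        } else if let (Some(file), Some(rest)) =
            (current_file.as_ref(), line.strip_prefix("@@ "))
        {
            // `rest` looks like "-a,b +c,d @@ ..."; take the "+c,d" token.
            let new_range = rest.split(' ').find_map(|tok| tok.strip_prefix('+'));
            if let Some(range) = new_range {
                let mut parts = range.splitn(2, ',');
                let start = parts.next().and_then(|s| s.parse::<u32>().ok());
                // A missing count means exactly one line.
                let count = parts.next().map_or(Some(1), |s| s.parse::<u32>().ok());
                if let (Some(start), Some(count)) = (start, count) {
                    result
                        .entry(file.clone())
                        .or_default()
                        .extend(start..start + count);
                }
            }
        }
    }
    result
}
```

Only the header lines carry the information needed here, which is why the content of the changed lines themselves does not have to decode cleanly.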
Solutions
Solution 1: Use UTF-8 Lossy Conversion (Recommended)
Change strict UTF-8 conversion to lossy conversion, which automatically replaces each invalid byte sequence with the Unicode replacement character (U+FFFD):
// Before (src/git/repository.rs:1773)
let diff_output = String::from_utf8(output.stdout)?;
// After
let diff_output = String::from_utf8_lossy(&output.stdout).to_string();
Pros:
- Won't fail due to non-UTF-8 content
- For diff parsing, replacing invalid characters doesn't affect results (only parsing line numbers and filenames)
- Simple and straightforward
Cons:
- May lose some special character information (but minimal impact on stats command)
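A small sketch of what the lossy conversion does to a diff-like byte string. `decode_lossy` is a hypothetical wrapper around the standard library's `String::from_utf8_lossy`, and the sample bytes are made up for illustration.

```rust
/// Hypothetical wrapper: decode bytes as UTF-8, replacing each invalid
/// sequence with U+FFFD instead of returning an error.
fn decode_lossy(stdout: &[u8]) -> String {
    // `from_utf8_lossy` returns a `Cow<str>`; `into_owned` converts it into
    // a `String`, copying only if the input was already valid UTF-8.
    String::from_utf8_lossy(stdout).into_owned()
}

/// Counts how many replacement characters the lossy decode produced.
fn replaced_count(stdout: &[u8]) -> usize {
    decode_lossy(stdout)
        .chars()
        .filter(|&c| c == '\u{FFFD}')
        .count()
}
```

With the input `b"@@ -0,0 +1 @@\n+caf\xE9\n"`, the hunk header line comes through unchanged and only the `0xE9` byte in the content line is replaced, so line boundaries and header parsing are unaffected.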
Solution 2: Add Binary File Filtering
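Add --no-binary option to Git diff command to skip binary files. Caveat: --no-binary is not listed among the documented git diff options, so check that your Git version accepts it before relying on it. By default, git diff already prints only a "Binary files ... differ" line for files it detects as binary, unless --binary or --text is passed: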
// Modify src/git/repository.rs:1745
args.push("--no-binary".to_string()); // Add this line
args.push("-U0".to_string());
Pros:
- Avoids processing binary files at the source
- Maintains UTF-8 strictness
Cons:
- Still can't handle non-UTF-8 encoded text files
- May miss some files that need to be counted
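The first con can be illustrated with a simplified model. The sketch below assumes a NUL-byte heuristic over the first 8000 bytes, which is how Git's binary detection is commonly described; `looks_binary` and `is_valid_utf8` are hypothetical helpers, not Git or git-ai code.

```rust
/// Simplified stand-in for a NUL-byte binary heuristic (assumption: only the
/// first 8000 bytes are inspected).
fn looks_binary(bytes: &[u8]) -> bool {
    const SNIFF_LEN: usize = 8000;
    bytes.iter().take(SNIFF_LEN).any(|&b| b == 0)
}

/// True when the bytes would survive a strict `String::from_utf8` conversion.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

A Latin-1 text file such as `b"caf\xE9\n"` contains no NUL byte, so under this model it is treated as text and its raw bytes end up in the diff output, yet `is_valid_utf8` returns false for it. Filtering binary files alone therefore would not prevent the strict conversion from failing.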
Solution 3: Combined Approach (Best)
Combine Solutions 1 and 2:
// src/git/repository.rs
pub fn diff_added_lines(
    &self,
    from_ref: &str,
    to_ref: &str,
    pathspecs: Option<&HashSet<String>>,
) -> Result<HashMap<String, Vec<u32>>, GitAiError> {
    let mut args = self.global_args_for_exec();
    args.push("diff".to_string());
    args.push("-U0".to_string());
    args.push("--no-color".to_string());
    args.push("--no-binary".to_string()); // Add: skip binary files
    args.push(from_ref.to_string());
    args.push(to_ref.to_string());
    // ... pathspecs handling ...
    let output = exec_git(&args)?;
    // Use lossy conversion instead of strict conversion
    let diff_output = String::from_utf8_lossy(&output.stdout).to_string();
    let mut result = parse_diff_added_lines(&diff_output)?;
    // ... subsequent processing ...
}