fix: filter out non-existent commits in grep_ai_notes for shallow clones#365
fix: filter out non-existent commits in grep_ai_notes for shallow clones#365satyamtg wants to merge 1 commit intogit-ai-project:mainfrom
Conversation
Fixes orphaned git notes issue where git-ai stats would fail or return 0 AI lines when some commits referenced in git notes don't exist locally. This happens in: - Shallow clones (--depth=N) - Partial clones missing commits from other branches - Repos after force-push/rebase where old commits are gone The fix: 1. Use git cat-file -e to verify each commit exists before sorting 2. Filter out non-existent commits from the list 3. Only pass existing commits to git log --no-walk 4. Handle edge cases (no commits exist, only one exists) Tested with deputydev-spectator repo shallow clone where 2 out of 4 commits with a prompt were missing. After fix: - Before: 0 prompts loaded, ai_accepted: 0 - After: 1 prompt loaded, ai_accepted: 30 ✅
|
Fixes #366 |
|
Hey @satyamtg first of all, thank you so much for contributing to Git AI! Always makes my day to see issues followed by contributions! Can you help me understand a bit clearer how all of this is supposed to work and how exactly it addresses the issue in #366 Here's my mental model: since git-ai stats is based on git log, if you don't have the commit (e.g., a shallow clone), it would never try to fetch the authorship note for a commit that doesn't exist. Also, if you were to request stats for a commit that doesn't exist, the command will error out -- I don't think there's any other way to do this since Git doesn't know the commit exists on remote, it just knows that it doesn't have it. What's your thinking on this? I think we should update our limitations page to more explicitly call this out, but we intentionally have the stats command silently 'fail' (or I should say render data on a best effort basis). This is also how many of git's commands work for shallow clones (for ex, try |
|
Hey @svarlamov , so basically I was just trying out things with git-ai. I made 4 commits on a branch, using a conversation_id (say X) with my AI agent tool. Then I paused, and pushed the 4 commits (say A, B, C and D) to remote. After that, I basically reset the branch to the base branch state, and continuing the same conversation ID X, did 2 commits with AI changes, and a human commit. Then force pushed these 3 commits to the same branch. (Say commits P, Q, R). Once this was done, I just pulled the branch on my other computer (just that branch) + the git notes at ref = ai. I tried running So, in theory, this seems to be a bug to me. I dug deep and found that when you run stats with a range, it tries to get the prompt from AI notes, then the commits linked to that prompt. In this case, A, B, C, D were also linked. What I feel is this is how it runs -
For this, at one point it would run Hope this clears the issue. My fix just makes sure that this git log command doesn't fail. The idea, as you already stated, is to do best effort attribution. But in my case, even that didn't happen. I've already tested this fix with my exact scenario, and it works fine. |
|
Hey @satyamtg can you possibly provide
I think you stumbled on a real + broader set of stats issues where reusing the same prompt AFTER making a commit. |
| eprintln!("[DEBUG] Found {} commits with pattern '{}' in git notes", sha_vec.len(), pattern); | ||
| for sha in &sha_vec { | ||
| eprintln!("[DEBUG] - {}", sha); | ||
| } | ||
|
|
||
| // FILTER OUT NON-EXISTENT COMMITS using git cat-file -e | ||
| let existing_shas: Vec<String> = sha_vec | ||
| .into_iter() | ||
| .filter(|sha| { | ||
| let mut args = repo.global_args_for_exec(); | ||
| args.push("cat-file".to_string()); | ||
| args.push("-e".to_string()); | ||
| args.push(sha.clone()); | ||
|
|
||
| let exists = exec_git(&args).is_ok(); | ||
| if !exists { | ||
| eprintln!("[DEBUG] Filtering out non-existent commit: {}", sha); | ||
| } | ||
| exists | ||
| }) | ||
| .collect(); | ||
|
|
||
| eprintln!("[DEBUG] After filtering: {} existing commits", existing_shas.len()); | ||
| for sha in &existing_shas { | ||
| eprintln!("[DEBUG] - {}", sha); | ||
| } |
There was a problem hiding this comment.
🟡 Debug output unconditionally printed to stderr in production
The new code uses raw eprintln! statements to print debug messages, but the rest of the codebase uses debug_log() which only prints when debug mode is enabled via is_debug_enabled().
Root Cause and Impact
The function grep_ai_notes now contains multiple eprintln! calls at lines 329-331, 345, 351-353, and 394 that will always print debug output to stderr, regardless of whether debug mode is enabled.
For comparison, other functions in the same file use debug_log() (lines 277-280, 292) which is conditional:
// From src/utils.rs
pub fn debug_log(msg: &str) {
if is_debug_enabled() {
eprintln!("\x1b[1;33m[git-ai]\x1b[0m {}", msg);
}
}Impact: In production, users will see debug messages like [DEBUG] Found X commits with pattern 'Y' in git notes printed to stderr every time grep_ai_notes is called with multiple results. This pollutes the terminal output and may confuse users or break scripts that parse stderr.
Prompt for agents
Replace all eprintln! calls in the grep_ai_notes function with debug_log calls to match the existing pattern in the codebase. The debug_log function is already imported at line 5 of src/git/refs.rs.
Specifically, replace:
- Line 329: eprintln!("[DEBUG] Found {} commits...") -> debug_log(&format!("Found {} commits..."))
- Lines 330-332: The for loop with eprintln! -> wrap in debug_log or remove
- Line 345: eprintln!("[DEBUG] Filtering out...") -> debug_log(&format!("Filtering out..."))
- Lines 351-354: eprintln! calls -> debug_log calls
- Line 394: eprintln!("[DEBUG] Single commit...") -> debug_log(&format!("Single commit..."))
Note: The [DEBUG] prefix is not needed when using debug_log since it already adds a [git-ai] prefix.
Was this helpful? React with 👍 or 👎 to provide feedback.
Fixes orphaned git notes issue where git-ai stats would fail or return 0 AI lines when some commits referenced in git notes don't exist locally.
This happens in:
The fix: