Skip to content

fix: filter out non-existent commits in grep_ai_notes for shallow clones#365

Open
satyamtg wants to merge 1 commit intogit-ai-project:mainfrom
satyamtg:fix-stats-for-partial-clone
Open

fix: filter out non-existent commits in grep_ai_notes for shallow clones#365
satyamtg wants to merge 1 commit intogit-ai-project:mainfrom
satyamtg:fix-stats-for-partial-clone

Conversation

@satyamtg
Copy link

@satyamtg satyamtg commented Jan 17, 2026

Fixes orphaned git notes issue where git-ai stats would fail or return 0 AI lines when some commits referenced in git notes don't exist locally.

This happens in:

  • Shallow clones (--depth=N)
  • Partial clones missing commits from other branches
  • Repos after force-push/rebase where old commits are gone

The fix:

  1. Use git cat-file -e to verify each commit exists before sorting
  2. Filter out non-existent commits from the list
  3. Only pass existing commits to git log --no-walk
  4. Handle edge cases (no commits exist, only one exists)

Open with Devin

Fixes orphaned git notes issue where git-ai stats would fail or return
0 AI lines when some commits referenced in git notes don't exist locally.

This happens in:
- Shallow clones (--depth=N)
- Partial clones missing commits from other branches
- Repos after force-push/rebase where old commits are gone

The fix:
1. Use git cat-file -e to verify each commit exists before sorting
2. Filter out non-existent commits from the list
3. Only pass existing commits to git log --no-walk
4. Handle edge cases (no commits exist, only one exists)

Tested with deputydev-spectator repo shallow clone where 2 out of 4
commits with a prompt were missing. After fix:
- Before: 0 prompts loaded, ai_accepted: 0
- After: 1 prompt loaded, ai_accepted: 30 ✅
@CLAassistant
Copy link

CLAassistant commented Jan 17, 2026

CLA assistant check
All committers have signed the CLA.

@satyamtg
Copy link
Author

Fixes #366

@svarlamov svarlamov self-assigned this Jan 17, 2026
@svarlamov
Copy link
Member

Hey @satyamtg first of all, thank you so much for contributing to Git AI! Always makes my day to see issues followed by contributions!

Can you help me understand a bit clearer how all of this is supposed to work and how exactly it addresses the issue in #366

Here's my mental model: since git-ai stats is based on git log, if you don't have the commit (e.g., a shallow clone), it would never try to fetch the authorship note for a commit that doesn't exist. Also, if you were to request stats for a commit that doesn't exist, the command will error out -- I don't think there's any other way to do this since Git doesn't know the commit exists on remote, it just knows that it doesn't have it. What's your thinking on this?

I think we should update our limitations page to more explicitly call this out, but we intentionally have the stats command silently 'fail' (or I should say render data on a best effort basis). This is also how many of git's commands work for shallow clones (for ex, try git blame <file> on a shallow clone, it doesn't warn you that it'll use only local data)

@satyamtg
Copy link
Author

satyamtg commented Jan 17, 2026

Hey @svarlamov , so basically I was just trying out things with git-ai. I made 4 commits on a branch, using a conversation_id (say X) with my AI agent tool. Then I paused, and pushed the 4 commits (say A, B, C and D) to remote. After that, I basically reset the branch to the base branch state, and continuing the same conversation ID X, did 2 commits with AI changes, and a human commit. Then force pushed these 3 commits to the same branch. (Say commits P, Q, R). Once this was done, I just pulled the branch on my other computer (just that branch) + the git notes at ref = ai. I tried running git-ai stats P..R, but it gave 0 AI attributions. But on the original computer, it gave 30 lines of AI code. Note that enough data was present on the 2nd computer where I just wanted to fetch the AI stats originally.

So, in theory, this seems to be a bug to me. I dug deep and found that when you run stats with a range, it tries to get the prompt from AI notes, then the commits linked to that prompt. In this case, A, B, C, D were also linked. What I feel is this is how it runs -

  1. git ai blame on files at commit C to see who authored each line
  2. Discover prompt IDs embedded in the blame output (e.g., ai:prompt-abc-123)
  3. Searches git notes to find which commits have that prompt ID
  4. Gets data from those commits to finally track attribution.

For this, at one point it would run git log --format=%H --date-order --no-walk A B C D P Q
Reason being the conversation ID didn't actually change, it had A B C and D also attached to it.

git log --format=%H --date-order --no-walk 6226137a9d051752bfd1ea17364a8159efbc2c2c
fatal: bad object 6226137a9d051752bfd1ea17364a8159efbc2c2c

Hope this clears the issue. My fix just makes sure that this git log command doesn't fail.

The idea, as you already stated, is to do best effort attribution. But in my case, even that didn't happen.

I've already tested this fix with my exact scenario, and it works fine.

@acunniffe
Copy link
Collaborator

Hey @satyamtg can you possibly provide

  • the contents of the Git Notes you're seeing filtered out eprintln!("[DEBUG] Filtering out non-existent commit: {}", sha);
  • or a test bash script / integration test using the TestRepo struct we have use everywhere?

I think you stumbled on a real + broader set of stats issues where reusing the same prompt AFTER making a commit.

@svarlamov svarlamov assigned acunniffe and unassigned svarlamov Jan 18, 2026
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment on lines +329 to +354
eprintln!("[DEBUG] Found {} commits with pattern '{}' in git notes", sha_vec.len(), pattern);
for sha in &sha_vec {
eprintln!("[DEBUG] - {}", sha);
}

// FILTER OUT NON-EXISTENT COMMITS using git cat-file -e
let existing_shas: Vec<String> = sha_vec
.into_iter()
.filter(|sha| {
let mut args = repo.global_args_for_exec();
args.push("cat-file".to_string());
args.push("-e".to_string());
args.push(sha.clone());

let exists = exec_git(&args).is_ok();
if !exists {
eprintln!("[DEBUG] Filtering out non-existent commit: {}", sha);
}
exists
})
.collect();

eprintln!("[DEBUG] After filtering: {} existing commits", existing_shas.len());
for sha in &existing_shas {
eprintln!("[DEBUG] - {}", sha);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Debug output unconditionally printed to stderr in production

The new code uses raw eprintln! statements to print debug messages, but the rest of the codebase uses debug_log() which only prints when debug mode is enabled via is_debug_enabled().

Root Cause and Impact

The function grep_ai_notes now contains multiple eprintln! calls at lines 329-331, 345, 351-353, and 394 that will always print debug output to stderr, regardless of whether debug mode is enabled.

For comparison, other functions in the same file use debug_log() (lines 277-280, 292) which is conditional:

// From src/utils.rs
pub fn debug_log(msg: &str) {
    if is_debug_enabled() {
        eprintln!("\x1b[1;33m[git-ai]\x1b[0m {}", msg);
    }
}

Impact: In production, users will see debug messages like [DEBUG] Found X commits with pattern 'Y' in git notes printed to stderr every time grep_ai_notes is called with multiple results. This pollutes the terminal output and may confuse users or break scripts that parse stderr.

Prompt for agents
Replace all eprintln! calls in the grep_ai_notes function with debug_log calls to match the existing pattern in the codebase. The debug_log function is already imported at line 5 of src/git/refs.rs.

Specifically, replace:
- Line 329: eprintln!("[DEBUG] Found {} commits...") -> debug_log(&format!("Found {} commits..."))
- Lines 330-332: The for loop with eprintln! -> wrap in debug_log or remove
- Line 345: eprintln!("[DEBUG] Filtering out...") -> debug_log(&format!("Filtering out..."))
- Lines 351-354: eprintln! calls -> debug_log calls
- Line 394: eprintln!("[DEBUG] Single commit...") -> debug_log(&format!("Single commit..."))

Note: The [DEBUG] prefix is not needed when using debug_log since it already adds a [git-ai] prefix.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants