Skip to content

fix(indexer): use negative mtime sentinel for gitignored files#40

Merged
A-Souhei merged 2 commits intomainfrom
fix/indexer-mtime-cache-gitignored
Mar 13, 2026
Merged

fix(indexer): use negative mtime sentinel for gitignored files#40
A-Souhei merged 2 commits intomainfrom
fix/indexer-mtime-cache-gitignored

Conversation

@A-Souhei
Copy link
Owner

Summary

  • Fixes the re-index regression introduced by PR feat: enforce gitignore protection across bash tool and indexer #39 where gitignored files returned null from indexFile, causing them to never be stored in the mtime cache. On every restart, isGitignored() was re-run for every gitignored file in the project (potentially thousands in node_modules etc.).

  • Uses a negative mtime sentinel (-stat.mtimeMs) instead of null for gitignored files. The mtime early-return check (=== stat.mtimeMs, always positive) can never match a negative value, so the gitignore check always re-runs — meaning un-gitignoring a file without touching its mtime still correctly triggers re-indexing on the next run.

  • No file content is read and no vectors are stored for gitignored files. The gitignore safety property from PR feat: enforce gitignore protection across bash tool and indexer #39 is fully preserved.

Changes

packages/opencode/src/indexer/index.ts

  • indexFile line 745: if (ignored) return -stat.mtimeMs (was return null)
  • Updated JSDoc on indexFile to document the negative sentinel semantics
  • Added sentinel convention comment block before loadMtimeCacheFromDisk
  • Updated processBatch comment to correctly document all three return cases (positive / negative / null)

Notes

Investigation revealed the "3,697 files indexed on every restart" symptom is the total counter (all files the glob scanner finds), not the number actually being re-embedded. The mtime cache does skip already-indexed files correctly. The remaining ~1,955 files not in cache are ones whose embedding batches never completed (indexer interrupted before finishing). That is a separate issue.

Gitignored files now return -stat.mtimeMs instead of null so callers
store a sentinel in the mtime cache. This prevents re-running the
gitignore check on every startup for every gitignored file (fixing the
re-index-from-zero regression introduced in PR #39), while ensuring
the mtime early-return never fires for gitignored files — so
un-gitignoring a file without touching its mtime still triggers a
proper re-index on the next run.

No file content is read and no vectors are stored for gitignored files;
the gitignore safety property from PR #39 is fully preserved.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an indexer restart regression where gitignored files weren’t persisted in the mtime cache, causing repeated gitignore checks across restarts. The PR introduces a negative-mtime sentinel to persist “gitignored” state without ever allowing the unchanged-mtime fast path to bypass re-checking ignore status.

Changes:

  • Persist gitignored files in the mtime cache using a negative mtime sentinel instead of returning null.
  • Document the mtime-cache sentinel conventions and the three indexFile return cases.
  • Update batch processing comments to reflect the new sentinel semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +746 to +748
// Check if gitignored BEFORE reading the file
const ignored = isIgnored ? isIgnored(filePath) : await isGitignored(filePath)
if (ignored) return null
if (ignored) return -stat.mtimeMs
}

/** Returns the file's mtimeMs if it was indexed, null if skipped or errored. */
/** Returns the file's mtimeMs if indexed, -mtimeMs (negative) if gitignored (cached as sentinel so callers skip re-stat but always re-check gitignore), null if skipped for other reasons or errored. */
Comment on lines +170 to +171
// null (absent key) = never seen. Negative sentinels are persisted to disk so the
// mtime early-return never fires for gitignored files across restarts.
// null — error or file otherwise unindexable (do not update cache)
if (mtime !== null) {
cache.set(file, mtime)
cache.set(file, mtime) // <-- direct Map.set on s.mtimeCache ref
// Check if gitignored BEFORE reading the file
const ignored = isIgnored ? isIgnored(filePath) : await isGitignored(filePath)
if (ignored) return null
if (ignored) return -stat.mtimeMs
- Fix -0 sentinel: use -(mtimeMs+1) instead of -mtimeMs so mtime 0
  never produces -0 (which === 0 in JS and would defeat the invariant)
- Check gitignore before mtime early-return in indexFile() so files
  newly added to .gitignore are pruned even without mtime change
- Thread AbortSignal through redis.getAllMtimes() to allow cancellation
  of slow-path Redis scroll on abort
- Change buildIgnoreChecker catch fallback to () => true (conservative,
  consistent with isGitignored utility) to prevent accidental indexing
  of sensitive files when git check-ignore fails
- Fix indexFile JSDoc: sentinel skips content reads/embedding, not re-stat
- Clarify cache semantics comment: absent key = unindexable/errored,
  not strictly "never seen"
- Replace debug-style cache.set comment with intent explanation
- Update docs/CLI_INDEX_DELETE.md: opencode index delete → indexer delete
- Add mtime sentinel unit tests covering: always-negative invariant,
  never-equals-positive-mtime, -0 JSON round-trip hazard, persistence
@A-Souhei A-Souhei merged commit ab8c27a into main Mar 13, 2026
2 checks passed
@A-Souhei A-Souhei deleted the fix/indexer-mtime-cache-gitignored branch March 13, 2026 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants