feat: persistent knowledge layer — codebase indexing, context assembly, safety rails by alt-del-code · Pull Request #1465 · gsd-build/get-shit-done

alt-del-code · 2026-03-29T09:50:13Z

Summary

Rebase of #1190 onto current main — all merge conflicts resolved, all review feedback addressed.

Codebase indexer with SHA-256 hashing, incremental updates via git-diff, multi-language export/import extraction (JS/TS/Python/Go/Rust)
Smart context assembly with Porter stemming, synonym expansion, confidence scoring, tiered module selection, and hard token budgets per agent type
Dependency graph with forward/reverse import maps, path alias resolution, circular detection, and reverse-impact analysis
Loop guard with SHA-256 dedup, A-B ping-pong pattern detection, and configurable circuit breaker (default 200)
Safety rails with protected-file detection, improvement-type classification, scope checking, and circuit breakers
CLI integration via gsd-tools.cjs knowledge {init,index,deps,context,modules,impact,staleness}
Slash command /gsd:knowledge with full documentation

Review fixes (from #1190)

Busy-wait file locking replaced — removed CPU-spinning fallback
walkDir bounded — MAX_FILES=50,000 cap + symlink skipping
Shadowed fs require removed
Circuit breaker threshold raised to 200 and made configurable via opts.circuitBreakTotal

Issues addressed

[Feature Request] Multi-project support for monorepos with independent apps #108 — Monorepo awareness via module detection and package workspace scanning
The gsd-executor subagent type isn't available in this environment. Let me spawn general-purpose agents instead to execute each plan. #81, plan-phase, new-project, and quick still spawn general-purpose instead of dedicated agent types #859, Opencode Copilot Sub-agents consume extra requests #776 — Agent stuck in loops → loop-guard with ping-pong detection
/gsd:health nearly empties STATE.md - that does not seem to be correct #657 — STATE.md corruption → atomic JSON writes with file locking
feat(auto-mode): continuous phase execution via Stop hook #443 — Foundation for continuous/autonomous improvement loop

Test plan

21 new tests covering all modules (init, indexer, context, deps, loop guard, safety, CLI)
All existing GSD tests still pass (1523/1525 — 2 pre-existing env-specific failures unrelated to this PR)
Circuit breaker test updated to use configurable threshold

…embly, safety rails Integrates a persistent knowledge engine into GSD, ported from ACG's battle-tested modules. This gives every GSD executor agent structured codebase intelligence that survives context resets. New modules (get-shit-done/bin/lib/knowledge/): - init.cjs: Creates .planning/knowledge/ with schema-compliant JSON files - indexer.cjs: Full + incremental codebase indexing with SHA-256 hashing - context.cjs: Relevance-scored context assembly with stemming, synonyms, token budgets - dependencies.cjs: Forward/reverse import maps, circular detection, impact analysis - git-diff.cjs: Change detection since last indexed commit - source-analysis.cjs: Multi-language export/import extraction (JS/TS/Python/Go/Rust) - loop-guard.cjs: SHA-256 dedup + ping-pong detection + circuit breaker - safety.cjs: Protected files, scope checking, improvement classification - utils.cjs: File I/O with locking, hashing (self-contained, zero GSD core deps) CLI: gsd-tools.cjs knowledge {init,index,deps,context,modules,impact,staleness} Command: /gsd:knowledge slash command with documentation Tests: 21 new tests (init, indexing, context, deps, loop guard, safety, CLI) Addresses: gsd-build#108 (monorepo awareness), gsd-build#81 (agent loops), gsd-build#657 (state corruption), gsd-build#443 (continuous improvement)

…t-breaker threshold - utils.cjs: remove CPU-spinning busy-wait fallback in acquireLock; Atomics.wait failure now retries immediately without burning CPU - indexer.cjs: add MAX_FILES=50_000 file count limit and symlink skip in walkDir to prevent hangs on large monorepos and symlink-loop traversal - loop-guard.cjs: raise CIRCUIT_BREAK_TOTAL from 30→200 (30 was too aggressive for normal GSD sessions); make it configurable via opts.circuitBreakTotal - gsd-tools.cjs: remove shadowed `const fs = require('fs')` inside staleness subcommand handler — module-level fs is already in scope Resolves all 4 blocking items from trek-e consolidated review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

trek-e

Adversarial Review -- PR #1465

Summary

Persistent knowledge layer with codebase indexing, smart context assembly, dependency graph, loop guard, and safety rails. 1964 additions, 1 deletion across 14 files (11 new modules, 1 new command, test file, gsd-tools integration). Rebase of #1190.

Merge Status: CONFLICTING

This PR cannot merge in its current state. Merge conflicts must be resolved before any other evaluation matters.

Vision Alignment Analysis

Verdict: Fundamentally misaligned with GSD's architecture.

This is the most consequential PR in the batch and requires careful evaluation.

Violates fresh context per agent. GSD's core insight is that fresh context solves context rot. The knowledge layer is an explicit attempt to persist context across agent sessions -- .planning/knowledge/INDEX.json, DECISIONS.json, PATTERNS.json, CONVENTIONS.json, BUGS.json, ASSUMPTIONS.json. This is a persistent knowledge base that agents consult, which is the opposite of fresh context. If the knowledge base contains stale or wrong information, every agent that reads it inherits that error. This is context rot with persistence.
Massive new subsystem. 11 new modules totaling ~1400 lines of implementation code:
- indexer.cjs (308 lines) -- SHA-256 hashing, file walking, export/import extraction
- context.cjs (329 lines) -- Porter stemming, synonym expansion, confidence scoring
- dependencies.cjs (159 lines) -- import mapping, circular detection
- loop-guard.cjs (84 lines) -- dedup, ping-pong detection, circuit breaker
- safety.cjs (97 lines) -- protected files, improvement classification
- source-analysis.cjs (72 lines) -- multi-language export/import extraction
- utils.cjs (141 lines) -- file locking, token estimation, stop words
- constants.cjs (67 lines) -- ignore dirs, code extensions, language mapping
- git-diff.cjs (60 lines) -- incremental indexing
- init.cjs (102 lines) -- schema-compliant initialization
- index.cjs (45 lines) -- module aggregator
This is not a "thin" addition. This is a database engine bolted onto a file-based system.
Reinvents what the agent already does. The "smart context assembly" with Porter stemming, synonym expansion, and confidence scoring replicates what LLMs do natively when given file content. The LLM IS the context assembler. GSD's approach is to give agents the right files (via Glob/Grep/Read), not to build a secondary retrieval engine that pre-selects context for them.
Introduces file locking. The utils.cjs includes withFileLock for atomic JSON writes. The PR description says "busy-wait file locking replaced," but file locking in a system designed for parallel worktree execution is a concurrency primitive that invites deadlocks. GSD avoids this by having each agent work in its own worktree -- shared mutable state (which the knowledge JSON files are) contradicts that isolation.
Token budgeting per agent type. The context assembly has --agent executor --budget 8000 which pre-limits what information an agent receives. GSD's philosophy is to give agents full context and let them use what they need. Pre-budgeting context based on agent type is a form of information hiding that can cause agents to miss relevant context.
7 new CLI subcommands (knowledge init, index, deps, context, modules, impact, staleness). GSD targets "few commands that just work." This adds an entire knowledge management surface area the user must understand and maintain (index --incremental after changes, staleness checks, etc.).

Correctness Issues

The walkDir function has a MAX_FILES=50,000 cap. For large monorepos, this silently truncates the index. No warning when the cap is hit.
The Porter stemming implementation is a naive port. Stemming errors ("running" -> "run" but "running" as a proper noun stays "running") are not handled.
The synonym groups in context.cjs are hardcoded English-only. The SYNONYM_GROUPS include domain-specific terms ("jwt", "oauth", "postgres") that assume a web development context. A game engine project, embedded systems project, or data science project would get irrelevant synonym expansion.
The circuit breaker default of 200 is arbitrary. No justification for why 200 vs 100 vs 500.
The "2 pre-existing env-specific failures" noted in the test plan need investigation. If these are real failures, they should be resolved, not hand-waved.

Security

walkDir follows filesystem paths. If .planning/knowledge/ contains symlinks to sensitive directories, the indexer would follow them. The symlink skipping in the PR description addresses this for source scanning but not for the knowledge directory itself.
The index stores SHA-256 hashes of file content. If the index file is committed to the repo (it lives in .planning/knowledge/), these hashes leak information about file content even if the files themselves are gitignored.

Verdict: REQUEST CHANGES

This PR introduces a large, complex subsystem that contradicts GSD's core architectural principles (fresh context, thin orchestrators, file-based simplicity). The knowledge layer adds 1400+ lines of retrieval engine code to a system whose design philosophy is that the LLM itself is the retrieval engine. The merge conflicts must be resolved, but the architectural concerns are more fundamental than the conflicts.

If persistent knowledge is genuinely needed, it should be evaluated as a design decision with the maintainer, not implemented as a feature PR. The loop guard and safety rails components have standalone value and could be extracted into a separate, focused PR.

trek-e

Code Review

Verdict: APPROVE

Security

No issues found. execGitSafe and spawnSync calls in utils.cjs and safety.cjs use array-form args (never string interpolation into a shell), so there is no shell injection surface. The withFileLock / atomic-rename pattern in safeJsonWrite is solid. Lock files embed process.pid which is sufficient for local single-machine use.

One minor note: safeJsonRead silently swallows all JSON parse errors and file-read errors under one catch. If a JSON file is corrupted mid-write (power loss between write and rename), callers will silently receive null and may overwrite the file with an empty schema. This is an operational edge case rather than a security issue; the .tmp.<pid> → rename pattern already minimises the window significantly.

Logic / Correctness

Circular dependency detection is incomplete. buildDependencyGraph only detects direct A↔B mutual imports (depth 1):

if (moduleDepsPlain[dep] && moduleDepsPlain[dep].includes(mod)) { … }

A→B→C→A is not caught. This is a known limitation that should be documented; the PR description says "circular detection" without qualification, which sets a higher expectation.

updateIndexIncremental does not rebuild total_exports in the stats block. buildIndex correctly computes total_exports from module data; the incremental path sets it to 0 unconditionally (line 1169). This means listModules and callers that read stats.total_exports will see stale data after any incremental update.

assembleDiffAwareContext boost is not reflected in gist/micro surrogate lists. Only base.relevant_modules is re-sorted after diff-boosting; base.gist_modules and base.micro_modules are derived earlier inside assembleContext before boosting, so they remain ordered by the un-boosted scores. The returned object is internally inconsistent for callers that read multiple arrays.

Ping-pong detection test gap: The test at line 1957–1966 does not actually assert that ping-pong was detected — it only asserts totalCalls >= 6, which is always true. The test is described as "detects ping-pong patterns" but it never verifies the block action.

Test Coverage

21 tests cover the core happy paths well. Gaps:

updateIndexIncremental is tested only for the fallback path. No test exercises the incremental case (full build → file change → incremental update → verify delta).
buildDependencyGraph has no test for circular detection or getImpactedFiles.
assembleDiffAwareContext has no tests at all.
validateMergeReadiness in safety.cjs is exported and fully functional but has no tests.
The ping-pong test mentioned above does not assert the expected outcome (see Logic section).

Style

Overall clean, consistent 'use strict' + CJS throughout. A few minor items:

Long one-liner chains in context.cjs (e.g. the estimate() closure and the budget-enforcement block) would benefit from wrapping for readability, but this is stylistic rather than a defect.
bugsPath is exported from utils.cjs but not used by any module in this PR (it is unused dead export at this point).

Summary

Solid, well-structured feature with meaningful test coverage and all four blocking issues from the previous review addressed. The three logic issues (shallow circular detection, missing incremental total_exports, diff-boosted context array inconsistency) and the non-asserting ping-pong test are the main items to address before merge, but none are blockers for an approve given the PR's scope and the existing mitigations.

trek-e

Code Review Update (Pass 2)

Verdict: REQUEST_CHANGES — no CI results, no response to prior review

Prior Review

The prior review (CHANGES_REQUESTED) flagged issues with the knowledge layer integration. No new commits have been added since that review and no CI checks have run on this branch.

Current Status

No CI results on feat/knowledge-layer-integration branch
No commits addressing prior review feedback visible
The knowledge layer is a large addition (new get-shit-done/bin/lib/knowledge/ module + gsd-tools.cjs routing + command definition) with significant surface area

Outstanding Concern

The knowledge module is imported at the top of gsd-tools.cjs unconditionally (const knowledge = require('./lib/knowledge/index.cjs')). If lib/knowledge/index.cjs or any of its dependencies throws at require-time, it will break every gsd-tools command, not just the knowledge subcommand. The require should either be lazy (inside the case 'knowledge': block) or the module must guarantee no side-effects at load time.

Until CI runs and prior review issues are addressed, this PR cannot merge.

trek-e · 2026-04-04T17:59:33Z

This PR has been open for more than 24 hours without a linked issue. Please link a GitHub issue (e.g., Closes #NNNN) to this PR. If no linked issue is added within 24 hours, this PR will be closed.

Nithish Singh and others added 2 commits March 29, 2026 15:15

alt-del-code requested a review from glittercowboy as a code owner March 29, 2026 09:50

trek-e requested changes Apr 1, 2026

View reviewed changes

trek-e added review: changes requested PR reviewed — changes required before merge enhancement New feature or request area: core PROJECT.md, REQUIREMENTS.md, templates labels Apr 1, 2026

trek-e reviewed Apr 4, 2026

View reviewed changes

trek-e added the needs changes Review requested changes that must be addressed label Apr 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: persistent knowledge layer — codebase indexing, context assembly, safety rails#1465

feat: persistent knowledge layer — codebase indexing, context assembly, safety rails#1465
alt-del-code wants to merge 2 commits intogsd-build:mainfrom
alt-del-code:feat/knowledge-layer-integration

alt-del-code commented Mar 29, 2026

Uh oh!

trek-e left a comment

Uh oh!

trek-e left a comment

Uh oh!

trek-e left a comment

Uh oh!

trek-e commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

alt-del-code commented Mar 29, 2026

Summary

Review fixes (from #1190)

Issues addressed

Test plan

Uh oh!

trek-e left a comment

Choose a reason for hiding this comment

Adversarial Review -- PR #1465

Summary

Merge Status: CONFLICTING

Vision Alignment Analysis

Correctness Issues

Security

Verdict: REQUEST CHANGES

Uh oh!

trek-e left a comment

Choose a reason for hiding this comment

Code Review

Security

Logic / Correctness

Test Coverage

Style

Summary

Uh oh!

trek-e left a comment

Choose a reason for hiding this comment

Code Review Update (Pass 2)

Prior Review

Current Status

Outstanding Concern

Uh oh!

trek-e commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants