Merged
237 changes: 88 additions & 149 deletions .planning/codebase/ARCHITECTURE.md
@@ -1,183 +1,122 @@
# Architecture

**Analysis Date:** 2026-01-23
**Analysis Date:** 2026-02-09

## Pattern Overview

**Overall:** Layered pipeline architecture with stage-based processing and LLM integration.
**Overall:** Deterministic, schema-validated stage pipeline orchestrated by a CLI with pluggable LLM runtimes.

**Key Characteristics:**
- Sequential phase/stage execution with context passing
- Schema-driven validation at stage boundaries
- Pluggable LLM runtime providers (OpenAI, Copilot)
- Configuration-driven task scheduling and action execution
- Repository analysis and automated maintenance workflows
- Fixed stage sequence with typed I/O enforced by JSON schema (`packages/core/src/stage.ts`, `packages/core/src/runner.ts`, `packages/schemas/stages/*.json`)
- CLI-driven orchestration with report/output generation (`packages/cli/src/index.ts`)
- Runtime adapters for LLM providers separated from orchestration (`packages/runtimes/openai/src/index.ts`, `packages/runtimes/copilot/src/index.ts`)

## Layers

**CLI Layer:**
- Purpose: Command-line interface and entry point for all workflows
- Location: `packages/cli/src/index.ts`, `src/cli.ts`
- Contains: Command definitions, stage orchestration, runtime initialization
- Depends on: Core layer, Runtime providers
- Used by: End users via CLI commands
**CLI Orchestration:**
- Purpose: Load config, enforce scope/budgets, build stages, run pipeline, write reports/publish payloads.
- Location: `packages/cli/src/index.ts`
- Contains: Command parsing, stage assembly, prompt/schema loading, output writing.
- Depends on: Core library, runtimes, prompt files, schema files.
- Used by: CLI executable (`package.json` -> `dist/cli/src/index.js`) and GitHub Action (`action.yml`).

**Core Layer:**
- Purpose: Core domain logic for repository analysis and workflow execution
**Core Library:**
- Purpose: Provide shared primitives for config, scope, budgets, schema validation, and stage execution.
- Location: `packages/core/src/`
- Contains: Stage definitions, schema validation, config loading, repository fact collection, prompt management
- Depends on: None (no external dependencies)
- Used by: CLI layer, phase implementations

**Phase Layer:**
- Purpose: Specific maintenance workflow steps
- Location: `src/phases/`
- Contains: Scan, Analyze, Plan, Propose, Act phases
- Depends on: Core types, configuration
- Used by: Groundkeeper orchestrator

**Runtime Layer:**
- Purpose: LLM provider abstraction
- Location: `packages/runtimes/openai/src/`, `packages/runtimes/copilot/src/`
- Contains: OpenAI and Copilot SDK integrations
- Depends on: External APIs (OpenAI, Copilot)
- Used by: CLI stage builders

**Configuration Layer:**
- Purpose: Configuration loading and validation
- Location: `src/config.ts`, `packages/core/src/config.ts`
- Contains: YAML parsing, schema validation, defaults merging
- Depends on: js-yaml library
- Used by: Groundkeeper, CLI runner
- Contains: `config.ts`, `scope.ts`, `budgets.ts`, `schema.ts`, `runner.ts`, `stage.ts`, `repo.ts`, `prompt.ts`.
- Depends on: Node FS/path and JSON/YAML parsing (`packages/core/src/config.ts`, `packages/core/src/schema.ts`).
- Used by: CLI (`packages/cli/src/index.ts`).

**Runtime Adapters:**
- Purpose: Implement provider-specific `generateJson` behavior for LLM calls.
- Location: `packages/runtimes/*/src/index.ts`
- Contains: OpenAI fetch-based adapter and Copilot SDK adapter.
- Depends on: Provider SDKs or APIs (`packages/runtimes/openai/src/index.ts`, `packages/runtimes/copilot/src/index.ts`).
- Used by: CLI runtime selection (`packages/cli/src/index.ts`).

**Schemas:**
- Purpose: Define and validate stage input/output contracts and agent config schema.
- Location: `packages/schemas/`
- Contains: `agent.schema.json`, `stages/*.schema.json`.
- Depends on: JSON schema validation in `packages/core/src/schema.ts`.
- Used by: CLI stage setup and validation in `packages/cli/src/index.ts` and `packages/core/src/runner.ts`.

**Prompts:**
- Purpose: Stage-specific prompt templates for LLM interaction.
- Location: `prompts/`
- Contains: `system.md`, `analysis.md`, `plan.md`, `patch.md`, `verify.md`, `publish.md`.
- Depends on: Prompt loader in `packages/core/src/prompt.ts`.
- Used by: CLI stage execution in `packages/cli/src/index.ts`.

**GitHub Action Wrapper:**
- Purpose: Provide a composite action that runs the CLI and publishes reports.
- Location: `action.yml`
- Contains: Setup, install/build, CLI invocation, optional issue publishing.
- Depends on: Built CLI (`dist/cli/src/index.js`) or global `groundkeeper` binary.
- Used by: GitHub Actions workflows that reference this action.

## Data Flow

**Main Workflow (CLI):**

1. Load agent configuration (`packages/core/src/config.ts`)
2. Collect repository facts (`packages/core/src/repo.ts`)
3. Validate scope and budgets (`packages/core/src/scope.ts`, `packages/core/src/budgets.ts`)
4. Build stage pipeline (`packages/cli/src/index.ts` - `buildStages()`)
5. Execute stages sequentially with:
- Input validation via schema (`packages/core/src/schema.ts`)
- Stage-specific logic (preflight, analysis, plan, patch, verify, publish)
- Output validation via schema
- Log persistence to `.groundkeeper/logs/`
6. Generate report and optional publish payload

**Phase Workflow (Legacy/Alternative):**

1. Load Groundkeeper config (`src/config.ts`)
2. Initialize Groundkeeper context (`src/groundkeeper.ts`)
3. Execute phases sequentially:
- **Scan**: Repository structure and metadata collection (`src/phases/ScanPhase.ts`)
- **Analyze**: Finding generation from scan results (`src/phases/AnalyzePhase.ts`)
- **Plan**: Action planning from findings (`src/phases/PlanPhase.ts`)
- **Propose**: Concrete proposal generation (`src/phases/ProposePhase.ts`)
- **Act**: Output generation and optional execution (`src/phases/ActPhase.ts`)
4. Stop on first failure; continue on success
**CLI Run Pipeline:**

**State Management:**
1. Load and validate agent config (`packages/core/src/config.ts`) from `.github/agent.yml`.
2. Collect repo facts (`packages/core/src/repo.ts`) and file list (`packages/cli/src/index.ts`).
3. Enforce scope and budgets (`packages/core/src/scope.ts`, `packages/core/src/budgets.ts`).
4. Build stage definitions with prompt templates and schemas (`packages/cli/src/index.ts`, `prompts/*.md`, `packages/schemas/stages/*.json`).
5. Run stages sequentially with schema validation and logging (`packages/core/src/runner.ts`).
6. Write report and optional publish payload to `.groundkeeper/` (`packages/cli/src/index.ts`).
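
Step 3 of the pipeline above (scope and budget enforcement) could be sketched roughly as follows. The `Scope` and `Budgets` shapes, the prefix-based matching, and the `maxFiles` budget are all assumptions made for illustration; the actual logic in `packages/core/src/scope.ts` and `packages/core/src/budgets.ts` is not shown in this document.

```typescript
// Hypothetical shapes: the real config fields are not shown in this document.
interface Scope {
  allowedPrefixes: string[]; // e.g. ["docs/", "README.md"]
}

interface Budgets {
  maxFiles: number;
}

// Returns the files that fall outside the allowed scope.
function findOutOfScope(files: string[], scope: Scope): string[] {
  return files.filter(
    (file) => !scope.allowedPrefixes.some((prefix) => file.startsWith(prefix)),
  );
}

// Throws before any stage runs if a scope or budget constraint is violated.
function enforceConstraints(files: string[], scope: Scope, budgets: Budgets): void {
  const outOfScope = findOutOfScope(files, scope);
  if (outOfScope.length > 0) {
    throw new Error(`Out-of-scope files: ${outOfScope.join(", ")}`);
  }
  if (files.length > budgets.maxFiles) {
    throw new Error(`File budget exceeded: ${files.length} > ${budgets.maxFiles}`);
  }
}
```

Checking constraints before building stages keeps the check deterministic and avoids spending LLM calls on a run that would be rejected anyway.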

- Context object passed through all phases/stages: `GroundkeeperContext` (`src/types.ts`)
- Each phase adds results to `context.results` dictionary
- Stage results accumulated in array and logged
- No global state; all data flows through parameters
**State Management:**
- Stage data is passed via `StageContext` and stage outputs chained as the next input (`packages/core/src/stage.ts`, `packages/core/src/runner.ts`).

## Key Abstractions

**Phase:**
- Purpose: Base class for workflow steps
- Examples: `src/phases/ScanPhase.ts`, `src/phases/AnalyzePhase.ts`, `src/phases/PlanPhase.ts`
- Pattern: Template method with `execute()` (logging/timing) and abstract `run()` (logic)

**StageDefinition:**
- Purpose: Typed stage with input/output schemas
- Location: `packages/core/src/stage.ts`
- Pattern: Generic type with `name`, schema paths, async `run()` function

**Finding:**
- Purpose: Represents a discovery from analysis
- Location: `src/types.ts`
- Fields: id, type, severity, message, file, suggestion
- Used in: Analysis and planning phases

**ProposedChange:**
- Purpose: Represents actionable change (create, update, delete)
- Location: `src/types.ts`
- Fields: type, target, content, reason
- Used in: Propose and Act phases

**LlmRuntime:**
- Purpose: Abstract LLM provider interface
- Pattern: `{ generateJson(messages: LlmMessage[]): Promise<unknown> }`
- Implementations: `packages/runtimes/openai/src/index.ts`, `packages/runtimes/copilot/src/index.ts`
**StageDefinition/StageContext/StageResult:**
- Purpose: Standardize stage execution contract.
- Examples: `packages/core/src/stage.ts`, `packages/core/src/runner.ts`, `packages/cli/src/index.ts`.
- Pattern: Stage pipeline with validated I/O and fail-fast behavior.
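
The stage contract described above might look something like the sketch below. It is a simplification under stated assumptions: validators are passed as plain functions rather than schema paths, `StageContext` is omitted, and the field names on `StageDefinition` and `StageResult` are hypothetical rather than copied from `packages/core/src/stage.ts`.

```typescript
// Hypothetical shapes; the real definitions live in packages/core/src/stage.ts.
type Validator = (value: unknown) => string[]; // returns a list of error messages

interface StageDefinition {
  name: string;
  validateInput: Validator;
  validateOutput: Validator;
  run: (input: unknown) => Promise<unknown>;
}

interface StageResult {
  stage: string;
  success: boolean;
  output?: unknown;
  errors: string[];
}

// Runs stages in order, feeding each output into the next stage's input.
async function runStages(
  stages: StageDefinition[],
  initialInput: unknown,
): Promise<StageResult[]> {
  const results: StageResult[] = [];
  let input = initialInput;
  for (const stage of stages) {
    const inputErrors = stage.validateInput(input);
    if (inputErrors.length > 0) {
      results.push({ stage: stage.name, success: false, errors: inputErrors });
      break; // fail fast: later stages never run
    }
    try {
      const output = await stage.run(input);
      const outputErrors = stage.validateOutput(output);
      if (outputErrors.length > 0) {
        results.push({ stage: stage.name, success: false, errors: outputErrors });
        break;
      }
      results.push({ stage: stage.name, success: true, output, errors: [] });
      input = output; // chain the validated output into the next stage
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ stage: stage.name, success: false, errors: [message] });
      break;
    }
  }
  return results;
}
```

Returning a result per stage, instead of throwing, lets the caller report exactly which stage failed and why.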

**AgentConfig:**
- Purpose: Configuration contract for scope, budgets, runtime, publish settings.
- Examples: `packages/core/src/config.ts`, `.github/agent.yml`.
- Pattern: Load/validate YAML then enforce constraints before running stages.
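
As an illustration of manual validation with an aggregated error list, the sketch below checks an already-parsed config object. Only the four top-level concerns (scope, budgets, runtime, publish) come from this document; the nested field names (`include`, `maxFiles`, `provider`, `enabled`) are invented for the example, and YAML parsing is left out.

```typescript
// Hypothetical field layout; only the four top-level keys come from the document.
interface AgentConfig {
  scope: { include: string[] };
  budgets: { maxFiles: number };
  runtime: { provider: string };
  publish: { enabled: boolean };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates a parsed config object and throws one error listing every problem found.
function validateAgentConfig(raw: unknown): AgentConfig {
  if (!isRecord(raw)) {
    throw new Error("Invalid agent config: root must be a mapping");
  }
  const errors: string[] = [];
  const scope = raw["scope"];
  const budgets = raw["budgets"];
  const runtime = raw["runtime"];
  const publish = raw["publish"];

  if (!isRecord(scope) || !Array.isArray(scope["include"])) {
    errors.push("scope.include must be a list of paths");
  }
  if (!isRecord(budgets) || typeof budgets["maxFiles"] !== "number") {
    errors.push("budgets.maxFiles must be a number");
  }
  if (!isRecord(runtime) || typeof runtime["provider"] !== "string") {
    errors.push("runtime.provider must be a string");
  }
  if (!isRecord(publish) || typeof publish["enabled"] !== "boolean") {
    errors.push("publish.enabled must be a boolean");
  }
  if (errors.length > 0) {
    throw new Error(`Invalid agent config:\n- ${errors.join("\n- ")}`);
  }
  // Every field was checked above, so the cast is safe for this sketch.
  return raw as unknown as AgentConfig;
}
```

Collecting all errors before throwing means a user fixes the config in one pass rather than one error per run.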

**PromptPayload:**
- Purpose: Normalize system/stage/schema/context payload passed to LLM.
- Examples: `packages/core/src/prompt.ts`, `packages/cli/src/index.ts`.
- Pattern: Build + serialize JSON payload for provider runtimes.
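
A minimal sketch of the build-and-serialize step, assuming a payload with the four parts named above and a chat-style message list. The `role`/`content` message shape and the `buildMessages` helper are assumptions for illustration, not the actual API of `packages/core/src/prompt.ts`.

```typescript
// Hypothetical field names based on the "system/stage/schema/context" description.
interface PromptPayload {
  system: string; // contents of the system prompt
  stage: string; // contents of the stage prompt
  schema: unknown; // JSON schema the stage output must satisfy
  context: unknown; // data the stage works on, e.g. the previous stage's output
}

// Assumed chat-style message shape.
interface LlmMessage {
  role: "system" | "user";
  content: string;
}

// Serializes the payload so every provider runtime receives the same JSON body.
function buildMessages(payload: PromptPayload): LlmMessage[] {
  return [
    { role: "system", content: payload.system },
    {
      role: "user",
      content: JSON.stringify(
        { instructions: payload.stage, schema: payload.schema, context: payload.context },
        null,
        2,
      ),
    },
  ];
}
```

Keeping serialization in one place means runtime adapters only deal with messages and never with prompt files or schemas directly.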

**Runtime Adapters:**
- Purpose: Provider-specific JSON generation from prompts.
- Examples: `packages/runtimes/openai/src/index.ts`, `packages/runtimes/copilot/src/index.ts`.
- Pattern: `generateJson(messages)` returns parsed JSON or throws errors.
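
The adapter contract can be sketched as below. The `generateJson(messages)` signature follows the pattern stated in this document; the `LlmMessage` fields, the `createRuntime` helper, and the `echoRuntime` stand-in are hypothetical and do not call any real provider.

```typescript
// Assumed chat-style message shape.
interface LlmMessage {
  role: "system" | "user";
  content: string;
}

interface LlmRuntime {
  generateJson(messages: LlmMessage[]): Promise<unknown>;
}

// Wraps any text-producing function as a runtime; parse failures surface as thrown errors.
function createRuntime(
  complete: (messages: LlmMessage[]) => Promise<string>,
): LlmRuntime {
  return {
    async generateJson(messages) {
      const text = await complete(messages);
      try {
        return JSON.parse(text) as unknown;
      } catch (_error) {
        throw new Error(`Provider returned non-JSON output: ${text.slice(0, 80)}`);
      }
    },
  };
}

// Stand-in provider for local testing: reports how many messages it received.
const echoRuntime = createRuntime(async (messages) =>
  JSON.stringify({ count: messages.length }),
);
```

Because orchestration depends only on `LlmRuntime`, a stub like `echoRuntime` can replace a real provider in tests without touching the pipeline.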

## Entry Points

**CLI Command: `groundkeeper run`:**
- Location: `packages/cli/src/index.ts` (main), `src/cli.ts` (legacy)
- Triggers: User executes `npm start` or `groundkeeper run` command
- Responsibilities:
- Parse command-line options (config, directory)
- Load configuration and validate
- Collect repository facts
- Build and execute stage pipeline
- Generate report output
- Write publish payload if configured

**CLI Command: `groundkeeper init`:**
- Location: `src/cli.ts`
- Triggers: User runs `groundkeeper init`
- Responsibilities: Generate default `groundkeeper.yml` config template

**CLI Command: `groundkeeper scan`:**
- Location: `src/cli.ts`
- Triggers: User runs `groundkeeper scan`
- Responsibilities: Execute only scan phase

**CLI Command: `groundkeeper analyze`:**
- Location: `src/cli.ts`
- Triggers: User runs `groundkeeper analyze`
- Responsibilities: Execute scan and analyze phases

**Groundkeeper.run():**
- Location: `src/groundkeeper.ts`
- Entry for programmatic use
- Executes all phases in sequence
**CLI Binary:**
- Location: `packages/cli/src/index.ts`
- Triggers: `groundkeeper run` via `package.json` bin (`dist/cli/src/index.js`).
- Responsibilities: Parse args, load config, run stage pipeline, write outputs.

**GitHub Action:**
- Location: `action.yml`
- Triggers: GitHub Actions workflows.
- Responsibilities: Install/build CLI, run it, publish issue payload, upload artifacts.

## Error Handling

**Strategy:** Fail-fast on validation or phase failure; continue on info-level findings.
**Strategy:** Fail fast on schema or runtime errors and return structured stage results.

**Patterns:**

- **Schema Validation**: `validateSchema()` in `packages/core/src/schema.ts` returns error array; stage stops if errors present
- **Phase Execution**: `Phase.execute()` wraps `run()` with try-catch; logs error and returns failed result
- **Config Validation**: `ConfigLoader.validateConfig()` throws on invalid format or missing required fields
- **Pipeline Stopping**: `Groundkeeper.executePhases()` breaks loop on first failed phase result
- **Stage Dependency**: `runStages()` breaks loop on validation failure or execution error
- Input/output validation errors stop the pipeline with `success: false` (`packages/core/src/runner.ts`).
- Config validation throws with aggregated error list (`packages/core/src/config.ts`).
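
For readers unfamiliar with validators that return an error array, here is a minimal recursive sketch. It handles only a small subset of JSON Schema (`type`, `required`, `properties`, `items`); the function name echoes `validateSchema()` from the earlier revision of this file, but the signature and error format are assumptions, not the code in `packages/core/src/schema.ts`.

```typescript
// Subset of JSON Schema handled by this sketch: type, required, properties, items.
interface JsonSchema {
  type?: "object" | "array" | "string" | "number" | "boolean";
  required?: string[];
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
}

function typeOf(value: unknown): string {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Returns every violation found; an empty array means the value is valid.
function validateSchema(schema: JsonSchema, value: unknown, path = "$"): string[] {
  const actual = typeOf(value);
  if (schema.type && actual !== schema.type) {
    return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors: string[] = [];
  if (actual === "object") {
    const record = value as Record<string, unknown>;
    for (const key of schema.required ?? []) {
      if (!(key in record)) errors.push(`${path}.${key}: missing required property`);
    }
    const properties = schema.properties ?? {};
    for (const key of Object.keys(properties)) {
      if (key in record) {
        errors.push(...validateSchema(properties[key], record[key], `${path}.${key}`));
      }
    }
  }
  const itemSchema = schema.items;
  if (actual === "array" && itemSchema) {
    (value as unknown[]).forEach((item, index) => {
      errors.push(...validateSchema(itemSchema, item, `${path}[${index}]`));
    });
  }
  return errors;
}
```

A runner can treat any non-empty result as a failure and attach the messages to the stage result, which matches the `success: false` behavior described above.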

## Cross-Cutting Concerns

**Logging:**
- Approach: Console-based logging with phase/stage prefixes
- Pattern: `console.log()` with `[${phaseName}]` or `[${stageName}]` prefix
- Timing: Phase base class logs start/completion with duration
- Files: Stage logs written to `.groundkeeper/logs/{stageName}.input.json` and `.groundkeeper/logs/{stageName}.output.json`

**Validation:**
- Approach: JSON schema validation at stage boundaries
- Location: `packages/core/src/schema.ts` - `validateSchema()` recursive validator
- Applied to: Stage input/output before/after execution
- Configuration: YAML schema validation via regex patterns

**Authentication:**
- Approach: Environment variable-based token management
- Tokens used: `GITHUB_TOKEN`, `OPENAI_API_KEY` (OpenAI runtime), Copilot SDK tokens
- Read at: CLI initialization in `packages/cli/src/index.ts`
- Scope checking: `packages/core/src/scope.ts` enforces allowed file patterns
**Logging:** Stage start logs and JSON input/output logs in `.groundkeeper/logs` (`packages/core/src/runner.ts`).
**Validation:** JSON schema validation for stages (`packages/core/src/schema.ts`) and manual config validation (`packages/core/src/config.ts`).
**Authentication:** Provider credentials via env vars (`packages/cli/src/index.ts`) and GitHub token injection in action runtime (`action.yml`).

---

*Architecture analysis: 2026-01-23*
*Architecture analysis: 2026-02-09*