BUILD_PLAN.md — GPT‑GOV (AI‑Native Business OS) — 14‑Day Crash Build

Goal: Stand up a production‑minded MVP of a memory‑anchored, policy‑driven AI governance backend in 14 days, then harden it with guardrails, observability, and learning loops. No code is written here — you’ll fill the functions and schemas yourself. Hidden clues are embedded as HTML comments (<!-- clue: ... -->).

Core sources: Architecture, memory layers, APIs, and jobs are defined in the repo README; treat this plan as the executable checklists to ship the spec.


Guiding Principles

  • Deterministic first, LLM‑assist second.
  • Everything is an event → decisions → outcomes → learning loop.
  • Versioned policies with Autonomy Levels (AL0–AL3) and hash‑chained decisions.
  • Daily snapshots and bandit‑driven variants (promotion via Slack proposal).
  • Small, testable surfaces — every day ends with smoke checks.

Tools You’ll Use (no code here, only actions)

  • Runtime: Bun 1.x, Express + TypeScript, Drizzle ORM, PostgreSQL 15+ with pgvector.
  • AI: Vercel AI SDK with OpenAI provider.
  • Jobs: n8n (HTTP triggers + scheduled).
  • Deploy: Railway.
  • Validation & Logs: zod, pino.

Skill Gaps — What to Learn Fast

  1. Drizzle ORM + migrations: schema → migration → rollback; seeding patterns.
  2. pgvector basics: embedding dims, vector(1536), cosine distance; simple ANN index.
  3. Policy design: autonomy bands; escalation; versioning semantics.
  4. Hash chains: prev_hash → hash to make decisions tamper‑evident.
  5. Bandits (Thompson Sampling): mapping successes and failures from outcomes to the Beta parameters (alpha, beta).
  6. n8n flows: webhook triggers, scheduled runs, secrets handling.
  7. Observability: p50/p95 latency, token usage, AL distribution; JSON logging; correlation IDs.
  8. Security: job token headers, least‑privilege service tokens, PII redaction in logs.

Deliverables Overview

  • Day 3: Local server responding to /health; Postgres + pgvector running; migrations applied.
  • Day 6: /events, /decisions/finance, /outcomes/:decisionId round‑trip works locally.
  • Day 9: n8n ingest + daily jobs calling internal job endpoints.
  • Day 12: Policy variants routed; outcomes update variant stats; Slack proposal message.
  • Day 14: Deployed on Railway; dashboards show metrics; kill‑switch + idempotency in place.

Daily Plan (14 Days)

Day 1 — Repo Skeleton & Env

  • Initialize the Bun + TS project structure to match the src/ layout exactly (config, routes, core, jobs).
  • Create .env from example with placeholders only (no secrets in repo).
  • Write README excerpts to a DEVLOG entry summarizing subsystem responsibilities.
  • Add pino logger with correlation ID middleware (generate UUID per request).
  • Add /health and /metrics route shells (return hardcoded JSON for now).
  • Set up drizzle.config.ts; connect to local Postgres URL.
  • Exit check: start the server with bun run --hot src/server.ts, then confirm GET /health returns { ok: true }.
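
For orientation only, here is one way the correlation ID middleware could look. It is a sketch, not the implementation you are expected to write: the request and response types are declared locally so the snippet runs without Express installed, and the header name x-correlation-id is an assumption.

```typescript
import { randomUUID } from "node:crypto";

// Minimal structural types so the sketch runs without Express installed.
// In the real server these would be Express's Request, Response, and NextFunction.
interface ReqLike {
  headers: Record<string, string | string[] | undefined>;
  correlationId?: string;
}
interface ResLike {
  setHeader(name: string, value: string): void;
}

// Hypothetical header name; use whatever your callers already send.
const CORRELATION_HEADER = "x-correlation-id";

function correlationId(req: ReqLike, res: ResLike, next: () => void): void {
  const incoming = req.headers[CORRELATION_HEADER];
  // Reuse a caller-supplied ID so one trace can span services; otherwise mint a UUID.
  const id = typeof incoming === "string" && incoming.length > 0 ? incoming : randomUUID();
  req.correlationId = id;
  res.setHeader(CORRELATION_HEADER, id);
  next();
}
```

A pino child logger bound to the same ID would then tag every log line for the request.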

Day 2 — Database Foundations

  • Provision local Postgres 15; enable the pgvector extension with CREATE EXTENSION vector;
  • Define Drizzle models for: events, decisions, outcomes, policy_versions, entity_features, knowledge_snapshots, policy_variants_stats.
  • Add indexes listed in the spec.
  • Generate + push migrations; verify tables exist.
  • Exit check: Run a migration rollback → re‑apply. Confirm no drift.

Day 3 — Policy Loader & Evaluator Scaffolds

  • Create core/policy/types.ts to model autonomy bands and escalation.
  • Implement policy/loader.ts to fetch latest by name or by name@version.
  • Stub policy/evaluator.ts with deterministic guardrail checks only (no LLM).
  • Seed one policy in policy_versions (finance v1.0.0) via migration or seed script.
  • Exit check: Unit tests for evaluator boundaries (approve/deny at edges).
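
To make "approve/deny at edges" concrete, the sketch below shows one possible shape for autonomy bands and a deterministic evaluator. Everything in it is an assumption for illustration: the band fields, the discount-percentage input, the thresholds, and the mapping of AL levels to actions are not taken from the spec.

```typescript
type AutonomyLevel = "AL0" | "AL1" | "AL2" | "AL3";
type Action = "approve" | "escalate" | "deny";

// Hypothetical band shape: a request at or below maxDiscountPct falls into this band.
interface AutonomyBand {
  level: AutonomyLevel;
  maxDiscountPct: number; // inclusive upper bound
  action: Action;
}

interface Policy {
  name: string;
  version: string;
  bands: AutonomyBand[];
}

interface Evaluation {
  action: Action;
  autonomyLevel: AutonomyLevel | null;
  reason: string;
}

function evaluate(policy: Policy, discountPct: number): Evaluation {
  if (!Number.isFinite(discountPct) || discountPct < 0) {
    return { action: "deny", autonomyLevel: null, reason: "invalid discount" };
  }
  // Check the tightest band first so the first match wins.
  const bands = [...policy.bands].sort((a, b) => a.maxDiscountPct - b.maxDiscountPct);
  for (const band of bands) {
    if (discountPct <= band.maxDiscountPct) {
      return {
        action: band.action,
        autonomyLevel: band.level,
        reason: `within ${band.level} band (<= ${band.maxDiscountPct}%)`,
      };
    }
  }
  return { action: "deny", autonomyLevel: null, reason: "exceeds all bands" };
}

// Illustrative thresholds only.
const financeV1: Policy = {
  name: "finance",
  version: "1.0.0",
  bands: [
    { level: "AL3", maxDiscountPct: 10, action: "approve" },
    { level: "AL1", maxDiscountPct: 25, action: "escalate" },
  ],
};
```

Boundary tests would then probe values such as 10, 10.01, 25, and 25.01 against financeV1.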

Day 4 — Events API & Context Assembler (Skeleton)

  • Implement POST /events to insert immutable events with optional correlationId.
  • Create core/memory/assembler.ts interface returning: latest snapshots + k‑NN similar cases.
  • Stub memory/similar.ts to return empty for now; wire vector column in decisions.
  • Exit check: cURL insert for DiscountRequested returns row ID; list events by type.

Day 5 — Decisions API (Deterministic)

  • Create POST /decisions/finance handler: load policy → assemble context (stub) → evaluate deterministically → store decision row.
  • Implement hash chain: fetch previous decision hash, compute current.
  • Record latency_ms (monotonic timer) and autonomy_level.
  • Exit check: Round‑trip decision stored with policy_version and hash fields.
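
The hash chain step can be sketched as follows. This is a minimal illustration assuming SHA-256 over the previous hash plus a canonical JSON encoding of the decision; the record fields, the all-zero genesis hash, and the helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical decision shape; the real columns come from your schema.
interface DecisionRecord {
  policyVersion: string;
  action: string;
  payload: Record<string, unknown>;
}

// Assumed starting value for the first decision in the chain.
const GENESIS_HASH = "0".repeat(64);

// Canonical JSON: sort object keys recursively so equal content always hashes equally.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
}

function computeHash(prevHash: string, record: DecisionRecord): string {
  return createHash("sha256")
    .update(prevHash)
    .update("\n")
    .update(canonicalize(record))
    .digest("hex");
}

interface ChainRow {
  record: DecisionRecord;
  prevHash: string;
  hash: string;
}

// Walk the chain in insertion order; any edited row or broken link fails verification.
function verifyChain(rows: ChainRow[]): boolean {
  let expectedPrev = GENESIS_HASH;
  for (const row of rows) {
    if (row.prevHash !== expectedPrev) return false;
    if (computeHash(row.prevHash, row.record) !== row.hash) return false;
    expectedPrev = row.hash;
  }
  return true;
}
```

Because each hash covers the previous one, changing any stored decision invalidates every later row, which is what makes the log tamper-evident rather than tamper-proof.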

Day 6 — Outcomes API + Rewards Mapping

  • Implement POST /outcomes/:decisionId to upsert outcome metrics.
  • Create learning/reward.ts mapping outcomes → success/failure for bandits.
  • Add smoke cURLs for events→decision→outcome trip.
  • Exit check: Posting an outcome updates the decision's learning counters in memory.

Day 7 — Embeddings, Indexer & Similar Cases

  • Implement memory/embeddings.ts wrapper (Vercel AI SDK) to create vectors for decisions + snapshots.
  • Backfill context_vec for last N decisions (script).
  • Implement memory/similar.ts: simple cosine similarity query (LIMIT 10).
  • Exit check: Decisions API attaches top‑k similar case IDs in response (for debug only).
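
As a reference for the similarity query, the sketch below pairs a parameterized SQL string using pgvector's cosine distance operator (<=>) with a plain TypeScript version of the same distance, which is handy for unit tests. The id column and the helper names are assumptions; context_vec and decisions come from the plan above.

```typescript
// pgvector's <=> operator returns cosine distance (1 - cosine similarity),
// so ascending order puts the most similar rows first.
// $1 is a vector literal such as '[0.1,0.2,...]'; $2 is the row limit (10 in this plan).
const SIMILAR_DECISIONS_SQL = `
  SELECT id, context_vec <=> $1::vector AS distance
  FROM decisions
  WHERE context_vec IS NOT NULL
  ORDER BY context_vec <=> $1::vector
  LIMIT $2
`;

// Serialize a number[] into the text form pgvector accepts, e.g. "[1,2,3]".
function toVectorLiteral(vec: number[]): string {
  if (vec.length === 0 || vec.some((x) => !Number.isFinite(x))) {
    throw new RangeError("vector must be non-empty and finite");
  }
  return `[${vec.join(",")}]`;
}

// In-process equivalent of <=>, useful for checking query results in tests.
// Returns NaN if either vector is all zeros.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must share a non-zero dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The query would be executed with parameters [toVectorLiteral(queryEmbedding), 10] through whichever Postgres client the project uses.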

Day 8 — Knowledge Snapshots & Features

  • Create jobs/aggregator.ts to compute daily entity_features (SQL first).
  • Create jobs/distill.ts to produce knowledge_snapshots + embeddings.
  • Add /jobs/*/run routes guarded by x-internal-token.
  • Exit check: Manual POSTs run jobs and populate both tables.
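
The x-internal-token guard can be sketched as a small comparison helper. This is an illustration under assumptions: the function name is hypothetical, and hashing both values before comparing is one way to satisfy the equal-length requirement of Node's timingSafeEqual.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare the token from the x-internal-token header against the configured secret.
// Both sides are hashed first so the buffers always have equal length, which
// timingSafeEqual requires, and so the comparison time does not depend on the input.
function isValidJobToken(provided: string | undefined, expected: string): boolean {
  if (!provided || !expected) return false;
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

A route guard would respond with 401 whenever this returns false, before any job logic runs.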

Day 9 — n8n Flows (Ingest + Daily Jobs)

  • Build Ingestor flow: HTTP trigger → normalize → POST /events.
  • Schedule 06:00 Aggregator, 06:10 Distill calls; use job token secret.
  • Export flows into flows/ingest.json.
  • Exit check: n8n executions show 2 successful daily runs.

Day 10 — Variant Router & Bandit Stats

  • Implement learning/bandit.ts (variant selection strategy + update rules).
  • Extend Decisions API to attach variant (e.g., finance-constitution@1.1.B).
  • Create policy_variants_stats updater on each outcome.
  • Exit check: Two variants receive traffic; stats (alpha,beta) mutate as outcomes arrive.
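
One possible selection strategy is Thompson Sampling over Beta(alpha, beta) posteriors, sketched below. The stats shape and function names are assumptions, as is the Beta(1, 1) starting prior; the Beta draw uses two gamma draws from the Marsaglia-Tsang method, which is valid here because alpha and beta never drop below 1.

```typescript
type Rng = () => number; // uniform in [0, 1)

// Hypothetical in-memory view of a policy_variants_stats row.
interface VariantStats {
  variant: string;
  alpha: number; // 1 + successes
  beta: number; // 1 + failures
}

// Standard normal via Box-Muller.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng(); // shift to (0, 1] to avoid log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Marsaglia-Tsang gamma sampler (scale 1), valid for shape >= 1.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) throw new RangeError("shape must be >= 1");
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = (1 + c * x) ** 3;
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(alpha, beta) as a ratio of two gamma draws.
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// Draw once from each variant's posterior and route to the highest draw.
function selectVariant(stats: VariantStats[], rng: Rng = Math.random): VariantStats {
  let best: VariantStats | null = null;
  let bestDraw = -Infinity;
  for (const s of stats) {
    const draw = sampleBeta(s.alpha, s.beta, rng);
    if (draw > bestDraw) {
      best = s;
      bestDraw = draw;
    }
  }
  if (best === null) throw new Error("no variants to choose from");
  return best;
}

// Update rule: a success increments alpha, a failure increments beta.
function recordOutcome(s: VariantStats, success: boolean): VariantStats {
  return success ? { ...s, alpha: s.alpha + 1 } : { ...s, beta: s.beta + 1 };
}
```

Variants with few observations have wide posteriors, so they still win some draws and keep receiving traffic until the evidence separates them.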

Day 11 — Slack Adapter & Promotion Proposals

  • Implement adapters/slack.ts with notify(channel, text) and health check.
  • Add jobs/scorecards.ts to compute variant deltas and propose promotions.
  • Post to Slack when candidate > baseline with minimum samples + margin.
  • Exit check: A Slack message renders a human‑readable promotion summary.
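
The promotion rule (candidate beats baseline with minimum samples and a margin) might be expressed as below. This is a sketch under assumptions: it compares posterior means alpha / (alpha + beta), counts observations as alpha + beta - 2 on the assumption of a Beta(1, 1) prior, and uses hypothetical names and thresholds.

```typescript
interface VariantStats {
  variant: string;
  alpha: number;
  beta: number;
}

// Hypothetical rule parameters; tune them to your traffic.
interface PromotionRule {
  minSamples: number; // observations required on both arms
  minMargin: number; // required lead in success rate, e.g. 0.05 for 5 points
}

interface Proposal {
  propose: boolean;
  delta: number;
  summary: string;
}

const posteriorMean = (s: VariantStats): number => s.alpha / (s.alpha + s.beta);

// Assumes both counters started at 1 (a Beta(1, 1) prior).
const observations = (s: VariantStats): number => s.alpha + s.beta - 2;

function proposePromotion(
  baseline: VariantStats,
  candidate: VariantStats,
  rule: PromotionRule,
): Proposal {
  const delta = posteriorMean(candidate) - posteriorMean(baseline);
  if (observations(baseline) < rule.minSamples || observations(candidate) < rule.minSamples) {
    return { propose: false, delta, summary: "not enough samples yet" };
  }
  if (delta < rule.minMargin) {
    return { propose: false, delta, summary: "candidate lead is below the margin" };
  }
  const pts = (delta * 100).toFixed(1);
  return {
    propose: true,
    delta,
    summary:
      `Proposal: promote ${candidate.variant} over ${baseline.variant} ` +
      `(+${pts} pts success rate, n=${observations(candidate)})`,
  };
}
```

The summary string is what the Slack adapter would post; a human still makes the promotion decision.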

Day 12 — Guardrails, Idempotency & Kill Switches

  • Add Idempotency-Key support to mutating routes.
  • Introduce node_flags (feature gate per node/action) and a global kill switch.
  • Enforce zod validation on inputs/outputs for adapters and routes.
  • Exit check: Simulate a double‑post and a kill‑switch flip; observe safe behavior.
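
A double-post check could be built around a helper like the one below. It is a minimal in-memory sketch with hypothetical names; a deployed version would more likely persist keys in Postgres under a unique constraint so that replays survive restarts and multiple instances.

```typescript
// Run `handler` at most once per Idempotency-Key. Repeat calls with the same key
// receive the stored promise, so a retried POST gets the original result.
function runOnce<T>(
  store: Map<string, Promise<T>>,
  key: string,
  handler: () => Promise<T>,
): Promise<T> {
  const existing = store.get(key);
  if (existing) return existing;

  const pending = handler();
  store.set(key, pending);

  // Forget failed attempts so the client can retry with the same key.
  pending.catch(() => {
    if (store.get(key) === pending) store.delete(key);
  });

  return pending;
}
```

Storing the promise rather than the resolved value also covers two identical requests that arrive while the first is still in flight.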

Day 13 — Metrics, Dashboards & Tracing Hooks

  • Flesh out /metrics: p50/p95 latency, error counts by route, token usage, AL distribution.
  • Add basic OpenTelemetry hooks (optional) and structured error fields.
  • Create a simple dashboard (even a rendered JSON file) to visualize trends.
  • Exit check: Metrics reflect traffic; latency budget respected; token spikes visible.
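
For the latency figures, a nearest-rank percentile over recorded latency_ms samples is one simple option, sketched below. The function names and the sample values are illustrative; the budget values are the p50 and p95 targets stated in this plan.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. p must be in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Latency budget from this plan: p50 < 300 ms, p95 < 800 ms.
function withinLatencyBudget(latenciesMs: number[]): boolean {
  return percentile(latenciesMs, 50) < 300 && percentile(latenciesMs, 95) < 800;
}

// Illustrative samples only.
const latencySamples = [120, 95, 310, 150, 180, 220, 140, 760, 130, 160];
const latencySummary = {
  p50: percentile(latencySamples, 50),
  p95: percentile(latencySamples, 95),
  ok: withinLatencyBudget(latencySamples),
};
```

Sorting every sample on each /metrics call is fine at MVP traffic; a histogram or sketch structure would be the natural replacement if volume grows.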

Day 14 — Railway Deployment & Production Checklist

  • Deploy API to Railway; attach Postgres plugin; set all ENV vars.
  • Provision n8n (Railway or n8n Cloud) and point cron webhooks to Railway URLs.
  • Run smoke tests against public URL; confirm Slack messages deliver.
  • Enable backups + retention windows; verify least‑privilege tokens.
  • Exit check: ✅ “MVP Ready” — demo events→decisions→outcomes; daily jobs running; Slack proposals live.

Mastery Accelerators (Do Once)

  • Replay tool: CLI to re‑evaluate last N decisions with latest policy (diff reasons/AL).
  • Shadow mode: run a candidate policy in parallel (no‑action) and compare outcomes.
  • Drift alarms: alert when decision distributions or AL levels shift unexpectedly.
  • Redaction layer: centralize PII scrubbing before logs/snapshots.

Test Scripts (No Implementation Code)

Use cURL (fill placeholders yourself):

  • Event → POST /events
  • Decision → POST /decisions/finance
  • Outcome → POST /outcomes/:decisionId

Definition of Done (MVP)

  • Decisions are reproducible (policy version + hash chain).
  • Memory is layered (events, decisions, features, snapshots).
  • Learning loop updates bandit stats; Slack proposal notifies humans.
  • SLOs respected (p50 < 300ms, p95 < 800ms excluding adapters).
  • Security: job token, kill switch, least privilege, backups configured.

Stretch After Day 14

  • ANN indexes for vectors; cached snapshot windows.
  • Adapter gallery (Stripe invoices, GitHub issues triage).
  • HR node (POST /decisions/hr) with distinct policies & outcomes.
  • Policy authoring UI (admin‑only) with YAML→JSON validation.