Goal: Stand up a production‑minded MVP of a memory‑anchored, policy‑driven AI governance backend in 14 days, then harden it with guardrails, observability, and learning loops. No code is written here — you’ll fill the functions and schemas yourself. Hidden clues are embedded as HTML comments (`<!-- clue: ... -->`).
Core sources: Architecture, memory layers, APIs, and jobs are defined in the repo README; treat this plan as the executable checklist for shipping the spec.
- Deterministic first, LLM‑assist second.
- Everything is an event → decisions → outcomes → learning loop.
- Versioned policies with Autonomy Levels (AL0–AL3) and hash‑chained decisions.
- Daily snapshots and bandit‑driven variants (promotion via Slack proposal).
- Small, testable surfaces — every day ends with smoke checks.
- Runtime: Bun 1.x, Express + TypeScript, Drizzle ORM, PostgreSQL 15+ with pgvector.
- AI: Vercel AI SDK with OpenAI provider.
- Jobs: n8n (HTTP triggers + scheduled).
- Deploy: Railway.
- Validation & Logs: zod, pino.
- Drizzle ORM + migrations: schema → migration → rollback; seeding patterns.
- pgvector basics: embedding dimensions, `vector(1536)`, cosine distance; a simple ANN index.
- Policy design: autonomy bands; escalation; versioning semantics.
- Hash chains: `prev_hash → hash` to make decisions tamper‑evident.
- Bandits (Thompson Sampling): mapping successes from `outcomes` to `(alpha, beta)`.
- n8n flows: webhook triggers, scheduled runs, secrets handling.
- Observability: p50/p95 latency, token usage, AL distribution; JSON logging; correlation IDs.
- Security: job token headers, least‑privilege service tokens, PII redaction in logs.
- Day 3: Local server responding to `/health`; Postgres + pgvector running; migrations applied.
- Day 6: `/events`, `/decisions/finance`, `/outcomes/:decisionId` round‑trip works locally.
- Day 9: n8n ingest + daily jobs calling internal job endpoints.
- Day 12: Policy variants routed; outcomes update variant stats; Slack proposal message.
- Day 14: Deployed on Railway; dashboards show metrics; kill‑switch + idempotency in place.
- Initialize the Bun + TS project structure exactly like the `src/` layout (config, routes, core, jobs).
- Create `.env` from the example file with placeholders only (no secrets in repo).
- Write README excerpts to a DEVLOG entry summarizing subsystem responsibilities.
- Add pino logger with correlation ID middleware (generate UUID per request).
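A minimal sketch of that middleware, assuming Express-style handlers; `Req` and `Res` are trimmed stand-ins for the real Express types, and the `x-correlation-id` header name is an assumption:

```typescript
import { randomUUID } from "node:crypto";

// Trimmed stand-ins for express.Request / express.Response.
type Req = { headers: Record<string, string | undefined>; correlationId?: string };
type Res = { setHeader(name: string, value: string): void };

function correlationMiddleware(req: Req, res: Res, next: () => void): void {
  // Reuse an incoming ID if the caller supplied one, otherwise mint a UUID.
  const id = req.headers["x-correlation-id"] ?? randomUUID();
  req.correlationId = id;                 // downstream handlers and pino child loggers read this
  res.setHeader("x-correlation-id", id);  // echo back so clients can correlate logs
  next();
}
```

A pino child logger created per request with `logger.child({ correlationId: req.correlationId })` then stamps every log line automatically.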
- Add `/health` and `/metrics` route shells (return hardcoded JSON for now).
- Set up `drizzle.config.ts`; connect to the local Postgres URL.
- Exit check: `bun run --hot src/server.ts` → `GET /health` returns `{ ok: true }`.
- Provision local Postgres 15; enable the extension: `CREATE EXTENSION vector;`.
- Define Drizzle models for: `events`, `decisions`, `outcomes`, `policy_versions`, `entity_features`, `knowledge_snapshots`, `policy_variants_stats`.
- Add the indexes listed in the spec.
- Generate + push migrations; verify tables exist.
- Exit check: Run a migration rollback → re‑apply. Confirm no drift.
- Create `core/policy/types.ts` to model autonomy bands and escalation.
- Implement `policy/loader.ts` to fetch the latest policy by name or by `name@version`.
- Stub `policy/evaluator.ts` with deterministic guardrail checks only (no LLM).
- Seed one policy in `policy_versions` (finance v1.0.0) via migration or seed script.
- Exit check: Unit tests for evaluator boundaries (approve/deny at the edges).
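To make the exit check concrete, here is one hedged sketch of what the deterministic band logic could look like. The thresholds, field names, and band-to-AL mapping are illustrative assumptions, not the repo's actual schema:

```typescript
// Illustrative guardrail check for a finance policy: pure and deterministic,
// so boundary behavior is unit-testable. All thresholds are made-up placeholders.
type Verdict = { decision: "approve" | "deny" | "escalate"; autonomyLevel: 0 | 1 | 2 | 3 };

interface FinancePolicy {
  maxAutoApproveAmount: number; // AL3: act without a human (assumed field name)
  maxAssistedAmount: number;    // AL1: act only with review; above this, deny at AL0
}

function evaluate(policy: FinancePolicy, amount: number): Verdict {
  if (amount < 0) return { decision: "deny", autonomyLevel: 0 }; // malformed input
  if (amount <= policy.maxAutoApproveAmount) return { decision: "approve", autonomyLevel: 3 };
  if (amount <= policy.maxAssistedAmount) return { decision: "escalate", autonomyLevel: 1 };
  return { decision: "deny", autonomyLevel: 0 };
}
```

Boundary unit tests should pin both sides of each edge, e.g. an amount exactly equal to `maxAutoApproveAmount` versus one cent above it.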
- Implement `POST /events` to insert immutable events with an optional `correlationId`.
- Create the `core/memory/assembler.ts` interface returning: latest snapshots + k‑NN similar cases.
- Stub `memory/similar.ts` to return empty for now; wire the vector column in `decisions`.
- Exit check: A cURL insert for `DiscountRequested` returns a row ID; list events by type.
- Create the `POST /decisions/finance` handler: load policy → assemble context (stub) → evaluate deterministically → store the decision row.
- Implement the hash chain: fetch the previous decision's hash, compute the current one.
- Record `latency_ms` (monotonic timer) and `autonomy_level`.
- Exit check: Round‑trip decision stored with `policy_version` and `hash` fields.
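The hash-chain step might be sketched like this, assuming SHA‑256 over the previous hash plus a JSON payload; the genesis sentinel and payload shape are assumptions:

```typescript
import { createHash } from "node:crypto";

// Tamper-evident decision chain: each row's hash covers its own payload plus
// the previous row's hash, so editing any historical decision breaks the chain.
const GENESIS = "0".repeat(64); // assumed sentinel for the first decision

function decisionHash(prevHash: string, payload: object): string {
  return createHash("sha256")
    .update(prevHash)
    .update(JSON.stringify(payload)) // assumes stable key order at write time
    .digest("hex");
}

function verifyChain(rows: { hash: string; payload: object }[]): boolean {
  let prev = GENESIS;
  for (const row of rows) {
    if (decisionHash(prev, row.payload) !== row.hash) return false;
    prev = row.hash;
  }
  return true;
}
```

A periodic job (or the replay tool) can run `verifyChain` over the `decisions` table to prove nothing was edited in place.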
- Implement `POST /outcomes/:decisionId` to upsert outcome metrics.
- Create `learning/reward.ts` mapping outcomes → success/failure for bandits.
- Add smoke cURLs for the events → decision → outcome trip.
- Exit check: Posting an outcome updates the decision's learning counters in memory.
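One plausible shape for `learning/reward.ts`; the outcome metric names (`marginDelta`, `chargeback`) are invented for illustration:

```typescript
// Collapse an outcome row into a binary reward for the bandit.
// Field names here are assumptions, not the spec's outcome schema.
interface Outcome { marginDelta: number; chargeback: boolean }

function toReward(o: Outcome): 0 | 1 {
  // Success = the decision made money and caused no chargeback.
  return o.marginDelta > 0 && !o.chargeback ? 1 : 0;
}

// Fold a reward into the Beta posterior counters a variant accumulates.
function updateStats(stats: { alpha: number; beta: number }, reward: 0 | 1) {
  return reward === 1
    ? { alpha: stats.alpha + 1, beta: stats.beta }
    : { alpha: stats.alpha, beta: stats.beta + 1 };
}
```

Keeping the reward mapping pure makes the "counters mutate" exit check a two-line unit test.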
- Implement the `memory/embeddings.ts` wrapper (Vercel AI SDK) to create vectors for decisions + snapshots.
- Backfill `context_vec` for the last N decisions (script).
- Implement `memory/similar.ts`: a simple cosine‑similarity query (LIMIT 10).
- Exit check: The Decisions API attaches top‑k similar case IDs in the response (for debug only).
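For debugging the similarity query, a pure-TypeScript reference for the cosine distance that pgvector's `<=>` operator computes can help sanity-check results:

```typescript
// Reference implementation of cosine distance as pgvector's `<=>` operator
// defines it: distance = 1 − (a · b) / (‖a‖ ‖b‖). Range is [0, 2].
// SQL counterpart (shape only):
//   SELECT id FROM decisions ORDER BY context_vec <=> $1 LIMIT 10;
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Comparing this against the database's ordering on a handful of vectors catches dimension mismatches and accidental normalization bugs early.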
- Create `jobs/aggregator.ts` to compute daily `entity_features` (SQL first).
- Create `jobs/distill.ts` to produce `knowledge_snapshots` + embeddings.
- Add `/jobs/*/run` routes guarded by `x-internal-token`.
- Exit check: Manual POSTs run the jobs and populate both tables.
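A hedged sketch of the `x-internal-token` guard, using a constant-time comparison so the check doesn't leak matching prefixes; the request/response shapes are trimmed Express stand-ins:

```typescript
import { timingSafeEqual } from "node:crypto";

// Middleware factory for /jobs/*/run: reject unless x-internal-token matches.
function jobTokenGuard(secret: string) {
  return (
    req: { headers: Record<string, string | undefined> },
    res: { status(code: number): { json(body: object): void } },
    next: () => void,
  ): void => {
    const token = req.headers["x-internal-token"] ?? "";
    const a = Buffer.from(token);
    const b = Buffer.from(secret);
    // timingSafeEqual requires equal lengths; unequal lengths already fail.
    if (a.length === b.length && timingSafeEqual(a, b)) return next();
    res.status(401).json({ error: "unauthorized" });
  };
}
```

The same factory can be reused for every internal job route, with the secret injected from the environment rather than hardcoded.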
- Build the Ingestor flow: HTTP trigger → normalize → POST `/events`.
- Schedule the 06:00 Aggregator and 06:10 Distill calls; use the job token secret.
- Export flows into `flows/ingest.json`.
- Exit check: n8n executions show 2 successful daily runs.
- Implement `learning/bandit.ts` (variant selection strategy + update rules).
- Extend the Decisions API to attach a variant (e.g., `finance-constitution@1.1.B`).
- Create a `policy_variants_stats` updater on each outcome.
- Exit check: Two variants receive traffic; `(alpha, beta)` stats mutate as outcomes arrive.
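One concrete selection strategy for `learning/bandit.ts` is Thompson Sampling: draw one sample from each variant's Beta posterior and route to the highest draw. This sketch samples Beta via the Marsaglia–Tsang gamma method and assumes counters start at `alpha = beta = 1`, keeping the sampler's shape parameter ≥ 1:

```typescript
// Gamma(shape, 1) sampler, Marsaglia–Tsang method, valid for shape >= 1.
function sampleGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    // Box–Muller standard normal draw (guard against log(0)).
    const n = Math.sqrt(-2 * Math.log(Math.random() || 1e-12)) *
              Math.cos(2 * Math.PI * Math.random());
    const v = Math.pow(1 + c * n, 3);
    if (v <= 0) continue;
    const u = Math.random() || 1e-12;
    if (Math.log(u) < 0.5 * n * n + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(alpha, beta) = Gamma(alpha) / (Gamma(alpha) + Gamma(beta)).
function sampleBeta(alpha: number, beta: number): number {
  const x = sampleGamma(alpha);
  const y = sampleGamma(beta);
  return x / (x + y);
}

// Thompson Sampling: route to the variant with the highest posterior draw.
function pickVariant(variants: { name: string; alpha: number; beta: number }[]): string {
  let best = variants[0].name, bestDraw = -1;
  for (const v of variants) {
    const draw = sampleBeta(v.alpha, v.beta);
    if (draw > bestDraw) { bestDraw = draw; best = v.name; }
  }
  return best;
}
```

Because selection is probabilistic, the weaker variant still gets occasional traffic, which is exactly the exploration the exit check wants to see.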
- Implement `adapters/slack.ts` with `notify(channel, text)` and a health check.
- Add `jobs/scorecards.ts` to compute variant deltas and propose promotions.
- Post to Slack when a candidate beats the baseline with minimum samples + margin.
- Exit check: A Slack message renders a human‑readable promotion summary.
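A simple promotion gate for `jobs/scorecards.ts` could compare posterior means under a minimum-sample and margin rule; the default thresholds here are tuning assumptions, not spec values:

```typescript
interface VariantStats { alpha: number; beta: number }

// Posterior mean of Beta(alpha, beta).
function successRate(s: VariantStats): number {
  return s.alpha / (s.alpha + s.beta);
}

// Propose only when the candidate has enough samples AND clears the baseline
// by a margin; this keeps Slack quiet during early noisy traffic.
function shouldProposePromotion(
  candidate: VariantStats,
  baseline: VariantStats,
  minSamples = 200,
  margin = 0.02,
): boolean {
  const n = candidate.alpha + candidate.beta - 2; // subtract the 1/1 prior
  if (n < minSamples) return false;
  return successRate(candidate) > successRate(baseline) + margin;
}
```

The Slack message can then include both rates, the sample counts, and the margin, so a human can veto the promotion with context.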
- Add `Idempotency-Key` support to mutating routes.
- Introduce `node_flags` (a feature gate per node/action) and a global kill switch.
- Enforce zod validation on inputs/outputs for adapters and routes.
- Exit check: Simulate a double‑post and a kill‑switch flip; observe safe behavior.
- Flesh out `/metrics`: p50/p95 latency, error counts by route, token usage, AL distribution.
- Add basic OpenTelemetry hooks (optional) and structured error fields.
- Create a simple dashboard (even JSON file rendered) to visualize trends.
- Exit check: Metrics reflect traffic; latency budget respected; token spikes visible.
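p50/p95 can be computed with a nearest-rank percentile over buffered latency samples, no histogram library required; this is a sketch assuming samples fit in memory (e.g. a per-route ring buffer):

```typescript
// Nearest-rank percentile: sort, take the sample at rank ceil(p/100 * n).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

`/metrics` can then report `{ p50: percentile(buf, 50), p95: percentile(buf, 95) }` per route from the `latency_ms` values recorded on Day 5.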
- Deploy API to Railway; attach Postgres plugin; set all ENV vars.
- Provision n8n (Railway or n8n Cloud) and point cron webhooks to Railway URLs.
- Run smoke tests against public URL; confirm Slack messages deliver.
- Enable backups + retention windows; verify least‑privilege tokens.
- Exit check: ✅ “MVP Ready” — demo events→decisions→outcomes; daily jobs running; Slack proposals live.
- Replay tool: CLI to re‑evaluate last N decisions with latest policy (diff reasons/AL).
- Shadow mode: run a candidate policy in parallel (no‑action) and compare outcomes.
- Drift alarms: alert when decision distributions or AL levels shift unexpectedly.
- Redaction layer: centralize PII scrubbing before logs/snapshots.
Use cURL (fill placeholders yourself):
- Event → `POST /events`
- Decision → `POST /decisions/finance`
- Outcome → `POST /outcomes/:decisionId`
- Decisions are reproducible (policy version + hash chain).
- Memory is layered (events, decisions, features, snapshots).
- The learning loop updates bandit stats; a Slack proposal notifies humans.
- SLOs respected (p50 < 300ms, p95 < 800ms excluding adapters).
- Security: job token, kill switch, least privilege, backups configured.
- ANN indexes for vectors; cached snapshot windows.
- Adapter gallery (Stripe invoices, GitHub issues triage).
- HR node (`POST /decisions/hr`) with distinct policies & outcomes.
- Policy authoring UI (admin‑only) with YAML→JSON validation.