PRODUCT ROADMAP — IMPLEMENTATION ORDER #56

@Tanush1912

Product Roadmap — Implementation Order

This issue defines the execution order for all open work across Project Ouroboros. It reflects the current state of the codebase after the #49/#50 integration gap fixes, and prioritizes foundation integrity before new features.


Guiding Principle

Don't build on top of broken state tracking and false-positive tests. Make the system honest about its own state first, then make it runnable, then build new capabilities.

The system currently has a deceptive surface: tests pass, lint passes, workflows compile — but some of those passes are vacuous. Tautological tests, unimplemented lint rules, and missing status updates mean the safety net has holes. Fixing these holes first ensures that every future change gets real feedback, not green checkmarks that mask silent failures.


Dependency Map

#46 ROADMAP (GCP, secrets, local config)
 └── everything depends on this for real agent runs

#51 System integrity gaps (parent)
 ├── #52 Workflow state consistency
 ├── #53 Lint rules + test coverage
 ├── #54 Docs & architecture alignment
 └── #55 Test quality

#47 Docker observability in CI
#48 Agent-written code must produce queryable logs

#39-#44 Feature issues (test quality framework)
 ├── #39 AST-based test quality gate
 ├── #40 Behavioral contracts
 ├── #41 Adversarial test writer agent
 ├── #42 Mutation sampling
 ├── #43 Human test anchors
 └── #44 Reviewer test quality assessment

Phase 1: Fix the Foundation (~3 PRs)

Goal: Make every test, lint check, and workflow state transition trustworthy.

| Order | Issue | Why now |
|---|---|---|
| 1 | #52: Workflow state consistency | The core workflow loop has silent bugs (phantom "merged" status, 4 nodes missing status updates on success, empty dict returns losing observability). Every agent run is affected. If you build on top of broken state tracking, new features inherit the bugs. |
| 2 | #53: Lint rules + test coverage | WF-004 and WF-006 are supposed to protect you from guard bypass and tool budget errors. Without them, there's no safety net as you modify workflows. GP-011–014 tests ensure your lint checks don't silently regress. The PerfComparison/PerfComparisonResult catalog mismatch also lives here. |
| 3 | #55: Test quality | Tautological tests give false confidence. 4 guard boundary tests verify arithmetic (0 + 50 > 50 == False) instead of calling actual guard functions. Fix these before you start relying on "all tests pass" as a signal that changes are safe. Quick fix: 4 tests rewritten + 2 lint runner integration tests added. |
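
The "empty dict returns losing observability" problem in #52 can be pictured with a minimal sketch. The runner, the node names, the `status` key, and the dict-merge state model below are all assumptions made for illustration; they are not the project's actual workflow API.

```python
from typing import Any, Callable

State = dict[str, Any]


def run_node(state: State, node: Callable[[State], State]) -> State:
    """Hypothetical runner: merge the node's returned update into the state."""
    update = node(state)
    return {**state, **update}


def silent_node(state: State) -> State:
    # Succeeds but returns nothing, so `status` keeps its stale value.
    return {}


def reporting_node(state: State) -> State:
    # Succeeds and records it, so later nodes and logs can see what happened.
    return {"status": "tests_passed", "last_node": "reporting_node"}


before = {"status": "planned"}
after_silent = run_node(before, silent_node)
after_reporting = run_node(before, reporting_node)

print(after_silent["status"])     # still the old status
print(after_reporting["status"])  # updated by the node
```

Under this model a successful node that returns `{}` is indistinguishable from one that never ran, which is why a missing status update is a state-tracking bug rather than a cosmetic one.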

After Phase 1: The test suite, lint rules, and workflow state transitions are all trustworthy. "All tests pass" actually means something.
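
To make the difference between a tautological test and a real guard test concrete, here is a small sketch. `tool_budget_exceeded` is a hypothetical guard written for this example, and its strict `>` comparison is an assumption inferred from the arithmetic quoted above; only the limit of 50 comes from this roadmap.

```python
MAX_TOOL_CALLS_PER_NODE = 50  # value quoted in this roadmap


def tool_budget_exceeded(calls_so_far: int, limit: int = MAX_TOOL_CALLS_PER_NODE) -> bool:
    """Hypothetical guard: True once a node has used more than its tool budget."""
    return calls_so_far > limit


def test_tautological_boundary():
    # Only checks Python arithmetic. It passes even if the guard is deleted or wrong.
    assert (0 + 50 > 50) is False


def test_guard_boundary():
    # Calls the guard itself, so an off-by-one change in it fails the test.
    assert tool_budget_exceeded(50) is False
    assert tool_budget_exceeded(51) is True


test_tautological_boundary()
test_guard_boundary()
```

If the guard were changed to `>=`, the first test would still pass and the second would fail, which is the signal Phase 1 is meant to restore.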


Phase 2: Make It Runnable (~4 PRs)

Goal: Get agents executing end-to-end — locally and in CI — with full observability.

| Order | Issue | Why now |
|---|---|---|
| 4 | #46: Roadmap to fully functional system | GCP project setup, Vertex AI API, service account, GitHub Actions secrets, local .env config. Nothing runs for real until this is done. It's mostly configuration, not code. |
| 5 | #54: Docs & architecture alignment | Align ARCHITECTURE.md with what arch_lint.py actually enforces, and make guard limits configurable via env vars. Do this alongside #46 since you'll be touching config anyway. Documents the typed state model as a future design goal. |
| 6 | #47: Docker observability in CI | Once secrets are configured (#46), agents can run in CI. But without the observability stack running, query_logs/query_metrics tools waste budget on connection refused errors. Each failed tool call burns against MAX_TOOL_CALLS_PER_NODE=50. This makes CI runs functional. |
| 7 | #48: Agent-written code must produce logs | Completes the observability loop. After #47 the pipeline exists in CI; after #48 agents actually produce structured logs into it. This is the last piece before agents can self-debug by querying their own application's runtime behavior. |
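
One way the "guard limits configurable via env vars" item in #54 might look is sketched below. It assumes the environment variable shares the name `MAX_TOOL_CALLS_PER_NODE` and that 50 remains the default; the function name and validation rules are hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_TOOL_CALLS_PER_NODE = 50  # default taken from the value quoted above


def load_max_tool_calls(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the tool-call limit from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("MAX_TOOL_CALLS_PER_NODE")  # assumed variable name
    if raw is None:
        return DEFAULT_MAX_TOOL_CALLS_PER_NODE
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"MAX_TOOL_CALLS_PER_NODE must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"MAX_TOOL_CALLS_PER_NODE must be positive, got {value}")
    return value


print(load_max_tool_calls({}))
print(load_max_tool_calls({"MAX_TOOL_CALLS_PER_NODE": "10"}))
```

Failing loudly on a malformed value, instead of silently falling back to the default, keeps the configuration honest in the same way the rest of this roadmap asks of the tests.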

After Phase 2: Agents can run end-to-end in CI and locally, with full observability. The query_logs → VictoriaLogs → Vector → app pipeline is functional. The system is production-ready for autonomous operation.
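
As a rough picture of what "structured logs" in #48 could mean for agent-written applications, the sketch below emits one JSON object per log line using only the standard library. The field names and the `extra={"fields": ...}` convention are assumptions for this example; the schema that Vector and VictoriaLogs are actually configured to expect is not specified in this issue.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra key/value pairs passed as extra={"fields": {...}} (assumed convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str, stream) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger("app", buffer)
log.info("order created", extra={"fields": {"order_id": 42}})
line = json.loads(buffer.getvalue())
print(line)
```

Logs in this shape can be filtered by field, which is what makes them queryable by an agent instead of merely readable by a human.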


Phase 3: Build New Capabilities (~6 PRs)

Goal: Advanced test quality framework — agents that can verify, challenge, and improve their own test suites.

| Order | Issue | Why now |
|---|---|---|
| 8 | #39: AST-based test quality gate | Foundation for all test quality work. Parses test files structurally to verify they contain real assertions, not just smoke tests. |
| 9 | #44: Reviewer test quality assessment | Gives the reviewer agent the ability to evaluate test quality during PR review, using the AST gate from #39. |
| 10 | #43: Human test anchors | Protected invariant test files that agents cannot modify. Establishes a baseline of human-verified test coverage that agent-generated tests build on top of. |
| 11 | #42: Mutation sampling | Verifies test effectiveness by introducing small mutations to source code and checking that tests catch them. Proves tests aren't tautological at the source level. |
| 12 | #41: Adversarial test writer agent | An agent that deliberately writes tricky edge cases and failure scenarios. Uses mutation sampling (#42) to verify its tests are meaningful. |
| 13 | #40: Behavioral contracts | Planner-emitted, deterministically verified contracts. The capstone: the planner specifies expected behaviors as formal contracts, and the system verifies them automatically. |
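
The idea behind the AST gate in #39 can be sketched with Python's standard `ast` module. This is a deliberately minimal illustration, not the planned design: it only counts bare `assert` statements, so helpers such as `pytest.raises` or `unittest` assertion methods would need extra handling. The function name and the sample source are hypothetical.

```python
import ast


def assertion_counts(source: str) -> dict[str, int]:
    """Map each test_* function in `source` to the number of assert statements it contains."""
    tree = ast.parse(source)
    counts: dict[str, int] = {}
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name.startswith("test_"):
            counts[node.name] = sum(isinstance(child, ast.Assert) for child in ast.walk(node))
    return counts


SAMPLE = '''
def test_smoke():
    run()

def test_real():
    assert add(1, 2) == 3
'''

counts = assertion_counts(SAMPLE)
weak = [name for name, n in counts.items() if n == 0]
print(counts)
print(weak)
```

Because the source is parsed and never executed, a gate like this can run on agent-written tests before they are trusted to run at all.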

Dependency chain: #39 → #44 → #43 → #42 → #41 → #40

Each issue builds on the previous one. The AST gate (#39) is the foundation; behavioral contracts (#40) are the capstone.
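
The Phase 3 ordering can also be expressed as data and checked mechanically. The sketch below encodes the chain from this roadmap and derives the execution order with the standard library's `graphlib`; the dictionary layout is an illustrative choice, not something the project defines.

```python
from graphlib import TopologicalSorter

# Each issue maps to the set of issues it depends on (Phase 3 chain from this roadmap).
depends_on = {
    44: {39},
    43: {44},
    42: {43},
    41: {42},
    40: {41},
}

order = list(TopologicalSorter(depends_on).static_order())
print(" -> ".join(f"#{issue}" for issue in order))
```

If a later edit to the roadmap introduced a circular dependency, `static_order()` would raise `graphlib.CycleError` instead of returning an order.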

After Phase 3: The system has a complete test quality framework. Agents can write tests, verify their quality, challenge them with mutations, and express behavioral expectations as formal contracts.
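
Mutation sampling (#42) is easiest to see on a toy case. The sketch below applies a single mutation (replacing `>` with `>=`) to a hypothetical function and checks which of two hypothetical tests notices. It illustrates the general technique only; how #42 will pick mutations, sample them, and run the real test suite is not specified in this issue.

```python
import ast

SOURCE = "def over_limit(calls, limit):\n    return calls > limit\n"


class SwapGt(ast.NodeTransformer):
    """Mutation operator: replace every `>` comparison with `>=`."""

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [ast.GtE() if isinstance(op, ast.Gt) else op for op in node.ops]
        return node


def load(code):
    """Compile source text or an AST module and return its over_limit function."""
    namespace = {}
    exec(compile(code, "<mutant>", "exec"), namespace)
    return namespace["over_limit"]


def weak_test(fn) -> bool:
    # Never touches the boundary, so it cannot tell `>` from `>=`.
    return fn(100, 50) is True


def boundary_test(fn) -> bool:
    # Checks the boundary value, which is where the mutant differs.
    return fn(100, 50) is True and fn(50, 50) is False


original = load(SOURCE)
mutant = load(ast.fix_missing_locations(SwapGt().visit(ast.parse(SOURCE))))

weak_kills_mutant = not weak_test(mutant)
boundary_kills_mutant = not boundary_test(mutant)
print(weak_kills_mutant, boundary_kills_mutant)
```

A test that passes on both the original and the mutant has not demonstrated that it constrains the behavior, which is the sense in which surviving mutants point at weak tests.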


Execution Summary

Phase 1: Fix the foundation      #52 → #53 → #55           ~3 PRs
Phase 2: Make it runnable         #46 → #54 → #47 → #48    ~4 PRs
Phase 3: Build new capabilities   #39 → #44 → #43 → #42 → #41 → #40   ~6 PRs

Total: ~13 PRs across 14 issues: the 13 work items above plus the parent tracking issue #51 (some issues may be combined into a single PR where changes overlap).


How to Use This Roadmap

  1. Work top-to-bottom within each phase
  2. Don't start Phase 2 until Phase 1 is merged — the foundation must be solid
  3. Phase 2 items 4 and 5 (#46 and #54) can be done in parallel since they touch different files
  4. Phase 2 items 6 and 7 (#47 and #48) are strictly sequential: the pipeline must exist before apps can log into it
  5. Phase 3 is strictly sequential — each issue depends on the previous one
  6. Update this issue as items are completed to track progress
