Research Lab

An autonomous, compounding research platform built on Claude Code. Submit a thesis, and the lab evaluates it through a multi-expert panel, adversarial stress-testing, and iterative refinement — all saved to a persistent knowledge base that grows smarter over time.

Built by id8Labs

Documentation: The Triad

This project uses the Triad instead of PRDs. Three living documents that replace the traditional product requirements document:

Document	Tense	What It Answers
VISION.md	Future	Where is this going? What does "done" look like?
SPEC.md	Present	What exists today? What works, what doesn't?
BUILDING.md	Past	How did we get here? Every decision and why.

The delta between VISION and SPEC is the roadmap. You never need a separate roadmap document — just diff the two files. Any two documents reconstruct the third.

PRDs go stale the moment code ships. The Triad stays alive because each document has a clear owner: BUILDING grows with every release, SPEC updates with every capability change, VISION evolves as understanding deepens.

The Problem

Research sessions die when the context window closes. Findings live in chat transcripts nobody re-reads. There's no accumulation — each session starts from zero.

The Solution

The Research Lab inverts the relationship between conversation and knowledge. The knowledge base is the context, not the conversation. Sessions are disposable; the filesystem is permanent. Every intermediate result is saved to disk immediately. Every finding cross-references what came before. Every adversarial attack hardens what survives.

How It Works

thesis --> queue --> [3 expert subagents] --> synthesis --> knowledge base
                         |                      |
                    adversarial            convergence
                    framework               metric
                         |                      |
                    naysayer loop <-------- refine loop

The Pipeline

Submit a thesis or question
The lab cross-references it against everything it already knows
3 domain experts evaluate it in parallel (spawned as subagents)
A synthesis identifies convergence, divergence, and genuine tensions
Findings that score high enough get promoted to the knowledge base
Follow-up questions are auto-queued for the next cycle
The convergence score updates — one number tracking if the KB is getting smarter

The Refinement Loop

Inspired by Karpathy's autoresearch: instead of only generating new findings, the lab iterates on existing ones. The REFINE action diagnoses a finding's weaknesses, generates a strengthened version, re-evaluates it, and tracks the delta.

This is how the lab gets closer to truth — not by producing more, but by making what exists stronger.

Architecture

research-lab/
  config.md              # Active module, settings, convergence formula
  RESEARCH-LOG.md        # Rolling log of all research activity

  queue/                 # Briefs waiting to be processed
  active/                # Currently processing (max 1)
  sessions/              # Completed research cycles (immutable)
    YYYY-MM-DD-NNN/
      brief.md           # The thesis evaluated
      context.md         # KB snapshot at time of evaluation
      expert-{1,2,3}.json  # Raw expert assessments (saved immediately)
      synthesis.md       # Triangulated findings
      artifact.html      # Interactive report

  knowledge/             # The persistent brain
    findings.md          # Accumulated findings (F-NNN) — append-only
    open-questions.md    # Auto-generated research questions (OQ-NNN)
    contradictions.md    # Where findings conflict (C-NNN) — most valuable output
    convergence.md       # Single-number KB health metric (0-100)
    thesis-registry.md   # Every thesis ever evaluated
    sources.md           # External source registry
    external/            # Full external source records (EXT-NNN)
    by-topic/            # Deep dives by domain

  modules/               # Swappable evaluation lenses
    research/            # Consciousness & philosophy (12 experts + 8 naysayers)
    clinical/            # Clinical safety (parallax-assess pattern)
    engineering/         # Technical architecture (dev-assess pattern)
    trading/             # Investment-grade trading evaluation
    realestate/          # Real estate deal evaluation

  wings/                 # Research domains with dedicated resources
    caf/                 # Consciousness as Filesystem theory
    parallax/            # Applied AI companion research
    deepstack/           # Trading strategy validation

  state/
    lab-state.json       # Counters, convergence history, module state

Commands

The lab is operated through a Claude Code slash command (/research):

Command	What It Does
`/research "thesis"`	Submit a new thesis for evaluation
`/research run`	Process the next brief in the queue
`/research run --all`	Process the entire queue
`/research refine F-NNN`	Iteratively improve a specific finding
`/research refine --auto`	Auto-select and refine the weakest finding
`/research naysayer`	Adversarial loop — 3 critics attack the weakest findings
`/research intake <file>`	Absorb your completed work into the KB
`/research ingest <source>`	Absorb external research (file, URL, or quoted text)
`/research convergence`	Display the KB health metric
`/research status`	Dashboard with queue, findings, convergence
`/research compound`	Consolidation pass across all findings
`/research review <id>`	Re-read a past session's findings

Modular Lens System

The same pipeline works for different domains by swapping the module — the expert pool, rubric, adversarial framework, and synthesis prompt change, but the architecture stays constant.

Module	Expert Pool	Rating System	Use Case
`research`	12 experts + 8 naysayer critics	A+/A/A-.../F weighted rubric	Theoretical research
`clinical`	Clinical safety panel	SAFE/CAUTION/CONCERN/CRITICAL	Safety evaluation
`engineering`	Technical architecture panel	A+/A/A-.../F engineering rubric	Architecture review
`trading`	12-member investment board	Investment-grade scoring	Trading strategy validation
`realestate`	Real estate panel	Deal-grade scoring	Real estate evaluation

Each module contains 7 files: experts.md, rubric.md, schema.md, adversarial.md, synthesis-prompt.md, intake-prompt.md, ingest-prompt.md.

Key Design Decisions

Conservative grading. Median composite, not mean. Any CRITICAL dimension caps the grade at C. This prevents a strong thesis from hiding a fatal flaw.

Append-only findings. findings.md is never overwritten, never deleted. Findings can be refined, retired, or split — but the history is always preserved.

Anti-session-death. Every phase saves to disk before proceeding. Crash at any point = zero data loss. The knowledge base is always recoverable.

Intellectual honesty. When ingesting external research, if the external source has a stronger argument than an internal finding, the lab says so. No defensiveness.

Adversarial evolution. The adversarial framework grows over time. Each naysayer loop and each external source can add new challenges. The bar only goes up.

Iterative refinement. Inspired by autoresearch. Instead of only generating new findings, the lab strengthens existing ones through diagnose-strengthen-reevaluate cycles, tracking the delta at each iteration.

Convergence metric. A single 0-100 score tracking overall KB health: finding maturity (30%), contradiction resolution (25%), question coverage (15%), adversarial survival (20%), and refinement depth (10%).

Current State

Convergence: 12.5 / 100 (baseline — post-naysayer devastation, pre-refinement)
Findings: F-000 through F-005 (6 findings, most damaged by naysayer loop)
Open Questions: 30
Contradictions: 5 (all OPEN)
Adversarial Challenges: 21 (5 CRITICAL, 6 HIGH)
Modules: 5 complete
Wings: 3 (CaF, Parallax, DeepStack)

Requirements

Claude Code (CLI)
The /research slash command installed in ~/.claude/commands/research.md

This is a Claude Code native tool — it runs as a slash command within the Claude Code CLI, using subagents for parallel expert evaluation.

License

MIT

Built with Claude Code by id8Labs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Research Lab

Documentation: The Triad

The Problem

The Solution

How It Works

The Pipeline

The Refinement Loop

Architecture

Commands

Modular Lens System

Key Design Decisions

Current State

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
knowledge		knowledge
modules		modules
queue		queue
sessions		sessions
state		state
wings		wings
.gitignore		.gitignore
BUILDING.md		BUILDING.md
CLAUDE.md		CLAUDE.md
README.md		README.md
RESEARCH-LOG.md		RESEARCH-LOG.md
SPEC.md		SPEC.md
VISION.md		VISION.md
config.md		config.md

Folders and files

Latest commit

History

Repository files navigation

Research Lab

Documentation: The Triad

The Problem

The Solution

How It Works

The Pipeline

The Refinement Loop

Architecture

Commands

Modular Lens System

Key Design Decisions

Current State

Requirements

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages