
Cortex Strategic Roadmap

Created: 2026-03-12
Author: Jesse Kemp + Claude
Horizon: 6 months (Mar — Sep 2026)
Review: Bi-weekly


Part 1: Paths Forward (Goal-Aligned Assessment)

Current Position (Honest)

┌─────────────────────────────────────────────────────────────────┐
│ CORTEX COMPETITIVE POSITION — MARCH 2026                        │
├────────────────┬──────────┬──────────┬──────────┬──────────────┤
│                │ Cortex   │ Mem0     │ Letta    │ claude-mem   │
│                │          │ (49.5K★) │ (21.5K★) │ (34.2K★)    │
├────────────────┼──────────┼──────────┼──────────┼──────────────┤
│ Memory Store   │ File+SQL │ Graph+Vec│ Virtual  │ File-based   │
│ Retrieval      │ BM25+Emb │ Vec+Graph│ OS-style │ Keyword      │
│ Outcome Learn  │ ★ NEW    │ None     │ None     │ None         │
│ Task Routing   │ ★ UNIQUE │ None     │ None     │ None         │
│ Goal Parsing   │ ★ UNIQUE │ None     │ None     │ None         │
│ Anti-Patterns  │ ★ UNIQUE │ None     │ None     │ None         │
│ MCP Native     │ Yes      │ No       │ No       │ Yes          │
│ Multi-tenant   │ No       │ Yes      │ Yes      │ No           │
│ Community      │ ~0       │ 49,500   │ 21,500   │ 34,200       │
│ Production Use │ 18mo/1dev│ Many orgs│ Many orgs│ Many devs    │
│ Benchmarks     │ Internal │ LongMem  │ Academic │ None         │
└────────────────┴──────────┴──────────┴──────────┴──────────────┘

Cortex's real moat (3 things nobody else has):

  1. Task orchestration + memory in one system — Mem0 is memory-only, LangGraph is orchestration-only
  2. Anti-pattern primitives — failure mode + trigger + prevention + project context = memory type that doesn't exist in any competitor or paper
  3. Goal-to-task pipeline — parses GOALS.md into prioritized work, routes to optimal model tier, learns from outcomes
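
The first stage of that pipeline can be sketched with stdlib tools. The `- P1: ...` line format, the `parse_goals` name, and the tier-routing rule below are illustrative assumptions, not the actual GOALS.md schema:

```python
import re
from dataclasses import dataclass

@dataclass
class WorkItem:
    goal: str
    priority: int    # lower = more urgent
    model_tier: str  # "haiku" | "sonnet" | "opus"

def parse_goals(markdown: str) -> list:
    """Parse a GOALS.md-style file into prioritized, tier-routed work items.

    Assumes lines like '- P1: Ship OSS launch' (format sketch only).
    """
    items = []
    for line in markdown.splitlines():
        m = re.match(r"[-*]\s*P(\d+):\s*(.+)", line.strip())
        if m:
            priority = int(m.group(1))
            # Illustrative routing rule: the most urgent goals get the strongest tier
            tier = "opus" if priority == 0 else "sonnet" if priority == 1 else "haiku"
            items.append(WorkItem(goal=m.group(2), priority=priority, model_tier=tier))
    return sorted(items, key=lambda w: w.priority)
```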

Cortex's real weaknesses (unflinching):

  1. Zero community — all competitors have 15K-50K stars. Cortex has 0 users besides Jesse
  2. Single-developer validation — 18 months of one person's data. Not generalizable
  3. Learning loop was broken for 4 days — only 2 implicit outcomes had been derived before the fix
  4. No external benchmarks — internal metrics (21.2% dedup, 0.94 PQS) mean nothing without comparison baselines
  5. Retrieval is mediocre — BM25+embedding is table stakes. Mem0 and Supermemory have graph memory, temporal reasoning, contradiction handling

Path Assessment Against Goals

| Path | Goal Alignment | Market Fit | Effort | Verdict |
|---|---|---|---|---|
| A: OSS launch as-is | P1 (Goal 5) | Niche but honest | 2 days | DO — ship what works |
| B: Compete on retrieval | Low | Red ocean vs Mem0/Supermemory | 3+ months | SKIP — can't win here |
| C: Double down on orchestration | P1 (Goal 5+9) | Unique position | 1 month | DO — this is the moat |
| D: Integrate Mem0 for storage | Medium | Leverage their infra | 2 weeks | CONSIDER — replace our weak layer with their strong one |
| E: Auto-research agent | P1 (Goal 9) | Novel, high-value | 3 weeks | DO — compounds everything |
| F: Multi-tenant SaaS | Low | Premature | 2+ months | SKIP — no users yet |

Recommended sequence: A → C → E → D (ship → strengthen moat → build compounding → upgrade infrastructure)


Part 2: Self-Adapting System Design

The Core Problem

Cortex currently learns from user interactions (implicit feedback, model outcomes). It does NOT learn from the field — new papers, new tools, new capabilities, competitor features. This is a manual process (Jesse reads papers, implements ideas).

Target state: Cortex should have an autonomous research loop that:

  1. Discovers relevant advances (papers, repos, tools, MCP servers)
  2. Assesses applicability to Cortex's architecture
  3. Proposes integration plans (with effort/impact estimates)
  4. Tracks which innovations were adopted and their outcomes

Architecture: Cortex Research Agent (CRA)

┌─────────────────────────────────────────────────────────────┐
│                  CORTEX RESEARCH AGENT (CRA)                │
│                                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐               │
│  │ Discovery│──→│ Analysis │──→│ Proposal │               │
│  │ Agent    │   │ Agent    │   │ Agent    │               │
│  └──────────┘   └──────────┘   └──────────┘               │
│       │              │              │                       │
│  Scans:         Evaluates:     Produces:                    │
│  - arxiv RSS    - Relevance    - Integration spec           │
│  - GitHub       - Effort       - Risk assessment            │
│  - HN/Reddit    - Impact       - Priority vs backlog        │
│  - MCP registry - Disruption   - Code sketch                │
│                   risk                                      │
│                                                             │
│  ┌──────────────────────────────────────────┐               │
│  │          Knowledge Base                   │               │
│  │  ~/.cortex/research/                      │               │
│  │  ├── discoveries.jsonl  (raw findings)    │               │
│  │  ├── assessments.jsonl  (scored items)    │               │
│  │  ├── proposals/         (integration plans)│              │
│  │  ├── adopted.jsonl      (what we shipped) │               │
│  │  └── dismissed.jsonl    (what we skipped) │               │
│  └──────────────────────────────────────────┘               │
│                                                             │
│  Feedback loop:                                             │
│  adopted.jsonl outcome data → refine discovery priorities   │
└─────────────────────────────────────────────────────────────┘

Discovery Agent

Sources (ranked by signal-to-noise):

| Source | Method | Frequency | Signal Quality |
|---|---|---|---|
| arxiv cs.AI, cs.CL, cs.SE | RSS + semantic filter | Daily | High (but noisy) |
| GitHub Trending (agent, memory, MCP) | API scrape | Weekly | Medium |
| Anthropic changelog/blog | Web fetch | Weekly | Very high |
| Anthropic developer docs (memory/native API signals) | Web fetch | Weekly | Very high (existential) |
| Mem0 GitHub releases + changelog | GitHub API | Weekly | Very high (existential) |
| Papers With Code (agent-memory) | API | Weekly | High |
| HN front page (filtered) | API | Daily | Low (but early signal) |
| MCP server registry | API | Weekly | High for integrations |

Semantic filter: Each discovery is scored against Cortex's capability map:

from typing import Dict

CAPABILITY_VECTORS = {
    "memory_retrieval": "BM25 embedding hybrid search pattern matching",
    "outcome_learning": "implicit feedback outcome routing model selection",
    "task_orchestration": "work discovery routing dispatch model tier",
    "anti_patterns": "failure prevention pattern memory recurring bugs",
    "context_optimization": "token budget lost-in-middle attention reordering",
    "goal_tracking": "GOALS.md parsing work items priority scheduling",
}

def score_relevance(discovery_text: str) -> Dict[str, float]:
    """Score discovery against each capability vector.

    Embedding cosine similarity against each vector, returning e.g.
    {"memory_retrieval": 0.72, "outcome_learning": 0.85, ...}
    """
    ...
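
As a dependency-free illustration of that scoring, a hashed bag-of-words vector can stand in for real embeddings (bucket count and whitespace tokenization are arbitrary, and the capability map is passed in rather than hard-coded):

```python
import hashlib
import math
from typing import Dict

def _embed(text: str, buckets: int = 256) -> list:
    """Toy hashed bag-of-words vector (stand-in for a real embedding model)."""
    vec = [0.0] * buckets
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % buckets] += 1.0
    return vec

def _cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def score_relevance(discovery_text: str, capability_vectors: Dict[str, str]) -> Dict[str, float]:
    """Score a discovery against each capability description, 0.0-1.0 per capability."""
    d = _embed(discovery_text)
    return {name: _cosine(d, _embed(desc)) for name, desc in capability_vectors.items()}
```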

Analysis Agent

For each high-relevance discovery (score > 0.6 on any capability):

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ResearchAssessment:
    discovery_id: str
    title: str
    source: str  # paper URL, repo URL

    # Impact assessment
    relevance_scores: Dict[str, float]  # per-capability
    disruption_risk: float  # 0-1: how much does this threaten our approach?
    adoption_effort: str  # "trivial" | "small" | "medium" | "large" | "rewrite"
    expected_impact: str  # "incremental" | "significant" | "transformative"

    # Integration sketch
    affected_modules: List[str]  # e.g. ["intelligence/memory/hybrid_retriever.py"]
    integration_approach: str  # 1-paragraph plan
    risks: List[str]

    # Decision
    recommendation: str  # "adopt" | "monitor" | "dismiss"
    reasoning: str

Key heuristics:

  • If disruption_risk > 0.7 AND adoption_effort <= "medium" → ADOPT urgently
  • If disruption_risk > 0.7 AND adoption_effort > "medium" → MONITOR + plan
  • If expected_impact == "transformative" → always assess, regardless of effort
  • If provider-native (Anthropic ships memory API) → ADAPT immediately
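
Encoded as a function — the `recommend` name and the low-risk fallback branch are illustrative; the heuristics above don't name a default:

```python
# Effort scale from the Analysis Agent's adoption_effort field
EFFORT_ORDER = ["trivial", "small", "medium", "large", "rewrite"]

def recommend(disruption_risk: float, adoption_effort: str,
              expected_impact: str, provider_native: bool = False) -> str:
    """Apply the CRA decision heuristics in priority order."""
    if provider_native:
        return "adapt"  # provider ships it natively → adapt immediately
    if disruption_risk > 0.7:
        # Urgent adopt if cheap enough, otherwise monitor with a plan
        cheap = EFFORT_ORDER.index(adoption_effort) <= EFFORT_ORDER.index("medium")
        return "adopt" if cheap else "monitor"
    if expected_impact == "transformative":
        return "assess"  # always assess, regardless of effort
    return "monitor"    # assumed default for low-risk, non-transformative items
```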

Proposal Agent

Generates integration plans in Golden Spec format:

## Research Integration: [Title]

**Source:** [paper/repo URL]
**Assessed:** [date]
**Recommendation:** ADOPT / MONITOR / DISMISS

### What It Is
[2-3 sentences]

### Why It Matters for Cortex
[Specific capability it improves/threatens]

### Integration Plan
1. [Step with affected file]
2. [Step with affected file]

### Risk Assessment
- [Risk 1]
- [Risk 2]

### Success Criteria
- [Measurable outcome]

### Effort: [S/M/L]

Execution: How CRA Runs

Option A: Batch API (Recommended — 50% cost savings)

cortex research scan          # Discovery agent (haiku tier, daily)
cortex research assess        # Analysis agent (sonnet tier, weekly)
cortex research propose       # Proposal agent (opus tier, on-demand)
cortex research digest        # Human-readable weekly summary

CRA jobs go into the existing batch queue (~/.cortex/batch/), benefiting from the overnight dispatch window (2-6 AM UTC).
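
A sketch of what enqueueing could look like, assuming a file-per-job queue; the job schema and filename convention here are guesses, not the actual ~/.cortex/batch/ format:

```python
import json
import time
from pathlib import Path

BATCH_DIR = Path.home() / ".cortex" / "batch"  # existing overnight batch queue

def enqueue_research_job(stage: str, model_tier: str, payload: dict) -> Path:
    """Drop a CRA job into the batch queue for the 2-6 AM UTC dispatch window."""
    BATCH_DIR.mkdir(parents=True, exist_ok=True)
    queued_at = time.time()
    job = {
        "kind": "cra",
        "stage": stage,            # "scan" | "assess" | "propose"
        "model_tier": model_tier,  # "haiku" | "sonnet" | "opus"
        "payload": payload,
        "queued_at": queued_at,
    }
    path = BATCH_DIR / f"cra_{stage}_{int(queued_at)}.json"
    path.write_text(json.dumps(job, indent=2))
    return path
```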

Option B: Claude Code Cowork Integration

If Anthropic's cowork feature supports data passing between sessions:

  • CRA runs as a cowork participant
  • Shares discoveries via MCP resources (cortex://research/latest)
  • Main development session can query cortex_research_status tool
  • Proposals surface in daily briefing

Option C: Standalone daemon

# cortex/engines/research_agent.py
class CortexResearchAgent:
    def __init__(self):
        self.discovery = DiscoveryEngine(sources=SOURCES)
        self.analyzer = AnalysisEngine(capability_map=CAPABILITY_VECTORS)
        self.proposer = ProposalEngine()

    async def daily_scan(self):
        discoveries = await self.discovery.scan()
        for d in discoveries:
            if d.relevance > 0.6:
                assessment = await self.analyzer.assess(d)
                if assessment.recommendation == "adopt":
                    proposal = await self.proposer.generate(assessment)
                    self.notify(proposal)

    async def weekly_digest(self) -> str:
        """Generate human-readable research digest."""
        assessments = self.load_recent_assessments(days=7)
        return format_digest(assessments)

Adaptation Mechanism: The "Evolve" Loop

Beyond just discovering papers, Cortex needs to evolve its own capabilities:

┌─────────────────────────────────────────────────────┐
│                 THE EVOLVE LOOP                      │
│                                                      │
│  1. DISCOVER  ───→  New capability identified        │
│       ↓                                              │
│  2. ASSESS    ───→  Scored against current arch      │
│       ↓                                              │
│  3. PROTOTYPE ───→  Minimal integration (branch)     │
│       ↓                                              │
│  4. VALIDATE  ───→  A/B test vs current behavior     │
│       ↓                                              │
│  5. SHIP      ───→  If validates > current, deploy   │
│       ↓                                              │
│  6. LEARN     ───→  Track adoption outcome           │
│       └─────────────────────→ feeds back to (1)      │
└─────────────────────────────────────────────────────┘

Critical insight from the research: The paper "Adaptive Memory Admission Control" (arXiv 2603.04549) shows that deciding what NOT to adopt is as important as what to adopt. CRA needs a dismissed.jsonl with reasoning, so it doesn't re-evaluate the same things.
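
A minimal sketch of that gate over dismissed.jsonl (the record fields are illustrative):

```python
import json
from pathlib import Path

DISMISSED = Path.home() / ".cortex" / "research" / "dismissed.jsonl"

def dismiss(url: str, reasoning: str) -> None:
    """Append-only: record what was skipped and why."""
    DISMISSED.parent.mkdir(parents=True, exist_ok=True)
    with DISMISSED.open("a") as f:
        f.write(json.dumps({"url": url, "reasoning": reasoning}) + "\n")

def already_dismissed(url: str) -> bool:
    """Check before assessment so CRA never re-litigates a rejected discovery."""
    if not DISMISSED.exists():
        return False
    with DISMISSED.open() as f:
        return any(json.loads(line)["url"] == url for line in f if line.strip())
```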


Part 3: Competitive Strategy + Research Integration

Where to Compete (and Where Not To)

┌─────────────────────────────────────────────────────────────────┐
│              COMPETITIVE STRATEGY MAP                            │
│                                                                  │
│  DON'T COMPETE                    COMPETE HERE                   │
│  (commodity layer)                (intelligence layer)           │
│                                                                  │
│  ┌──────────────┐                ┌──────────────────────────┐   │
│  │ Vector store │                │ Outcome-aware retrieval  │   │
│  │ Basic RAG    │                │ Anti-pattern primitives  │   │
│  │ Session state│                │ Task orchestration+memory│   │
│  │ MCP transport│                │ Goal-to-task pipeline    │   │
│  │ Embedding gen│                │ Model tier routing       │   │
│  └──────────────┘                │ Implicit feedback loop   │   │
│     ↓                            │ Auto-research evolution  │   │
│  USE: Mem0, LangGraph,           └──────────────────────────┘   │
│  native provider memory             ↓                           │
│                                   BUILD: This is the moat       │
└─────────────────────────────────────────────────────────────────┘

6-Month Roadmap (Phased)

Phase 1: Ship + Validate (Mar 12 — Mar 28) — CURRENT

| Item | Priority | Status | Dependency |
|---|---|---|---|
| OSS launch (subtree, DOI, HN) | P0 | 80% done (audit: LAUNCH READY) | None |
| Learning pipeline verified (outcomes flowing) | P0 | ✅ SHIPPED | None |
| Conversation history ingestion | P0 | ✅ SHIPPED | None |
| CRA discovery engine | P1 | ✅ SHIPPED (21 discoveries ingested, 35 tests) | None |
| CRA batch assessment pipeline | P1 | ✅ SHIPPED (CRABatcher in research_batcher.py) | CRA discovery |
| External benchmark (AMA-Bench or LongMemEval) | P1 | Not started | OSS launch |
| First 3 beta users with feedback | P1 | Not started | OSS launch |
| Batch API deep conversation analysis | P2 | Not started | Conversation ingestion validated |

Success criteria: 5+ GitHub stars from non-Jesse users. 1 external person runs cortex status successfully.

Phase 2: Strengthen the Moat (Apr 1 — Apr 30)

| Item | Priority | Effort | Impact |
|---|---|---|---|
| Trajectory-informed memory | P1 | 2 weeks | High — learn from HOW tasks were solved, not just outcomes |
| Graph memory for anti-patterns | P2 | 1 week | Medium — causal links between anti-patterns, projects, failures |
| Memory admission control | P2 | 1 week | Medium — decide what NOT to remember (paper 2603.04549) |
| CLI decomposition | P2 | 1 week | Medium — cli.py is 5K lines, blocks contributions |

Key research to integrate:

  • "Trajectory-Informed Memory Generation" (arXiv 2603.10600) — +14.3pp on AppWorld
  • "Adaptive Memory Admission Control" (arXiv 2603.04549) — 5-factor admission scoring
  • "AutoSkill" (arXiv 2603.01145) — extract reusable skills from interaction traces

Trajectory memory design sketch:

# New module: intelligence/memory/trajectory_memory.py
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrajectoryPattern:
    """Learned from a successful task execution."""
    task_type: str           # "debug", "implement", "refactor"
    decision_points: List[DecisionPoint]  # where the agent chose
    outcome: str             # success/partial/failed
    key_actions: List[str]   # what actually worked
    anti_actions: List[str]  # what was tried and failed

    # Attribution
    source_session: str
    confidence: float
    reuse_count: int = 0

class TrajectoryMemory:
    """Learn from HOW tasks were solved, not just WHAT was discussed."""

    def extract_from_session(self, session_log: List[dict]) -> List[TrajectoryPattern]:
        """Analyze a completed session → extract reusable patterns."""
        # 1. Identify decision points (tool choices, file selections)
        # 2. Attribute outcomes to specific decisions
        # 3. Extract generalizable patterns
        pass

    def suggest_approach(self, task: WorkItem) -> Optional[TrajectoryPattern]:
        """Given a new task, suggest approach based on similar past trajectories."""
        # Semantic search over trajectory patterns
        # Rank by outcome quality + similarity
        pass

Phase 3: Auto-Research Agent (May 1 — May 21)

Reference architecture: Karpathy's autoresearch (33.5K★, 2026-03-06).

  • Key pattern to adopt: program.md → agent edits code → train 5 min → evaluate val_bpb → keep/discard → repeat
  • Cortex adaptation: research_directives.md → CRA proposes integration → batch prototype → evaluate metric → keep/discard → repeat
  • Critical constraint borrowed: one scalar metric per experiment cycle (autoresearch uses val_bpb; CRA uses adoption_outcome_score)

| Item | Priority | Effort | Impact |
|---|---|---|---|
| Discovery engine (arxiv, GitHub, MCP registry) | P1 | ✅ SHIPPED | Foundation for everything |
| Analysis engine (relevance scoring, disruption detection) | P1 | ✅ SHIPPED | Filter signal from noise |
| Proposal engine (integration specs) | P2 | 3 days | Actionable output |
| Weekly research digest (in cortex briefing) | P1 | 2 days | User-facing value |
| Batch API integration (overnight research scans) | P2 | ✅ SHIPPED | Cost-efficient |
| Autoresearch-style experiment loop | P1 | 1 week | Autonomous validate/discard cycle for CRA proposals |
| research_directives.md (human-authored CRA program) | P1 | 2 days | Karpathy's program.md pattern — human steers, agent executes |

Autoresearch-inspired experiment loop (NEW — from Karpathy's autoresearch):

The current CRA pipeline is: discover → assess → propose → (human decides). The missing piece is autonomous validation — the agent should be able to prototype an integration, evaluate it against a single metric, and keep/discard without human intervention.

┌──────────────────────────────────────────────────────────────────┐
│         CRA EXPERIMENT LOOP (autoresearch-adapted)               │
│                                                                   │
│  research_directives.md ──→ CRA Agent ──→ Propose integration    │
│  (human steers)               │               │                   │
│                                │    ┌──────────▼──────────────┐  │
│                                │    │ Batch prototype          │  │
│                                │    │ (branch, implement,      │  │
│                                │    │  run tests)              │  │
│                                │    └──────────┬──────────────┘  │
│                                │               │                  │
│                                │    ┌──────────▼──────────────┐  │
│                                │    │ Evaluate single metric:  │  │
│                                │    │ adoption_outcome_score   │  │
│                                │    │ (test_pass_rate ×        │  │
│                                │    │  capability_coverage ×   │  │
│                                │    │  disruption_addressed)   │  │
│                                │    └──────────┬──────────────┘  │
│                                │               │                  │
│                           ┌────▼───────────────▼────┐            │
│                           │  Score improved?         │            │
│                           │  YES → merge to staging  │            │
│                           │  NO  → discard + log why │            │
│                           └─────────────────────────┘            │
│                                    │                              │
│                                    └──── REPEAT overnight ───────│
└──────────────────────────────────────────────────────────────────┘

Key constraint from autoresearch: One scalar metric (adoption_outcome_score) keeps the loop tractable. Multi-dimensional evaluation causes the agent to hedge — autoresearch proved that constraining to val_bpb alone was sufficient for the agent to independently rediscover RMSNorm and tied embeddings. CRA's equivalent:

def adoption_outcome_score(proposal_result) -> float:
    """Single scalar metric for CRA experiment loop.

    Mirrors autoresearch's val_bpb — lower is better there,
    higher is better here. Range: 0.0–1.0.
    """
    test_pass = proposal_result.tests_passing / proposal_result.tests_total
    capability_gain = proposal_result.capability_score_delta  # 0-1
    disruption_addressed = 1.0 if proposal_result.addresses_threat else 0.0

    # Weighted: tests matter most, then capability, then threat response
    return (0.5 * test_pass) + (0.3 * capability_gain) + (0.2 * disruption_addressed)

Cowork integration assessment:

Anthropic's cowork feature (if available) would enable:

┌──────────────────┐     ┌──────────────────┐
│ Main Dev Session │     │ Research Agent    │
│ (Claude Code)    │────→│ (Cowork session) │
│                  │     │                  │
│ "What's new in   │     │ Scans arxiv,     │
│  agent memory?"  │     │ GitHub, MCP      │
│                  │←────│                  │
│ Gets structured  │     │ Returns scored   │
│ research digest  │     │ discoveries      │
└──────────────────┘     └──────────────────┘

Without cowork (fallback): CRA writes to ~/.cortex/research/ and results surface through existing MCP tools:

cortex_intelligence("what research is relevant to my current task?")
  → includes recent CRA discoveries in context

Data bridge pattern (works with or without cowork):

# cortex/engines/research_agent.py
from pathlib import Path
from typing import List

RESEARCH_DIR = Path.home() / ".cortex" / "research"

class CRABridge:
    """Bridge between CRA output and Cortex intelligence layer."""

    def get_relevant_discoveries(self, task_context: str) -> List[Discovery]:
        """Query CRA knowledge base for task-relevant research."""
        discoveries = self._load_recent(days=30)
        return self._rank_by_relevance(discoveries, task_context)

    def surface_in_briefing(self) -> str:
        """Add research section to daily briefing."""
        week = self._load_recent(days=7)
        adopt = [d for d in week if d.recommendation == "adopt"]
        monitor = [d for d in week if d.recommendation == "monitor"]
        return self._format_briefing_section(adopt, monitor)

Phase 4: Infrastructure Upgrade (Jun 1 — Jun 30)

| Item | Priority | Effort | Impact |
|---|---|---|---|
| Mem0 integration (replace file-based with graph memory) | P2 | 2 weeks | Leverage 49K-star infra |
| AMA-Bench evaluation (arXiv 2602.22769) | P1 | 1 week | External credibility |
| Provider memory detection (if Anthropic ships native) | P1 | 1 week | Existential adaptation |
| Multi-user support (team memory sharing) | P3 | 2 weeks | Growth path |

Mem0 integration design:

# Don't replace everything — layer Mem0 under Cortex's intelligence
# Mem0 handles: storage, embedding, graph relationships
# Cortex handles: outcome learning, task routing, anti-patterns, goals

from mem0 import Memory

class CortexMemoryBackend:
    """Pluggable backend: file-based (default) or Mem0."""

    def __init__(self, backend: str = "file", user_id: str = "default"):
        self.user_id = user_id  # Mem0 scopes stored memories per user
        if backend == "mem0":
            self.store = Memory()  # Mem0's graph + vector store
        else:
            self.store = FileMemoryStore()  # Current implementation

    # Cortex-specific operations layer on top
    def store_anti_pattern(self, pattern: AntiPattern):
        """Anti-pattern is a Cortex concept — stored via any backend."""
        self.store.add(
            messages=[{"role": "system", "content": pattern.serialize()}],
            metadata={"type": "anti_pattern", "project": pattern.project},
            user_id=self.user_id,
        )

Phase 5: Compounding (Jul — Sep 2026)

| Item | Priority | Effort | Impact |
|---|---|---|---|
| Causal retrieval (retrieve by cause, not similarity) | P2 | 3 weeks | Next-gen retrieval |
| Learned forgetting (graceful memory degradation) | P3 | 2 weeks | Long-term health |
| Cross-repo transfer (memory sharing across repos) | P2 | 2 weeks | Portfolio value |
| CRA self-improvement (research agent learns what to scan) | P3 | 1 week | Meta-learning |

Research Papers to Track (Priority Queue)

| Paper | ArXiv | Why It Matters | When to Integrate |
|---|---|---|---|
| Trajectory-Informed Memory | 2603.10600 | +14.3pp improvement. Directly maps to Cortex's interaction capture | Phase 2 (April) |
| Adaptive Memory Admission | 2603.04549 | 5-factor admission scoring. Cortex stores everything — needs curation | Phase 2 (April) |
| AutoSkill | 2603.01145 | Skills from traces = anti-patterns generalized | Phase 2 (April) |
| MACLA | 2512.18950 | Hierarchical procedural memory + Bayesian selection, 90.3% ALFWorld, 56s build. Frozen LLM + external memory = Cortex's exact architecture. Near-real-time trajectory extraction viable | Phase 2 (April) |
| A-Mem | 2502.12110 | 85-93% token reduction (~1,200 tok/op). Doubles multi-hop reasoning. Benchmark target for Cortex memory efficiency | Phase 2 (April) |
| AMA-Bench | 2602.22769 | First real benchmark for agent memory | Phase 4 (June) |
| RetroAgent | 2603.08561 | Dual intrinsic feedback without external reward | Phase 3 (May) |
| Memory Survey (5 mechanisms) | 2603.07670 | Taxonomy to validate our architecture decisions | Read immediately |
| MAGMA | | Multi-graph agent memory. Cross-domain knowledge linking via graph structures. Validates Cortex's graph anti-pattern direction | Phase 2 (April) |
| EverMemOS | | Memory operating system for structured long-horizon reasoning. Architecturally close to Cortex — assess for convergent patterns | Phase 3 (May) |
| TA-Mem | 2603.09297 | Agent autonomously explores memory via tools | Phase 5 (Jul+) |

Disruption Scenarios (What Could Kill Cortex)

| Scenario | Probability | Impact | Cortex Response |
|---|---|---|---|
| Anthropic ships native memory API | 60% by Sep 2026 | HIGH — commoditizes basic memory | Pivot to orchestration layer ON TOP of native memory. Anti-patterns + routing remain unique |
| Mem0 adds task orchestration | 20% | HIGH — direct competitor | Ship faster. 49K stars + orchestration = game over for us |
| Context windows reach 10M tokens | 40% by Dec 2026 | MEDIUM — reduces need for memory | Memory still needed for curation, not just storage. 10M tokens of noise < 1K tokens of curated context |
| Claude Code gets built-in learning | 30% by Sep 2026 | VERY HIGH — our exact use case | Pivot to cross-tool layer (not Claude-specific) |
| Cursor ships cross-session memory | 15% by Sep 2026 | MEDIUM — commoditizes orchestration+memory combo | Monitor Cursor's agent mode evolution. If they add persistent memory across worktree sessions, our "orchestration+memory in one system" moat narrows. Hedge: ensure Cortex's anti-pattern + outcome learning layers remain unique |

Hedging strategy: Every Cortex feature should work with ANY LLM agent, not just Claude Code. MCP is the right abstraction layer. If any provider ships native memory, Cortex becomes the intelligence layer on top.


Metrics That Matter (Not Vanity)

┌─────────────────────────────────────────────────────────┐
│  CORTEX NORTH STAR METRICS                              │
│                                                          │
│  Adoption:                                               │
│  ├── GitHub stars (target: 100 by Jun, 500 by Sep)      │
│  ├── pip installs / week (target: 50 by Jun)            │
│  └── Issues filed by non-Jesse users (target: 10 by Jun)│
│                                                          │
│  Quality:                                                │
│  ├── AMA-Bench score (baseline TBD)                     │
│  ├── Anti-pattern recurrence rate (target: <5%)         │
│  └── Model routing accuracy (target: >80% optimal)      │
│                                                          │
│  Learning:                                               │
│  ├── Implicit outcomes derived / week (target: 50+)     │
│  ├── Outcome→retrieval boost measured improvement        │
│  └── CRA discoveries adopted / month (target: 2-3)      │
│                                                          │
│  Compounding:                                            │
│  ├── Time-to-productive-session (should decrease)        │
│  └── Repeated mistakes (anti-pattern hits, should → 0)  │
│                                                          │
│  Research Agent:                                         │
│  ├── Discoveries scanned / week                         │
│  ├── Assessments generated / week                       │
│  └── Proposals adopted → outcome (did it actually help?)│
└─────────────────────────────────────────────────────────┘

Implementation Priority (Next 2 Weeks)

Week of Mar 12-13 (SHIP WEEK):
  ├── [x] Outcome-aware retrieval wired
  ├── [x] CRA discovery engine (engines/research_agent.py, 35 tests)
  ├── [x] CRA → supervisor intake wired (from_research_agent in discover_all)
  ├── [x] CRA batch assessment pipeline (CRABatcher in research_batcher.py)
  ├── [x] ROADMAP updated: 4 papers, 2 threat sources, 1 disruption scenario
  ├── [x] OSS audit: LAUNCH READY (all 14 categories pass)
  ├── [ ] git push cortex-oss main:main
  ├── [ ] Zenodo DOI
  ├── [ ] Show HN post
  └── [ ] Share with beta users

Week of Mar 17-21 (RESEARCH AGENT FOUNDATION):
  ├── [ ] Read survey paper (2603.07670) — inform all decisions
  ├── [x] Prototype CRA discovery engine (arxiv RSS + semantic filter) — DONE early
  ├── [ ] Wire CRA output into cortex briefing (weekly_digest → briefing.py)
  ├── [ ] Design trajectory memory data model (informed by MACLA paper)
  └── [ ] CLI decomposition (cli.py → commands/)

Decision Log

| Date | Decision | Reasoning |
|---|---|---|
| 2026-03-12 | Don't compete on retrieval quality | Mem0/Supermemory have 50K+ stars and dedicated teams. Our BM25+embedding is adequate. Compete on intelligence layer instead |
| 2026-03-12 | Build auto-research agent before Mem0 integration | CRA compounds everything — helps us discover what to integrate and when. Mem0 integration is a point improvement |
| 2026-03-12 | MCP as primary interface (not Claude-specific) | Provider-native memory is coming. MCP abstracts across providers. Reduces lock-in risk |
| 2026-03-12 | Batch API for research scans | 50% cost savings. Research is not latency-sensitive. Fits existing overnight dispatch infrastructure |

Appendix: CRA Technical Specification

Module Structure

cortex/
├── engines/
│   └── research_agent/
│       ├── __init__.py
│       ├── discovery.py      # Source scanning (arxiv, GitHub, MCP)
│       ├── analysis.py       # Relevance scoring, disruption detection
│       ├── proposal.py       # Integration plan generation
│       ├── bridge.py         # CRA ↔ Cortex intelligence bridge
│       └── sources/
│           ├── arxiv.py      # arxiv RSS + API
│           ├── github.py     # Trending repos + topic search
│           ├── mcp_registry.py  # MCP server discovery
│           └── hacker_news.py   # HN API filtered search

Data Models

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class Discovery:
    id: str
    source: str          # "arxiv", "github", "mcp", "hn"
    title: str
    url: str
    summary: str         # 2-3 sentence summary
    discovered_at: datetime
    relevance_scores: Dict[str, float]  # per-capability
    raw_metadata: dict

@dataclass
class Assessment:
    discovery_id: str
    disruption_risk: float       # 0-1
    adoption_effort: str         # trivial/small/medium/large/rewrite
    expected_impact: str         # incremental/significant/transformative
    affected_modules: List[str]
    integration_approach: str    # 1-paragraph
    risks: List[str]
    recommendation: str          # adopt/monitor/dismiss
    reasoning: str
    assessed_at: datetime

@dataclass
class Proposal:
    assessment_id: str
    title: str
    spec: str                    # Golden Spec format markdown
    estimated_effort_days: int
    success_criteria: List[str]
    created_at: datetime
    status: str                  # draft/approved/implementing/shipped/abandoned
    outcome: Optional[str]       # measured result after shipping

Batch Integration

# In supervisor/intake.py — add research tasks to work discovery
def discover_from_research() -> List[WorkItem]:
    """Surface CRA proposals as potential work items."""
    proposals = CRABridge().get_pending_proposals()
    items = []
    for p in proposals:
        if p.status == "approved":
            items.append(WorkItem(
                title=f"Research integration: {p.title}",
                source="cra",
                priority=WorkItemPriority.MEDIUM,
                estimated_complexity=p.estimated_effort_days,
            ))
    return items

Cowork / Data Sharing Protocol

# If cowork is available, CRA exposes MCP resources:
@mcp.resource("cortex://research/discoveries")
def get_recent_discoveries():
    """Last 7 days of CRA discoveries, scored and sorted."""
    return CRABridge().get_relevant_discoveries(days=7)

@mcp.resource("cortex://research/proposals")
def get_pending_proposals():
    """Integration proposals awaiting approval."""
    return CRABridge().get_pending_proposals()

@mcp.tool("cortex_research_assess")
def assess_topic(topic: str) -> str:
    """On-demand: assess a specific technology/paper for Cortex relevance."""
    discovery = Discovery(title=topic, source="manual", ...)
    assessment = AnalysisEngine().assess(discovery)
    return assessment.to_json()

Without Cowork (File-Based Fallback)

~/.cortex/research/
├── discoveries.jsonl       # Append-only discovery log
├── assessments.jsonl       # Scored assessments
├── proposals/
│   ├── 2026-03-15_trajectory_memory.md
│   └── 2026-03-22_mem0_integration.md
├── adopted.jsonl           # Shipped integrations + outcomes
├── dismissed.jsonl         # Rejected with reasoning
└── digest_cache.json       # Weekly digest cache

CRA runs via batch queue (overnight), writes to these files. cortex briefing reads them. No cowork dependency required.