Created: 2026-03-12
Author: Jesse Kemp + Claude
Horizon: 6 months (Mar — Sep 2026)
Review: Bi-weekly
┌─────────────────────────────────────────────────────────────────┐
│ CORTEX COMPETITIVE POSITION — MARCH 2026 │
├────────────────┬──────────┬──────────┬──────────┬──────────────┤
│ │ Cortex │ Mem0 │ Letta │ claude-mem │
│ │ │ (49.5K★) │ (21.5K★) │ (34.2K★) │
├────────────────┼──────────┼──────────┼──────────┼──────────────┤
│ Memory Store │ File+SQL │ Graph+Vec│ Virtual │ File-based │
│ Retrieval │ BM25+Emb │ Vec+Graph│ OS-style │ Keyword │
│ Outcome Learn │ ★ NEW │ None │ None │ None │
│ Task Routing │ ★ UNIQUE │ None │ None │ None │
│ Goal Parsing │ ★ UNIQUE │ None │ None │ None │
│ Anti-Patterns │ ★ UNIQUE │ None │ None │ None │
│ MCP Native │ Yes │ No │ No │ Yes │
│ Multi-tenant │ No │ Yes │ Yes │ No │
│ Community │ ~0 │ 49,500 │ 21,500 │ 34,200 │
│ Production Use │ 18mo/1dev│ Many orgs│ Many orgs│ Many devs │
│ Benchmarks │ Internal │ LongMem │ Academic │ None │
└────────────────┴──────────┴──────────┴──────────┴──────────────┘
Cortex's real moat (3 things nobody else has):
- Task orchestration + memory in one system — Mem0 is memory-only, LangGraph is orchestration-only
- Anti-pattern primitives — failure mode + trigger + prevention + project context = memory type that doesn't exist in any competitor or paper
- Goal-to-task pipeline — parses GOALS.md into prioritized work, routes to optimal model tier, learns from outcomes
Cortex's real weaknesses (unflinching):
- Zero community — all competitors have 15K-50K stars. Cortex has 0 users besides Jesse
- Single-developer validation — 18 months of one person's data. Not generalizable
- Learning loop was broken for 4 days — only 2 implicit outcomes had been derived before fix
- No external benchmarks — internal metrics (21.2% dedup, 0.94 PQS) mean nothing without comparison baselines
- Retrieval is mediocre — BM25+embedding is table stakes. Mem0 and Supermemory have graph memory, temporal reasoning, contradiction handling
| Path | Goal Alignment | Market Fit | Effort | Verdict |
|---|---|---|---|---|
| A: OSS launch as-is | P1 (Goal 5) | Niche but honest | 2 days | DO — ship what works |
| B: Compete on retrieval | Low | Red ocean vs Mem0/Supermemory | 3+ months | SKIP — can't win here |
| C: Double down on orchestration | P1 (Goal 5+9) | Unique position | 1 month | DO — this is the moat |
| D: Integrate Mem0 for storage | Medium | Leverage their infra | 2 weeks | CONSIDER — replace our weak layer with their strong one |
| E: Auto-research agent | P1 (Goal 9) | Novel, high-value | 3 weeks | DO — compounds everything |
| F: Multi-tenant SaaS | Low | Premature | 2+ months | SKIP — no users yet |
Recommended sequence: A → C → E → D (ship → strengthen moat → build compounding → upgrade infrastructure)
Cortex currently learns from user interactions (implicit feedback, model outcomes). It does NOT learn from the field — new papers, new tools, new capabilities, competitor features. This is a manual process (Jesse reads papers, implements ideas).
Target state: Cortex should have an autonomous research loop that:
- Discovers relevant advances (papers, repos, tools, MCP servers)
- Assesses applicability to Cortex's architecture
- Proposes integration plans (with effort/impact estimates)
- Tracks which innovations were adopted and their outcomes
┌─────────────────────────────────────────────────────────────┐
│ CORTEX RESEARCH AGENT (CRA) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Discovery│──→│ Analysis │──→│ Proposal │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ Scans: Evaluates: Produces: │
│ - arxiv RSS - Relevance - Integration spec │
│ - GitHub - Effort - Risk assessment │
│ - HN/Reddit - Impact - Priority vs backlog │
│ - MCP registry - Disruption - Code sketch │
│ risk │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Knowledge Base │ │
│ │ ~/.cortex/research/ │ │
│ │ ├── discoveries.jsonl (raw findings) │ │
│ │ ├── assessments.jsonl (scored items) │ │
│ │ ├── proposals/ (integration plans) │ │
│ │ ├── adopted.jsonl (what we shipped) │ │
│ │ └── dismissed.jsonl (what we skipped) │ │
│ └──────────────────────────────────────────┘ │
│ │
│ Feedback loop: │
│ adopted.jsonl outcome data → refine discovery priorities │
└─────────────────────────────────────────────────────────────┘
Sources (ranked by signal-to-noise):
| Source | Method | Frequency | Signal Quality |
|---|---|---|---|
| arxiv cs.AI, cs.CL, cs.SE | RSS + semantic filter | Daily | High (but noisy) |
| GitHub Trending (agent, memory, MCP) | API scrape | Weekly | Medium |
| Anthropic changelog/blog | Web fetch | Weekly | Very high |
| Anthropic developer docs (memory/native API signals) | Web fetch | Weekly | Very high (existential) |
| Mem0 GitHub releases + changelog | GitHub API | Weekly | Very high (existential) |
| Papers With Code (agent-memory) | API | Weekly | High |
| HN front page (filtered) | API | Daily | Low (but early signal) |
| MCP server registry | API | Weekly | High for integrations |
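Source-scan sketch for the highest-volume feed (illustrative — assumes the feedparser package and arxiv's public RSS endpoints; dict keys mirror the Discovery model later in this doc):
import feedparser  # third-party RSS parser
from datetime import datetime, timezone

ARXIV_FEEDS = [
    "https://rss.arxiv.org/rss/cs.AI",
    "https://rss.arxiv.org/rss/cs.CL",
    "https://rss.arxiv.org/rss/cs.SE",
]

def scan_arxiv() -> list:
    """Fetch the latest arxiv entries as raw, pre-filter discovery dicts."""
    found = []
    for feed_url in ARXIV_FEEDS:
        for entry in feedparser.parse(feed_url).entries:
            found.append({
                "source": "arxiv",
                "title": entry.title,
                "url": entry.link,
                "summary": entry.summary,
                "discovered_at": datetime.now(timezone.utc).isoformat(),
            })
    return found  # next: the semantic filter below scores these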
Semantic filter: Each discovery is scored against Cortex's capability map:
CAPABILITY_VECTORS = {
    "memory_retrieval": "BM25 embedding hybrid search pattern matching",
    "outcome_learning": "implicit feedback outcome routing model selection",
    "task_orchestration": "work discovery routing dispatch model tier",
    "anti_patterns": "failure prevention pattern memory recurring bugs",
    "context_optimization": "token budget lost-in-middle attention reordering",
    "goal_tracking": "GOALS.md parsing work items priority scheduling",
}
def score_relevance(discovery_text: str) -> Dict[str, float]:
    """Score discovery against each capability vector."""
    # Embedding cosine similarity against each vector
    # Returns {"memory_retrieval": 0.72, "outcome_learning": 0.85, ...}
For each high-relevance discovery (score > 0.6 on any capability):

@dataclass
class ResearchAssessment:
    discovery_id: str
    title: str
    source: str  # paper URL, repo URL
    # Impact assessment
    relevance_scores: Dict[str, float]  # per-capability
    disruption_risk: float  # 0-1: how much does this threaten our approach?
    adoption_effort: str  # "trivial" | "small" | "medium" | "large" | "rewrite"
    expected_impact: str  # "incremental" | "significant" | "transformative"
    # Integration sketch
    affected_modules: List[str]  # e.g. ["intelligence/memory/hybrid_retriever.py"]
    integration_approach: str  # 1-paragraph plan
    risks: List[str]
    # Decision
    recommendation: str  # "adopt" | "monitor" | "dismiss"
    reasoning: str

Key heuristics (sketched in code below):
- If `disruption_risk > 0.7` AND `adoption_effort <= "medium"` → ADOPT urgently
- If `disruption_risk > 0.7` AND `adoption_effort > "medium"` → MONITOR + plan
- If `expected_impact == "transformative"` → always assess, regardless of effort
- If provider-native (Anthropic ships memory API) → ADAPT immediately
Generates integration plans in Golden Spec format:
## Research Integration: [Title]
**Source:** [paper/repo URL]
**Assessed:** [date]
**Recommendation:** ADOPT / MONITOR / DISMISS
### What It Is
[2-3 sentences]
### Why It Matters for Cortex
[Specific capability it improves/threatens]
### Integration Plan
1. [Step with affected file]
2. [Step with affected file]
### Risk Assessment
- [Risk 1]
- [Risk 2]
### Success Criteria
- [Measurable outcome]
### Effort: [S/M/L]

Option A: Batch API (Recommended — 50% cost savings)
cortex research scan # Discovery agent (haiku tier, daily)
cortex research assess # Analysis agent (sonnet tier, weekly)
cortex research propose # Proposal agent (opus tier, on-demand)
cortex research digest # Human-readable weekly summary
CRA jobs go into the existing batch queue (~/.cortex/batch/), benefiting from the overnight dispatch window (2-6 AM UTC).
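Enqueue sketch (the job schema below is illustrative — the real format is whatever ~/.cortex/batch/ already expects):
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

BATCH_DIR = Path.home() / ".cortex" / "batch"

def enqueue_research_job(command: str, tier: str) -> Path:
    """Drop a CRA job into the overnight batch queue (2-6 AM UTC window)."""
    BATCH_DIR.mkdir(parents=True, exist_ok=True)
    job = {
        "id": str(uuid.uuid4()),
        "command": command,  # e.g. "cortex research scan"
        "model_tier": tier,  # haiku / sonnet / opus, matching the commands above
        "queued_at": datetime.now(timezone.utc).isoformat(),
    }
    path = BATCH_DIR / f"cra_{job['id']}.json"
    path.write_text(json.dumps(job, indent=2))
    return path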
Option B: Claude Code Cowork Integration

If Anthropic's cowork feature supports data passing between sessions:
- CRA runs as a cowork participant
- Shares discoveries via MCP resources (`cortex://research/latest`)
- Main development session can query the `cortex_research_status` tool
- Proposals surface in daily briefing
Option C: Standalone daemon
# cortex/engines/research_agent.py
class CortexResearchAgent:
    def __init__(self):
        self.discovery = DiscoveryEngine(sources=SOURCES)
        self.analyzer = AnalysisEngine(capability_map=CAPABILITY_VECTORS)
        self.proposer = ProposalEngine()

    async def daily_scan(self):
        discoveries = await self.discovery.scan()
        for d in discoveries:
            if d.relevance > 0.6:
                assessment = await self.analyzer.assess(d)
                if assessment.recommendation == "adopt":
                    proposal = await self.proposer.generate(assessment)
                    self.notify(proposal)

    async def weekly_digest(self) -> str:
        """Generate human-readable research digest."""
        assessments = self.load_recent_assessments(days=7)
        return format_digest(assessments)

Beyond just discovering papers, Cortex needs to evolve its own capabilities:
┌─────────────────────────────────────────────────────┐
│ THE EVOLVE LOOP │
│ │
│ 1. DISCOVER ───→ New capability identified │
│ ↓ │
│ 2. ASSESS ───→ Scored against current arch │
│ ↓ │
│ 3. PROTOTYPE ───→ Minimal integration (branch) │
│ ↓ │
│ 4. VALIDATE ───→ A/B test vs current behavior │
│ ↓ │
│ 5. SHIP ───→ If validates > current, deploy │
│ ↓ │
│ 6. LEARN ───→ Track adoption outcome │
│ └─────────────────────→ feeds back to (1) │
└─────────────────────────────────────────────────────┘
Critical insight from the research: The paper "Adaptive Memory Admission Control" (arXiv 2603.04549) shows that deciding what NOT to adopt is as important as what to adopt. CRA needs a dismissed.jsonl with reasoning, so it doesn't re-evaluate the same things.
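Guard sketch (assumes dismissed.jsonl holds one JSON object per line with url and reasoning fields, matching the knowledge-base layout above):
import json
from pathlib import Path

DISMISSED = Path.home() / ".cortex" / "research" / "dismissed.jsonl"

def already_dismissed(url: str):
    """Return prior dismissal reasoning for a URL, or None if never dismissed."""
    if not DISMISSED.exists():
        return None
    for line in DISMISSED.read_text().splitlines():
        record = json.loads(line)
        if record.get("url") == url:
            return record.get("reasoning")
    return None

# Discovery agent then skips re-evaluation:
#   if already_dismissed(d.url): continue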
┌─────────────────────────────────────────────────────────────────┐
│ COMPETITIVE STRATEGY MAP │
│ │
│ DON'T COMPETE COMPETE HERE │
│ (commodity layer) (intelligence layer) │
│ │
│ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ Vector store │ │ Outcome-aware retrieval │ │
│ │ Basic RAG │ │ Anti-pattern primitives │ │
│ │ Session state│ │ Task orchestration+memory│ │
│ │ MCP transport│ │ Goal-to-task pipeline │ │
│ │ Embedding gen│ │ Model tier routing │ │
│ └──────────────┘ │ Implicit feedback loop │ │
│ ↓ │ Auto-research evolution │ │
│ USE: Mem0, LangGraph, └──────────────────────────┘ │
│ native provider memory ↓ │
│ BUILD: This is the moat │
└─────────────────────────────────────────────────────────────────┘
| Item | Priority | Status | Dependency |
|---|---|---|---|
| OSS launch (subtree, DOI, HN) | P0 | 80% done (audit: LAUNCH READY) | None |
| Learning pipeline verified (outcomes flowing) | P0 | ✅ SHIPPED | None |
| Conversation history ingestion | P0 | ✅ SHIPPED | None |
| CRA discovery engine | P1 | ✅ SHIPPED (21 discoveries ingested, 35 tests) | None |
| CRA batch assessment pipeline | P1 | ✅ SHIPPED (CRABatcher in research_batcher.py) | CRA discovery |
| External benchmark (AMA-Bench or LongMemEval) | P1 | Not started | OSS launch |
| First 3 beta users with feedback | P1 | Not started | OSS launch |
| Batch API deep conversation analysis | P2 | Not started | Conversation ingestion validated |
Success criteria: 5+ GitHub stars from non-Jesse users. 1 external person runs cortex status successfully.
| Item | Priority | Effort | Impact |
|---|---|---|---|
| Trajectory-informed memory | P1 | 2 weeks | High — learn from HOW tasks were solved, not just outcomes |
| Graph memory for anti-patterns | P2 | 1 week | Medium — causal links between anti-patterns, projects, failures |
| Memory admission control | P2 | 1 week | Medium — decide what NOT to remember (paper 2603.04549) |
| CLI decomposition | P2 | 1 week | Medium — cli.py is 5K lines, blocks contributions |
Key research to integrate:
- "Trajectory-Informed Memory Generation" (arXiv 2603.10600) — +14.3pp on AppWorld
- "Adaptive Memory Admission Control" (arXiv 2603.04549) — 5-factor admission scoring
- "AutoSkill" (arXiv 2603.01145) — extract reusable skills from interaction traces
Trajectory memory design sketch:
# New module: intelligence/memory/trajectory_memory.py
@dataclass
class TrajectoryPattern:
    """Learned from a successful task execution."""
    task_type: str  # "debug", "implement", "refactor"
    decision_points: List[DecisionPoint]  # where the agent chose between options
    outcome: str  # success/partial/failed
    key_actions: List[str]  # what actually worked
    anti_actions: List[str]  # what was tried and failed
    # Attribution
    source_session: str
    confidence: float
    reuse_count: int = 0

class TrajectoryMemory:
    """Learn from HOW tasks were solved, not just WHAT was discussed."""

    def extract_from_session(self, session_log: List[dict]) -> List[TrajectoryPattern]:
        """Analyze a completed session → extract reusable patterns."""
        # 1. Identify decision points (tool choices, file selections)
        # 2. Attribute outcomes to specific decisions
        # 3. Extract generalizable patterns
        pass

    def suggest_approach(self, task: WorkItem) -> Optional[TrajectoryPattern]:
        """Given a new task, suggest approach based on similar past trajectories."""
        # Semantic search over trajectory patterns
        # Rank by outcome quality + similarity
        pass

Reference architecture: Karpathy's autoresearch (33.5K★, 2026-03-06).
Key pattern to adopt: program.md → agent edits code → train 5min → evaluate val_bpb → keep/discard → repeat.
Cortex adaptation: research_directives.md → CRA proposes integration → batch prototype → evaluate metric → keep/discard → repeat.
Critical constraint borrowed: one scalar metric per experiment cycle (autoresearch uses val_bpb; CRA uses adoption_outcome_score).
| Item | Priority | Effort | Impact |
|---|---|---|---|
| Discovery engine (arxiv, GitHub, MCP registry) | P1 | ✅ SHIPPED | Foundation for everything |
| Analysis engine (relevance scoring, disruption detection) | P1 | ✅ SHIPPED | Filter signal from noise |
| Proposal engine (integration specs) | P2 | 3 days | Actionable output |
| Weekly research digest (in cortex briefing) | P1 | 2 days | User-facing value |
| Batch API integration (overnight research scans) | P2 | ✅ SHIPPED | Cost-efficient |
| Autoresearch-style experiment loop | P1 | 1 week | Autonomous validate/discard cycle for CRA proposals |
| research_directives.md (human-authored CRA program) | P1 | 2 days | Karpathy's program.md pattern — human steers, agent executes |
Autoresearch-inspired experiment loop (NEW — from Karpathy's autoresearch):
The current CRA pipeline is: discover → assess → propose → (human decides). The missing piece is autonomous validation — the agent should be able to prototype an integration, evaluate it against a single metric, and keep/discard without human intervention.
┌──────────────────────────────────────────────────────────────────┐
│ CRA EXPERIMENT LOOP (autoresearch-adapted) │
│ │
│ research_directives.md ──→ CRA Agent ──→ Propose integration │
│ (human steers) │ │ │
│ │ ┌──────────▼──────────────┐ │
│ │ │ Batch prototype │ │
│ │ │ (branch, implement, │ │
│ │ │ run tests) │ │
│ │ └──────────┬──────────────┘ │
│ │ │ │
│ │ ┌──────────▼──────────────┐ │
│ │ │ Evaluate single metric: │ │
│ │ │ adoption_outcome_score │ │
│ │ │ (test_pass_rate × │ │
│ │ │ capability_coverage × │ │
│ │ │ disruption_addressed) │ │
│ │ └──────────┬──────────────┘ │
│ │ │ │
│ ┌────▼───────────────▼────┐ │
│ │ Score improved? │ │
│ │ YES → merge to staging │ │
│ │ NO → discard + log why │ │
│ └─────────────────────────┘ │
│ │ │
│ └──── REPEAT overnight ───────│
└──────────────────────────────────────────────────────────────────┘
Key constraint from autoresearch: One scalar metric (adoption_outcome_score) keeps the loop
tractable. Multi-dimensional evaluation causes the agent to hedge — autoresearch proved that
constraining to val_bpb alone was sufficient for the agent to independently rediscover RMSNorm
and tied embeddings. CRA's equivalent:
def adoption_outcome_score(proposal_result) -> float:
    """Single scalar metric for CRA experiment loop.

    Mirrors autoresearch's val_bpb — lower is better there,
    higher is better here. Range: 0.0–1.0.
    """
    test_pass = proposal_result.tests_passing / proposal_result.tests_total
    capability_gain = proposal_result.capability_score_delta  # 0-1
    disruption_addressed = 1.0 if proposal_result.addresses_threat else 0.0
    # Weighted: tests matter most, then capability, then threat response
    return (0.5 * test_pass) + (0.3 * capability_gain) + (0.2 * disruption_addressed)

Cowork integration assessment:
Anthropic's cowork feature (if available) would enable:
┌──────────────────┐ ┌──────────────────┐
│ Main Dev Session │ │ Research Agent │
│ (Claude Code) │────→│ (Cowork session) │
│ │ │ │
│ "What's new in │ │ Scans arxiv, │
│ agent memory?" │ │ GitHub, MCP │
│ │←────│ │
│ Gets structured │ │ Returns scored │
│ research digest │ │ discoveries │
└──────────────────┘ └──────────────────┘
Without cowork (fallback): CRA writes to ~/.cortex/research/ and results surface through existing MCP tools:
cortex_intelligence("what research is relevant to my current task?")
→ includes recent CRA discoveries in context
Data bridge pattern (works with or without cowork):
# cortex/engines/research_agent.py
RESEARCH_DIR = Path.home() / ".cortex" / "research"

class CRABridge:
    """Bridge between CRA output and Cortex intelligence layer."""

    def get_relevant_discoveries(self, task_context: str) -> List[Discovery]:
        """Query CRA knowledge base for task-relevant research."""
        discoveries = self._load_recent(days=30)
        return self._rank_by_relevance(discoveries, task_context)

    def surface_in_briefing(self) -> str:
        """Add research section to daily briefing."""
        week = self._load_recent(days=7)
        adopt = [d for d in week if d.recommendation == "adopt"]
        monitor = [d for d in week if d.recommendation == "monitor"]
        return self._format_briefing_section(adopt, monitor)

| Item | Priority | Effort | Impact |
|---|---|---|---|
| Mem0 integration (replace file-based with graph memory) | P2 | 2 weeks | Leverage 49K-star infra |
| AMA-Bench evaluation (arXiv 2602.22769) | P1 | 1 week | External credibility |
| Provider memory detection (if Anthropic ships native) | P1 | 1 week | Existential adaptation |
| Multi-user support (team memory sharing) | P3 | 2 weeks | Growth path |
Mem0 integration design:
# Don't replace everything — layer Mem0 under Cortex's intelligence
# Mem0 handles: storage, embedding, graph relationships
# Cortex handles: outcome learning, task routing, anti-patterns, goals
from mem0 import Memory

class CortexMemoryBackend:
    """Pluggable backend: file-based (default) or Mem0."""

    def __init__(self, backend="file", user_id: str = "default"):
        self.user_id = user_id  # Mem0 scopes memories per user
        if backend == "mem0":
            self.store = Memory()  # Mem0's graph + vector store
        else:
            self.store = FileMemoryStore()  # Current implementation

    # Cortex-specific operations layer on top
    def store_anti_pattern(self, pattern: AntiPattern):
        """Anti-pattern is a Cortex concept — stored via any backend."""
        self.store.add(
            messages=[{"role": "system", "content": pattern.serialize()}],
            metadata={"type": "anti_pattern", "project": pattern.project},
            user_id=self.user_id,
        )
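Usage sketch (the AntiPattern field names follow the failure mode + trigger + prevention + project primitive described earlier; values are illustrative):
backend = CortexMemoryBackend(backend="mem0")  # or "file" (default)
backend.store_anti_pattern(AntiPattern(
    project="cortex",
    failure_mode="batch jobs silently dropped",
    trigger="queue dir missing at dispatch time",
    prevention="mkdir -p before enqueue",
))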
| Item | Priority | Effort | Impact |
|---|---|---|---|
| Causal retrieval (retrieve by cause, not similarity) | P2 | 3 weeks | Next-gen retrieval |
| Learned forgetting (graceful memory degradation) | P3 | 2 weeks | Long-term health |
| Cross-repo transfer (memory sharing across repos) | P2 | 2 weeks | Portfolio value |
| CRA self-improvement (research agent learns what to scan) | P3 | 1 week | Meta-learning |
| Paper | ArXiv | Why It Matters | When to Integrate |
|---|---|---|---|
| Trajectory-Informed Memory | 2603.10600 | +14.3pp improvement. Directly maps to Cortex's interaction capture | Phase 2 (April) |
| Adaptive Memory Admission | 2603.04549 | 5-factor admission scoring. Cortex stores everything — needs curation | Phase 2 (April) |
| AutoSkill | 2603.01145 | Skills from traces = anti-patterns generalized | Phase 2 (April) |
| MACLA | 2512.18950 | Hierarchical procedural memory + Bayesian selection, 90.3% ALFWorld, 56s build. Frozen LLM + external memory = Cortex's exact architecture. Near-real-time trajectory extraction viable | Phase 2 (April) |
| A-Mem | 2502.12110 | 85-93% token reduction (~1,200 tok/op). Doubles multi-hop reasoning. Benchmark target for Cortex memory efficiency | Phase 2 (April) |
| AMA-Bench | 2602.22769 | First real benchmark for agent memory | Phase 4 (June) |
| RetroAgent | 2603.08561 | Dual intrinsic feedback without external reward | Phase 3 (May) |
| Memory Survey (5 mechanisms) | 2603.07670 | Taxonomy to validate our architecture decisions | Read immediately |
| MAGMA | — | Multi-graph agent memory. Cross-domain knowledge linking via graph structures. Validates Cortex's graph anti-pattern direction | Phase 2 (April) |
| EverMemOS | — | Memory operating system for structured long-horizon reasoning. Architecturally close to Cortex — assess for convergent patterns | Phase 3 (May) |
| TA-Mem | 2603.09297 | Agent autonomously explores memory via tools | Phase 5 (Jul+) |
| Scenario | Probability | Impact | Cortex Response |
|---|---|---|---|
| Anthropic ships native memory API | 60% by Sep 2026 | HIGH — commoditizes basic memory | Pivot to orchestration layer ON TOP of native memory. Anti-patterns + routing remain unique |
| Mem0 adds task orchestration | 20% | HIGH — direct competitor | Ship faster. 49K stars + orchestration = game over for us |
| Context windows reach 10M tokens | 40% by Dec 2026 | MEDIUM — reduces need for memory | Memory still needed for curation, not just storage. 10M tokens of noise < 1K tokens of curated context |
| Claude Code gets built-in learning | 30% by Sep 2026 | VERY HIGH — our exact use case | Pivot to cross-tool layer (not Claude-specific) |
| Cursor ships cross-session memory | 15% by Sep 2026 | MEDIUM — commoditizes orchestration+memory combo | Monitor Cursor's agent mode evolution. If they add persistent memory across worktree sessions, our "orchestration+memory in one system" moat narrows. Hedge: ensure Cortex's anti-pattern + outcome learning layers remain unique |
Hedging strategy: Every Cortex feature should work with ANY LLM agent, not just Claude Code. MCP is the right abstraction layer. If any provider ships native memory, Cortex becomes the intelligence layer on top.
┌─────────────────────────────────────────────────────────┐
│ CORTEX NORTH STAR METRICS │
│ │
│ Adoption: │
│ ├── GitHub stars (target: 100 by Jun, 500 by Sep) │
│ ├── pip installs / week (target: 50 by Jun) │
│ └── Issues filed by non-Jesse users (target: 10 by Jun)│
│ │
│ Quality: │
│ ├── AMA-Bench score (baseline TBD) │
│ ├── Anti-pattern recurrence rate (target: <5%) │
│ └── Model routing accuracy (target: >80% optimal) │
│ │
│ Learning: │
│ ├── Implicit outcomes derived / week (target: 50+) │
│ ├── Outcome→retrieval boost measured improvement │
│ └── CRA discoveries adopted / month (target: 2-3) │
│ │
│ Compounding: │
│ ├── Time-to-productive-session (should decrease) │
│ └── Repeated mistakes (anti-pattern hits, should → 0) │
│ │
│ Research Agent: │
│ ├── Discoveries scanned / week │
│ ├── Assessments generated / week │
│ └── Proposals adopted → outcome (did it actually help?)│
└─────────────────────────────────────────────────────────┘
Week of Mar 12-13 (SHIP WEEK):
├── [x] Outcome-aware retrieval wired
├── [x] CRA discovery engine (engines/research_agent.py, 35 tests)
├── [x] CRA → supervisor intake wired (from_research_agent in discover_all)
├── [x] CRA batch assessment pipeline (CRABatcher in research_batcher.py)
├── [x] ROADMAP updated: 4 papers, 2 threat sources, 1 disruption scenario
├── [x] OSS audit: LAUNCH READY (all 14 categories pass)
├── [ ] git push cortex-oss main:main
├── [ ] Zenodo DOI
├── [ ] Show HN post
└── [ ] Share with beta users
Week of Mar 17-21 (RESEARCH AGENT FOUNDATION):
├── [ ] Read survey paper (2603.07670) — inform all decisions
├── [x] Prototype CRA discovery engine (arxiv RSS + semantic filter) — DONE early
├── [ ] Wire CRA output into cortex briefing (weekly_digest → briefing.py)
├── [ ] Design trajectory memory data model (informed by MACLA paper)
└── [ ] CLI decomposition (cli.py → commands/)
| Date | Decision | Reasoning |
|---|---|---|
| 2026-03-12 | Don't compete on retrieval quality | Mem0/Supermemory have 50K+ stars and dedicated teams. Our BM25+embedding is adequate. Compete on intelligence layer instead |
| 2026-03-12 | Build auto-research agent before Mem0 integration | CRA compounds everything — helps us discover what to integrate and when. Mem0 integration is a point improvement |
| 2026-03-12 | MCP as primary interface (not Claude-specific) | Provider-native memory is coming. MCP abstracts across providers. Reduces lock-in risk |
| 2026-03-12 | Batch API for research scans | 50% cost savings. Research is not latency-sensitive. Fits existing overnight dispatch infrastructure |
cortex/
├── engines/
│ └── research_agent/
│ ├── __init__.py
│ ├── discovery.py # Source scanning (arxiv, GitHub, MCP)
│ ├── analysis.py # Relevance scoring, disruption detection
│ ├── proposal.py # Integration plan generation
│ ├── bridge.py # CRA ↔ Cortex intelligence bridge
│ └── sources/
│ ├── arxiv.py # arxiv RSS + API
│ ├── github.py # Trending repos + topic search
│ ├── mcp_registry.py # MCP server discovery
│ └── hacker_news.py # HN API filtered search
@dataclass
class Discovery:
    id: str
    source: str  # "arxiv", "github", "mcp", "hn"
    title: str
    url: str
    summary: str  # 2-3 sentence summary
    discovered_at: datetime
    relevance_scores: Dict[str, float]  # per-capability
    raw_metadata: dict

@dataclass
class Assessment:
    discovery_id: str
    disruption_risk: float  # 0-1
    adoption_effort: str  # trivial/small/medium/large/rewrite
    expected_impact: str  # incremental/significant/transformative
    affected_modules: List[str]
    integration_approach: str  # 1-paragraph
    risks: List[str]
    recommendation: str  # adopt/monitor/dismiss
    reasoning: str
    assessed_at: datetime

@dataclass
class Proposal:
    assessment_id: str
    title: str
    spec: str  # Golden Spec format markdown
    estimated_effort_days: int
    success_criteria: List[str]
    created_at: datetime
    status: str  # draft/approved/implementing/shipped/abandoned
    outcome: Optional[str]  # measured result after shipping

# In supervisor/intake.py — add research tasks to work discovery
def discover_from_research() -> List[WorkItem]:
    """Surface CRA proposals as potential work items."""
    proposals = CRABridge().get_pending_proposals()
    items = []
    for p in proposals:
        if p.status == "approved":
            items.append(WorkItem(
                title=f"Research integration: {p.title}",
                source="cra",
                priority=WorkItemPriority.MEDIUM,
                estimated_complexity=p.estimated_effort_days,
            ))
    return items

# If cowork is available, CRA exposes MCP resources:
@mcp.resource("cortex://research/discoveries")
def get_recent_discoveries():
    """Last 7 days of CRA discoveries, scored and sorted."""
    return CRABridge()._load_recent(days=7)  # recent window; no task-specific ranking here

@mcp.resource("cortex://research/proposals")
def get_pending_proposals():
    """Integration proposals awaiting approval."""
    return CRABridge().get_pending_proposals()

@mcp.tool("cortex_research_assess")
def assess_topic(topic: str) -> str:
    """On-demand: assess a specific technology/paper for Cortex relevance."""
    discovery = Discovery(title=topic, source="manual", ...)
    assessment = AnalysisEngine().assess(discovery)
    return assessment.to_json()

~/.cortex/research/
├── discoveries.jsonl # Append-only discovery log
├── assessments.jsonl # Scored assessments
├── proposals/
│ ├── 2026-03-15_trajectory_memory.md
│ └── 2026-03-22_mem0_integration.md
├── adopted.jsonl # Shipped integrations + outcomes
├── dismissed.jsonl # Rejected with reasoning
└── digest_cache.json # Weekly digest cache
CRA runs via batch queue (overnight), writes to these files. cortex briefing reads them. No cowork dependency required.
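Read-path sketch for those files (assumes one JSON object per line with an ISO-8601 discovered_at timestamp, as in the Discovery model):
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

RESEARCH_DIR = Path.home() / ".cortex" / "research"

def load_recent(filename: str = "discoveries.jsonl", days: int = 7) -> list:
    """Read recent records from an append-only JSONL file."""
    path = RESEARCH_DIR / filename
    if not path.exists():
        return []
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    keep = []
    for line in path.read_text().splitlines():
        record = json.loads(line)
        if datetime.fromisoformat(record["discovered_at"]) >= cutoff:
            keep.append(record)
    return keep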