Feedback #106

@RichardHightower

Description
Agent Brain 2026 Strategic Recommendations

Date: February 4, 2026
Document Version: 1.0
Classification: Technical Architecture & Roadmap


Executive Summary

This document provides a comprehensive analysis of Agent Brain's current architecture and recommends bleeding-edge enhancements aligned with the 2026 state-of-the-art in RAG systems, vector databases, embedding models, and AI agent integration.

Agent Brain has strong foundations: AST-aware code chunking, multi-modal retrieval (BM25/Vector/Graph/Hybrid), and per-project isolation. However, significant opportunities exist to leverage 2026 advances in:

  • Late Interaction Reranking (ColBERTv2) for sub-100ms precision improvements
  • Streaming Vector Updates (LiveVectorLake architecture) for real-time indexing
  • Voyage 4 Embeddings outperforming OpenAI by 14%
  • Native MCP Integration eliminating the plugin→CLI→server latency chain
  • Agentic GraphRAG with LlamaIndex Workflows for multi-step reasoning

Table of Contents

  1. Current State Assessment
  2. 2026 Technology Landscape
  3. Strategic Recommendations
  4. Implementation Roadmap
  5. Architecture Evolution
  6. Risk Analysis
  7. Sources & References

1. Current State Assessment

1.1 Architectural Strengths

| Component | Implementation | Assessment |
| --- | --- | --- |
| AST-Aware Chunking | tree-sitter for 9 languages | Industry-leading approach |
| Multi-Modal Retrieval | 5 modes with RSF/RRF fusion | Comprehensive coverage |
| Per-Project Isolation | Separate servers, auto-port allocation | Clean architecture |
| Provider Abstraction | OpenAI/Ollama/Cohere/Anthropic | Good extensibility |
| LlamaIndex Foundation | BM25Retriever, PropertyGraphIndex | Solid primitives |

1.2 Critical Gaps Identified

Technical Debt (34 Pending Tasks)

GraphRAG Implementation:
├── T017-T029: Graph query mode - NOT STARTED
├── T030-T042: Multi-mode fusion - PARTIAL
├── T043-T047: AST-based code relationships - NOT STARTED
└── Kuzu backend support - NOT STARTED

Pluggable Providers:
├── T047-T052: Offline operation (Ollama) - NOT STARTED
├── T053-T058: API key security - NOT STARTED
└── T063-T067: Provider mismatch detection - NOT STARTED

Multi-Instance:
├── T059-T061: Integration tests - NOT STARTED
└── T062-T067: Shared daemon mode - NOT STARTED

Performance Bottlenecks

| Issue | Impact | Current State |
| --- | --- | --- |
| Embedding Generation | 50-90% of indexing time | Sequential batch processing |
| BM25 Post-Filtering | 3x over-fetch, unvalidated | No native metadata filtering |
| Graph Memory Limit | ~100K triplets max | SimplePropertyGraphStore in RAM |
| No Query Caching | Repeated queries recomputed | No LRU cache |
| Blocking Indexing | 409 errors during index | Single-threaded, no queue |

Testing Coverage Gaps

| Area | Status | Risk |
| --- | --- | --- |
| GraphRAG E2E | 0% | HIGH - Feature non-functional |
| Provider E2E | ~40% | MEDIUM - 5 providers untested |
| Multi-instance E2E | 0% | HIGH - Isolation unvalidated |
| Performance benchmarks | 0% | MEDIUM - No baseline metrics |

2. 2026 Technology Landscape

2.1 RAG State-of-the-Art: Two-Stage Retrieval with Late Interaction

The industry has converged on two-stage RAG architectures combining:

  1. Stage 1: Fast Retrieval - BM25/SPLADE + Vector search (high recall)
  2. Stage 2: Precision Reranking - ColBERTv2 late interaction (high precision)

ColBERTv2 Performance (January 2026 Research)

"On PubMedQA, ColBERTv2 re-ranking yields up to +4.2 pp gain in Recall@3 and +3.13 pp average accuracy improvement when fine-tuned with in-batch negatives."

"Inference latency is approximately 31.4 ms for query encoding and 26.3 ms for re-ranking, totaling 57.7 ms per query. Sub-100ms latency enables interactive applications."

How Late Interaction Works:

Traditional Bi-Encoder:       Late Interaction (ColBERT):
Query → [CLS] embedding       Query → [token₁, token₂, ..., tokenₙ] embeddings
Doc   → [CLS] embedding       Doc   → [token₁, token₂, ..., tokenₘ] embeddings
Score = cosine(q, d)          Score = Σ max(sim(qᵢ, dⱼ)) for all i  (MaxSim)

ColBERT precomputes document token embeddings offline, enabling fast scoring at query time while maintaining token-level precision.
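
In code, MaxSim reduces to a few NumPy operations; this is a minimal sketch over precomputed token embeddings (array shapes and normalization are illustrative assumptions):

import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    # query_tokens: (n, d), doc_tokens: (m, d), rows L2-normalized
    sim = query_tokens @ doc_tokens.T      # (n, m) token-level similarities
    return float(sim.max(axis=1).sum())    # MaxSim: sum of per-query-token maxima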

Recommended Pipeline for Agent Brain

┌─────────────────────────────────────────────────────────────────┐
│                    Agent Brain Query Pipeline                    │
├─────────────────────────────────────────────────────────────────┤
│  Query                                                          │
│    │                                                            │
│    ▼                                                            │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Stage 1: Candidate Retrieval (top_k=100)                │   │
│  │  ├── BM25 (keyword precision)                           │   │
│  │  ├── Vector (semantic recall)                           │   │
│  │  └── Graph (relationship traversal)                     │   │
│  │  → Reciprocal Rank Fusion                               │   │
│  └─────────────────────────────────────────────────────────┘   │
│    │                                                            │
│    ▼ ~50 candidates                                             │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Stage 2: ColBERTv2 Reranking (top_k=10)                 │   │
│  │  └── Token-level MaxSim scoring                         │   │
│  │  → Sub-100ms latency                                    │   │
│  └─────────────────────────────────────────────────────────┘   │
│    │                                                            │
│    ▼ Final results                                              │
└─────────────────────────────────────────────────────────────────┘
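
The Reciprocal Rank Fusion step in Stage 1 reduces to a few lines; this is a generic sketch (k=60 is the conventional constant, not a value taken from Agent Brain's configuration):

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Each input list holds document ids ordered best-first from one retriever
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)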

2.2 Vector Database Evolution

2026 Benchmark Comparison

| Database | QPS @ 50M vectors | Latency (p50) | Billions Scale | Local Option |
| --- | --- | --- | --- | --- |
| Qdrant | 41.47 QPS @ 99% recall | 20-50ms | Yes | Yes |
| pgvectorscale | 471 QPS | 10-20ms | No (10-100M max) | Yes |
| Milvus/Zilliz | Best-in-class | <10ms | Yes | Yes |
| Pinecone | Enterprise-grade | 20-50ms | Yes | No |
| ChromaDB (current) | Not benchmarked | Variable | No | Yes |

Key Insight: pgvector with pgvectorscale now outperforms Qdrant for workloads under 100M vectors, while providing PostgreSQL's full-text search (replacing BM25) and ACID transactions.

Recommendation: Prioritize Phase 6 (PostgreSQL Backend)

-- Single PostgreSQL instance replaces 3 storage backends:
-- 1. pgvector for vector similarity (replaces ChromaDB)
-- 2. tsvector for full-text search (replaces BM25)
-- 3. JSONB for graph storage (replaces SimplePropertyGraphStore)

CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    content TEXT,
    embedding vector(3072),
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    metadata JSONB,
    graph_triplets JSONB
);

CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON chunks USING gin (content_tsv);
CREATE INDEX ON chunks USING gin (graph_triplets jsonb_path_ops);
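
A hybrid query against this schema can combine both signals in one statement. The sketch below uses psycopg with equal weighting of vector and text scores; the weighting is an illustrative choice, not a tuned one:

import psycopg

def hybrid_search(conn: psycopg.Connection, query_text: str,
                  query_embedding: list[float], top_k: int = 10):
    # pgvector accepts a bracketed string literal cast to ::vector
    vec = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    return conn.execute(
        """
        SELECT id,
               1 - (embedding <=> %s::vector) AS vector_score,
               ts_rank(content_tsv, plainto_tsquery('english', %s)) AS text_score
        FROM chunks
        ORDER BY (1 - (embedding <=> %s::vector))
                 + ts_rank(content_tsv, plainto_tsquery('english', %s)) DESC
        LIMIT %s
        """,
        (vec, query_text, vec, query_text, top_k),
    ).fetchall()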

2.3 Embedding Models: Voyage 4 Dominance

2026 Embedding Benchmark Results

| Model | Relative Performance | Cost | Best For |
| --- | --- | --- | --- |
| Voyage 4-large | Baseline (+0%) | $$$ | Maximum accuracy |
| Voyage 4 | -1.87% | $$ | Production balance |
| Voyage 3.5-lite | -4.80% | $ | Cost-effective RAG |
| Gemini Embedding 001 | -3.87% | $$ | Google ecosystem |
| Cohere Embed v4 | -8.20% | $$ | Multilingual |
| OpenAI v3 Large | -14.05% | $$ | Legacy compatibility |

Critical Finding: Agent Brain's current default (OpenAI text-embedding-3-large) is 14% less accurate than Voyage 4-large.

Specialized Code Embeddings

For code-specific retrieval, consider GraphCodeBERT which:

  • Encodes semantic-level structure via data flow graphs
  • Captures "where-the-value-comes-from" relationships between variables
  • Pre-trained on 6 programming languages

GraphCodeBERT Architecture:
┌─────────────────────────────────────────────┐
│ Code: def foo(x): return x + 1              │
│                                             │
│ Token Embedding  +  Data Flow Graph         │
│ [def][foo][x]...    x ──defines──> param_x  │
│                     return ──uses──> x      │
│                                             │
│ → Semantic-aware code representation        │
└─────────────────────────────────────────────┘
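
For illustration, the public microsoft/graphcodebert-base checkpoint loads through Hugging Face transformers. This simplified sketch encodes code as plain tokens and mean-pools the output; the full GraphCodeBERT setup also feeds data flow edges, which is omitted here:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")
model = AutoModel.from_pretrained("microsoft/graphcodebert-base")

def embed_code(snippet: str) -> list[float]:
    inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool token embeddings into a single vector
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).tolist()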

2.4 GraphRAG: Microsoft's Production Architecture

Microsoft's GraphRAG (now in Azure Discovery) uses:

  1. LLM Entity Extraction - Extract named entities and descriptions from text chunks
  2. Hierarchical Leiden Clustering - Form semantic communities in the graph
  3. Community Summarization - LLM-generated summaries for each cluster
  4. Query-Focused Synthesis - Traverse graph + summaries at query time
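
Step 2 (hierarchical Leiden clustering) is commonly run with the igraph/leidenalg libraries; a minimal sketch, assuming the entities and relations from step 1 are already in hand:

import igraph as ig
import leidenalg

def cluster_entities(entities: list[str], relations: list[tuple[str, str]]) -> dict[str, int]:
    # Build an undirected entity graph and assign each entity to a Leiden community
    graph = ig.Graph()
    graph.add_vertices(entities)              # vertex "name" attribute = entity string
    graph.add_edges(relations)                # edges given as (name, name) pairs
    partition = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition)
    return {v["name"]: community for v, community in zip(graph.vs, partition.membership)}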

LlamaIndex Integration (Agentic GraphRAG)

# LlamaIndex 2026 PropertyGraph + Agentic Workflow
from llama_index.core import PropertyGraphIndex
from llama_index.core.workflow import Workflow, step

class AgenticGraphRAG(Workflow):
    @step
    async def retrieve(self, query: str) -> list[Node]:
        # Stage 1: Multi-modal retrieval
        vector_results = await self.vector_index.aretrieve(query)
        graph_results = await self.graph_index.aretrieve(query)
        return self.fuse_rrf(vector_results, graph_results)

    @step
    async def reflect(self, results: list[Node]) -> ReflectionOutput:
        # Stage 2: Agent reflection - are results sufficient?
        return await self.llm.areflect(results, self.query)

    @step
    async def synthesize(self, results: list[Node]) -> str:
        # Stage 3: Generate answer with citations
        return await self.llm.asynthesize(results)

2.5 Real-Time Indexing: LiveVectorLake Architecture

The LiveVectorLake paper (January 2026) introduces a production architecture for streaming vector updates:

LiveVectorLake Architecture:
┌─────────────────────────────────────────────────────────────────┐
│                    Change Detection Layer                        │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐            │
│  │ File Watcher │ │ Git Hooks    │ │ DB Triggers  │            │
│  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘            │
│         └────────────────┼────────────────┘                     │
│                          ▼                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Content-Addressable Hashing                 │   │
│  │   SHA256(chunk_content) → embedding_cache               │   │
│  │   Skip embedding if hash exists (50-80% speedup)        │   │
│  └─────────────────────────────────────────────────────────┘   │
│                          ▼                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Dual-Tier Storage                           │   │
│  │   Hot Tier: In-memory for recent changes                │   │
│  │   Cold Tier: Persistent for historical data             │   │
│  └─────────────────────────────────────────────────────────┘   │
│                          ▼                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              ACID Transactions                           │   │
│  │   Atomic index updates                                  │   │
│  │   Consistent query results during updates               │   │
│  │   Isolated concurrent access                            │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Performance Results:
- 10-15% content re-processing during updates (vs 100% for full re-index)
- Sub-100ms query latency during indexing
- 100% temporal query accuracy

2.6 MCP Native Integration

The Model Context Protocol has evolved significantly:

  • 75+ connectors in Claude's directory
  • MCP Apps for interactive UIs within Claude
  • Tool Search for optimizing 1000s of tools at scale
  • Donated to Agentic AI Foundation (Linux Foundation) in December 2025

Current Agent Brain Integration

Plugin → subprocess → CLI → HTTP → Server
       ↑                          ↑
   ~50-100ms latency          ~10-20ms latency

Recommended MCP Native Integration

Claude ←──MCP──→ Agent Brain Server
              ↑
         ~5-10ms latency (direct protocol)

3. Strategic Recommendations

3.1 Critical Priority (P0) - Complete Before Production

R1: Complete GraphRAG Implementation

Current State: Foundation done, query execution not implemented
Tasks: T017-T029 (Graph queries), T043-T047 (AST code relationships)
Effort: ~120 hours
Impact: Unlocks the "what calls this function" use case - a core differentiator

Implementation Approach:

# Use LlamaIndex's PropertyGraphIndex with LLM extraction
from llama_index.core.indices.property_graph import (
    PropertyGraphIndex,
    ImplicitPathExtractor,
    SimpleLLMPathExtractor,
)

# For code, augment with AST-derived relationships
class CodeRelationshipExtractor:
    def extract(self, code_chunk: CodeChunk) -> list[GraphTriple]:
        relationships = []
        # From AST metadata already extracted by CodeChunker
        for import_stmt in code_chunk.imports:
            relationships.append(GraphTriple(
                subject=code_chunk.symbol_name,
                predicate="imports",
                object=import_stmt
            ))
        for call in code_chunk.function_calls:
            relationships.append(GraphTriple(
                subject=code_chunk.symbol_name,
                predicate="calls",
                object=call
            ))
        return relationships

R2: Implement Embedding Cache with Content Hashing

Current State: Every re-index regenerates all embeddings
Expected Improvement: 50-80% reduction in indexing time
Effort: ~40 hours

Implementation:

import hashlib
from pathlib import Path
from typing import Callable

import numpy as np

class EmbeddingCache:
    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(exist_ok=True)

    def get_or_compute(self, content: str, embed_fn: Callable) -> list[float]:
        content_hash = hashlib.sha256(content.encode()).hexdigest()
        cache_file = self.cache_dir / f"{content_hash}.npy"

        if cache_file.exists():
            return np.load(cache_file).tolist()

        embedding = embed_fn(content)
        np.save(cache_file, np.array(embedding))
        return embedding

R3: Add ColBERTv2 Reranking Stage

Current State: Single-stage retrieval only
Expected Improvement: +3-4% accuracy, sub-100ms additional latency
Effort: ~60 hours

Implementation Options:

  1. RAGatouille - Python library wrapping ColBERTv2
  2. Jina Reranker API - Hosted ColBERT-style reranking
  3. Self-hosted ColBERTv2 - Maximum control

from ragatouille import RAGPretrainedModel

class TwoStageRetriever:
    def __init__(self):
        self.colbert = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

    async def retrieve(self, query: str, top_k: int = 10) -> list[Result]:
        # Stage 1: Fast retrieval (existing)
        candidates = await self.hybrid_retrieve(query, top_k=100)

        # Stage 2: ColBERT reranking; rerank() returns dicts that include the
        # original position of each document as "result_index"
        docs = [c.content for c in candidates]
        reranked = self.colbert.rerank(query=query, documents=docs, k=top_k)

        return [candidates[r["result_index"]] for r in reranked]

3.2 High Priority (P1) - Next Release Cycle

R4: Upgrade Default Embedding Provider to Voyage 4

Current State: OpenAI text-embedding-3-large (14% less accurate)
Expected Improvement: +14% retrieval accuracy
Effort: ~20 hours (provider already pluggable)

# config.yaml
embedding:
  provider: voyage
  model: voyage-4-large  # or voyage-3.5-lite for cost-effective
  dimensions: 1024
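
Wiring this through LlamaIndex would look roughly as follows (constructor per the llama-index-embeddings-voyageai package; the voyage-4-large model name simply mirrors the config above):

import os
from llama_index.embeddings.voyageai import VoyageEmbedding

embed_model = VoyageEmbedding(
    model_name="voyage-4-large",                 # mirrors config.yaml above
    voyage_api_key=os.environ["VOYAGE_API_KEY"],
)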

R5: Implement Native MCP Server

Current State: Plugin → subprocess → CLI → HTTP → Server
Expected Improvement: 5-10x latency reduction, simplified architecture
Effort: ~80 hours

# Sketch using the MCP Python SDK's FastMCP interface;
# query_service / indexing_service are the existing Agent Brain services.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agent-brain")

@mcp.tool()
async def search(query: str, mode: str = "hybrid") -> list[dict]:
    """Search the indexed knowledge base."""
    return await query_service.execute_query(
        QueryRequest(query=query, mode=mode)
    )

@mcp.tool()
async def index(path: str, include_code: bool = True) -> dict:
    """Index documents and code."""
    return await indexing_service.start_indexing(
        IndexRequest(folder_path=path, include_code=include_code)
    )

R6: Background Indexing Queue

Current State: Indexing blocks server, returns 409 for concurrent requests
Expected Improvement: Non-blocking indexing, query during index
Effort: ~60 hours

from asyncio import Queue
from dataclasses import dataclass
from uuid import uuid4

@dataclass
class IndexJob:
    job_id: str
    request: IndexRequest
    priority: int = 0

class BackgroundIndexer:
    def __init__(self):
        self.queue: Queue[IndexJob] = Queue()
        self.current_job: IndexJob | None = None

    async def enqueue(self, request: IndexRequest) -> str:
        job = IndexJob(job_id=uuid4().hex, request=request)
        await self.queue.put(job)
        return job.job_id

    async def process_queue(self):
        while True:
            self.current_job = await self.queue.get()
            await self._run_indexing(self.current_job)
            self.current_job = None
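
At server startup the consumer runs as a long-lived task, so enqueue() returns a job_id immediately while indexing proceeds off the request path; a minimal wiring sketch:

import asyncio

async def start_background_indexing() -> BackgroundIndexer:
    indexer = BackgroundIndexer()
    # Consume jobs for the lifetime of the server process
    asyncio.create_task(indexer.process_queue())
    return indexer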

R7: File Watcher for Auto-Indexing

Current State: Manual re-index required after code changes
Expected Improvement: Zero-friction index maintenance
Effort: ~40 hours

import asyncio
from pathlib import Path

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class AutoIndexer(FileSystemEventHandler):
    def __init__(self, indexer: BackgroundIndexer, loop: asyncio.AbstractEventLoop,
                 debounce_ms: int = 5000):
        self.indexer = indexer
        self.loop = loop  # event loop running the BackgroundIndexer
        self.debounce_ms = debounce_ms
        self.pending_files: set[Path] = set()

    def on_modified(self, event):
        # watchdog callbacks fire on a worker thread, so hand off to the event loop
        if self._should_index(event.src_path):
            self.pending_files.add(Path(event.src_path))
            asyncio.run_coroutine_threadsafe(self._debounced_index(), self.loop)

    async def _debounced_index(self):
        await asyncio.sleep(self.debounce_ms / 1000)
        files = self.pending_files.copy()
        self.pending_files.clear()
        await self.indexer.enqueue_incremental(files)

3.3 Medium Priority (P2) - Q2 2026

R8: PostgreSQL Backend (Consolidate Storage)

Current State: 3 separate storage systems (ChromaDB, BM25, SimplePropertyGraphStore)
Expected Improvement: Unified storage, ACID transactions, better scaling
Effort: ~160 hours

Benefits of PostgreSQL Consolidation:

  1. pgvector - Vector similarity with HNSW indexing
  2. tsvector - Native full-text search (replaces BM25)
  3. JSONB - Graph storage with path queries
  4. ACID - Transactional consistency during updates
  5. Scaling - Well-understood operational model to 100M+ vectors

R9: Agentic GraphRAG with LlamaIndex Workflows

Current State: Static query execution
Expected Improvement: Multi-step reasoning, self-correction
Effort: ~80 hours

from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent

class AgenticRAGWorkflow(Workflow):
    @step
    async def retrieve(self, ev: StartEvent) -> RetrieveEvent:
        results = await self.retriever.aretrieve(ev.query)
        return RetrieveEvent(results=results)

    @step
    async def evaluate(self, ev: RetrieveEvent) -> EvaluateEvent:
        # LLM judges if results are sufficient
        judgment = await self.llm.ajudge_relevance(ev.results, self.query)
        if judgment.needs_more_context:
            # Reformulate query and retrieve again
            return RetrieveEvent(query=judgment.reformulated_query)
        return EvaluateEvent(results=ev.results, sufficient=True)

    @step
    async def synthesize(self, ev: EvaluateEvent) -> StopEvent:
        answer = await self.llm.asynthesize(ev.results)
        return StopEvent(result=answer)
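
Invocation follows the standard LlamaIndex Workflow run() convention, where keyword arguments populate the StartEvent (the query string here is only an example):

# inside an async context
workflow = AgenticRAGWorkflow(timeout=60)
answer = await workflow.run(query="Where is the retry logic configured?")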

R10: Code-Specific Embedding Model

Current State: Generic text embeddings for code
Expected Improvement: Better code search accuracy
Effort: ~40 hours

Options:

  1. Voyage Code - Specialized code embedding model
  2. GraphCodeBERT - Open source, data-flow aware
  3. StarCoder Embeddings - 80+ languages, 15B parameters

# config.yaml - per source_type embedding
embedding:
  document:
    provider: voyage
    model: voyage-4-large
  code:
    provider: voyage
    model: voyage-code-3
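
Routing by source_type can sit as a thin layer over the existing provider abstraction; a sketch with hypothetical internal names:

class EmbeddingRouter:
    """Pick an embedding model per source_type (document vs code)."""

    def __init__(self, document_model, code_model):
        self.models = {"document": document_model, "code": code_model}

    def embed(self, text: str, source_type: str) -> list[float]:
        model = self.models.get(source_type, self.models["document"])
        return model.get_text_embedding(text)  # LlamaIndex BaseEmbedding API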

3.4 Lower Priority (P3) - Q3-Q4 2026

R11: LiveVectorLake-Style Streaming Updates

Implement the full streaming architecture from the LiveVectorLake paper:

  • Content-addressable hashing (R2 is first step)
  • Dual-tier storage (hot/cold)
  • ACID transactions during updates
  • Temporal queries ("what was indexed yesterday?")

R12: Multi-Repository Federated Search

Enable searching across multiple projects simultaneously:

  • Shared daemon mode (already spec'd as Phase 5)
  • Cross-project RRF fusion
  • Organization-wide code search

R13: VS Code Extension

Native IDE integration:

  • Sidebar search panel
  • Inline results with code preview
  • "Find in Knowledge Base" command
  • Hover documentation from indexed docs

R14: Query Explanation and Debugging

Help users understand search results:

  • Score breakdown (vector_score, bm25_score, graph_score)
  • Matching term highlighting
  • Entity path visualization for graph results
  • explain=true query parameter
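
The explain payload could be a small structure attached to each result when explain=true is set (field names below are illustrative, not an existing API):

from dataclasses import dataclass, field

@dataclass
class ScoreExplanation:
    vector_score: float
    bm25_score: float
    graph_score: float
    fused_score: float
    matched_terms: list[str] = field(default_factory=list)
    entity_path: list[str] = field(default_factory=list)  # graph traversal, if any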

4. Implementation Roadmap

Phase 1: Foundation Fixes (February-March 2026)

| Week | Deliverable | Owner | Dependencies |
| --- | --- | --- | --- |
| 1-2 | Embedding Cache (R2) | Core | None |
| 2-3 | Complete GraphRAG queries (R1 partial) | Core | None |
| 3-4 | ColBERTv2 Reranking (R3) | Core | None |
| 4 | Integration tests for all above | QA | R1, R2, R3 |

Exit Criteria:

  • Incremental indexing 50%+ faster
  • Graph queries functional end-to-end
  • Reranking improves top-5 precision by 3%+

Phase 2: Performance & Integration (April-May 2026)

| Week | Deliverable | Owner | Dependencies |
| --- | --- | --- | --- |
| 1-2 | Voyage 4 embedding upgrade (R4) | Core | None |
| 2-4 | Native MCP Server (R5) | Core | None |
| 3-4 | Background indexing queue (R6) | Core | None |
| 4 | File watcher auto-index (R7) | Core | R6 |

Exit Criteria:

  • 14% accuracy improvement from Voyage 4
  • Sub-20ms query latency via MCP
  • Non-blocking indexing with progress streaming

Phase 3: Architecture Evolution (June-August 2026)

| Week | Deliverable | Owner | Dependencies |
| --- | --- | --- | --- |
| 1-4 | PostgreSQL backend (R8) | Core | None |
| 4-6 | Agentic GraphRAG workflows (R9) | Core | R1, R8 |
| 6-8 | Code-specific embeddings (R10) | Core | R4 |

Exit Criteria:

  • Single PostgreSQL instance replaces 3 storage backends
  • Multi-step agentic queries functional
  • Code search accuracy improved by measured benchmark

Phase 4: Polish & Extensions (September-December 2026)

| Deliverable | Priority | Effort |
| --- | --- | --- |
| LiveVectorLake streaming (R11) | P3 | 120h |
| Multi-repo federation (R12) | P3 | 80h |
| VS Code extension (R13) | P3 | 120h |
| Query explanation (R14) | P3 | 40h |

5. Architecture Evolution

Current Architecture (v1.2.0)

┌─────────────────────────────────────────────────────────────────┐
│                        Claude Code                               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Plugin (Markdown) → subprocess → CLI → HTTP → Server    │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                     agent-brain-server                           │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐            │
│  │   ChromaDB   │ │  BM25 JSON   │ │ SimpleGraph  │            │
│  │   (vectors)  │ │  (keywords)  │ │   (in RAM)   │            │
│  └──────────────┘ └──────────────┘ └──────────────┘            │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                    External Providers                            │
│  OpenAI (embeddings) │ Anthropic (summaries) │ Ollama (local)   │
└─────────────────────────────────────────────────────────────────┘

Target Architecture (v2.0.0 - Q4 2026)

┌─────────────────────────────────────────────────────────────────┐
│                        Claude Code                               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Native MCP Integration (direct protocol, sub-20ms)      │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                   agent-brain-server v2                          │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Agentic Query Workflow                      │    │
│  │  Retrieve → Reflect → Rerank (ColBERTv2) → Synthesize   │    │
│  └─────────────────────────────────────────────────────────┘    │
│                              ↓                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Unified PostgreSQL Backend                  │    │
│  │  pgvector (HNSW) │ tsvector (FTS) │ JSONB (graph)       │    │
│  │  + Embedding Cache (content-addressable)                │    │
│  └─────────────────────────────────────────────────────────┘    │
│                              ↓                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Background Processing                       │    │
│  │  File Watcher → Job Queue → Incremental Indexer         │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                    Embedding Providers                           │
│  Voyage 4 (default) │ GraphCodeBERT (code) │ Ollama (offline)   │
└─────────────────────────────────────────────────────────────────┘

Key Architectural Changes

| Aspect | Current | Target | Benefit |
| --- | --- | --- | --- |
| Protocol | HTTP via subprocess | Native MCP | 5-10x latency reduction |
| Storage | 3 separate systems | Unified PostgreSQL | ACID, simpler ops |
| Retrieval | Single-stage | Two-stage + agentic | +3-4% accuracy |
| Embeddings | OpenAI only | Voyage 4 + code-specific | +14% accuracy |
| Indexing | Blocking, full re-index | Background, incremental | 50-80% faster |
| Queries | Static execution | Agentic workflows | Multi-step reasoning |

6. Risk Analysis

Technical Risks

| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| PostgreSQL migration breaks existing indexes | Medium | High | Implement migration tool, keep ChromaDB fallback |
| ColBERTv2 adds unacceptable latency | Low | Medium | Make reranking optional, benchmark first |
| Voyage 4 API stability | Low | Medium | Keep OpenAI as fallback provider |
| MCP protocol changes | Medium | Medium | Pin MCP version, abstract integration |

Operational Risks

| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| Team bandwidth for all recommendations | High | High | Prioritize P0/P1, defer P2/P3 |
| Breaking changes for existing users | Medium | High | Semantic versioning, migration guides |
| Increased infrastructure complexity | Medium | Medium | PostgreSQL actually simplifies (1 system vs 3) |

Dependency Risks

| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| LlamaIndex breaking changes | Medium | Medium | Pin versions, maintain fork if needed |
| Voyage AI pricing/availability | Low | Medium | Pluggable provider architecture |
| ColBERTv2 model updates | Low | Low | Version-pin model, benchmark updates |

7. Sources & References

RAG & Reranking

Vector Databases

Embedding Models

GraphRAG

MCP & Integration

Code Embeddings

Real-Time Indexing

LlamaIndex & Agentic RAG


Appendix A: Quick Reference - Priority Matrix

                        IMPACT
                 Low    Medium    High
              ┌────────┬────────┬────────┐
         Low  │        │  R14   │        │
              ├────────┼────────┼────────┤
EFFORT Medium │  R4    │ R7,R10 │ R3,R6  │
              ├────────┼────────┼────────┤
        High  │ R11-13 │  R9    │R1,R2,R5│
              │        │        │   R8   │
              └────────┴────────┴────────┘

Priority:
- P0 (Critical): R1, R2, R3
- P1 (High): R4, R5, R6, R7
- P2 (Medium): R8, R9, R10
- P3 (Lower): R11, R12, R13, R14

Appendix B: Estimated Resource Requirements

| Phase | Duration | Engineering Hours | Infrastructure |
| --- | --- | --- | --- |
| Phase 1 | 8 weeks | 280 hours | None (existing) |
| Phase 2 | 8 weeks | 300 hours | MCP test environment |
| Phase 3 | 12 weeks | 400 hours | PostgreSQL instance |
| Phase 4 | 16 weeks | 360 hours | VS Code marketplace |
| Total | 44 weeks | 1,340 hours | |

Document prepared by Claude Opus 4.5 based on comprehensive analysis of Agent Brain wiki documentation and 2026 state-of-the-art research.
