
MCP Memory Service Architecture

Overview

MCP Memory Service is a Model Context Protocol server that provides semantic memory and persistent storage capabilities for AI assistants. It enables long-term memory storage with semantic search, time-based recall, and tag-based organization across conversations.

System Architecture

graph TB
    subgraph "Client Layer"
        CC[Claude Desktop]
        LMS[LM Studio]
        VSC[VS Code MCP]
        GEN[Generic MCP Client]
    end

    subgraph "Protocol Layer"
        MCP[MCP Server Protocol]
        HTTP[HTTP API Server]
        WEB[Web Dashboard]
    end

    subgraph "Core Services"
        SRV[Memory Service Core]
        AUTH[Authentication]
        CACHE[Model Cache]
        EMB[Embedding Service]
    end

    subgraph "Storage Abstraction"
        ABS[Storage Interface]
        HYBRID[Hybrid Backend ⭐]
        CLOUDFLARE[Cloudflare Backend]
        SQLITE[SQLite-vec Backend]
        REMOTE[HTTP Client Backend]
        CHROMA[ChromaDB ⚠️ DEPRECATED]
    end

    subgraph "Infrastructure"
        DB[(Vector Database)]
        FS[(File System)]
        MDNS[mDNS Discovery]
    end

    CC --> MCP
    LMS --> MCP
    VSC --> MCP
    GEN --> MCP
    
    MCP --> SRV
    HTTP --> SRV
    WEB --> HTTP
    
    SRV --> AUTH
    SRV --> CACHE
    SRV --> EMB
    SRV --> ABS
    
    ABS --> HYBRID
    ABS --> CLOUDFLARE
    ABS --> SQLITE
    ABS --> REMOTE
    ABS --> CHROMA

    HYBRID --> SQLITE
    HYBRID --> CLOUDFLARE
    CLOUDFLARE --> DB
    SQLITE --> DB
    REMOTE --> HTTP
    CHROMA --> DB
    
    DB --> FS
    SRV --> MDNS

Core Components

1. Server Layer (src/mcp_memory_service/server.py)

The main server implementation that handles MCP protocol communication:

  • Protocol Handler: Implements the MCP protocol specification
  • Request Router: Routes incoming requests to appropriate handlers
  • Response Builder: Constructs protocol-compliant responses
  • Client Detection: Identifies and adapts to different MCP clients (Claude Desktop, LM Studio, etc.)
  • Logging System: Client-aware logging with JSON compliance for Claude Desktop

Key responsibilities:

  • Async request handling with proper error boundaries
  • Global model and embedding cache management
  • Lazy initialization of storage backends
  • Tool registration and invocation

2. Storage Abstraction Layer (src/mcp_memory_service/storage/)

Abstract interface that allows multiple storage backend implementations:

Base Interface (storage/base.py)

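from __future__ import annotations  # lets the annotations below reference Memory lazily

from abc import ABC, abstractmethod
from typing import List, Tuple

# Memory and MemoryQueryResult are the data classes from the models package.


class MemoryStorage(ABC):
    @abstractmethod
    async def initialize(self) -> None:
        """Initialize the storage backend."""

    @abstractmethod
    async def store(self, memory: Memory) -> Tuple[bool, str]:
        """Store a memory object; return (success, message)."""

    @abstractmethod
    async def retrieve(self, query: str, n_results: int) -> List[MemoryQueryResult]:
        """Retrieve memories based on semantic similarity."""

    @abstractmethod
    async def search_by_tag(self, tags: List[str]) -> List[Memory]:
        """Search memories by tags."""

    @abstractmethod
    async def delete(self, content_hash: str) -> Tuple[bool, str]:
        """Delete a memory by content hash; return (success, message)."""

    @abstractmethod
    async def recall_memory(self, query: str, n_results: int) -> List[Memory]:
        """Recall memories using natural language time queries."""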

Hybrid Backend (storage/hybrid.py) ⭐ RECOMMENDED

  • Production default - Best performance with cloud synchronization
  • Primary storage: SQLite-vec for ultra-fast local reads (~5ms)
  • Secondary storage: Cloudflare for multi-device persistence and cloud backup
  • Background sync: Zero user-facing latency with async operation queue
  • Graceful degradation: Works offline, automatically syncs when cloud available
  • Capacity monitoring: Tracks Cloudflare limits and provides warnings
  • Use cases: Production deployments, multi-device users, cloud-backed local performance
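The write path described above (fast local write, cloud write deferred to a background queue, failures tolerated while offline) can be sketched as follows. This is a minimal illustration, not the code in storage/hybrid.py: the class `HybridSketch`, the dict standing in for SQLite-vec, and the `push_to_cloud` callable standing in for Cloudflare are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class HybridSketch:
    """Hypothetical hybrid store: local writes return at once, cloud sync runs in the background."""

    def __init__(self, push_to_cloud: Callable[[str, str], Awaitable[None]]):
        self.local: Dict[str, str] = {}              # stand-in for the SQLite-vec primary
        self.queue: asyncio.Queue = asyncio.Queue()  # pending cloud operations
        self.push_to_cloud = push_to_cloud           # stand-in for the Cloudflare secondary
        self.failed: List[str] = []                  # hashes to retry once the cloud is reachable

    async def store(self, content_hash: str, content: str) -> bool:
        # The caller only waits for the local write; the cloud write is queued.
        self.local[content_hash] = content
        await self.queue.put(content_hash)
        return True

    async def sync_worker(self) -> None:
        # Drain the queue forever; an offline cloud never blocks local operations.
        while True:
            content_hash = await self.queue.get()
            try:
                await self.push_to_cloud(content_hash, self.local[content_hash])
            except ConnectionError:
                self.failed.append(content_hash)  # keep for a later sync attempt
            finally:
                self.queue.task_done()


async def demo(offline: bool):
    uploaded: Dict[str, str] = {}

    async def push(content_hash: str, content: str) -> None:
        if offline:
            raise ConnectionError("cloud unreachable")
        uploaded[content_hash] = content

    store = HybridSketch(push)
    worker = asyncio.create_task(store.sync_worker())
    await store.store("abc123", "hello")
    await store.queue.join()  # wait until the background worker has processed the queue
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return uploaded, store.failed, store.local


online = asyncio.run(demo(offline=False))
offline = asyncio.run(demo(offline=True))
```

In both runs the local copy is written immediately; only the offline run records the hash for a later retry.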

Cloudflare Backend (storage/cloudflare.py)

  • Cloud-native storage using Cloudflare D1 (SQL) + Vectorize (vectors)
  • Global edge distribution for low-latency access worldwide
  • Serverless architecture with no infrastructure management
  • Automatic scaling and high availability
  • Limits: 10GB D1 database, 5M vectors in Vectorize
  • Use cases: Cloud-only deployments, serverless environments, no local storage
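Capacity monitoring against these limits is simple arithmetic; 4.5M stored vectors, for example, is 90% of the 5M Vectorize limit. The sketch below assumes decimal gigabytes (10 GB = 10^10 bytes) and a hypothetical 80% warning threshold; the function name and threshold are illustrative, not the service's actual values.

```python
from typing import List

D1_LIMIT_BYTES = 10 * 10**9   # 10 GB D1 database limit (decimal GB assumed)
VECTORIZE_LIMIT = 5_000_000   # 5M vectors in Vectorize
WARN_RATIO = 0.8              # hypothetical warning threshold


def capacity_warnings(d1_bytes: int, vector_count: int) -> List[str]:
    """Return human-readable warnings when usage approaches the Cloudflare limits."""
    warnings = []
    for name, used, limit in (
        ("D1 storage", d1_bytes, D1_LIMIT_BYTES),
        ("Vectorize vectors", vector_count, VECTORIZE_LIMIT),
    ):
        ratio = used / limit
        if ratio >= 1.0:
            warnings.append(f"{name}: limit reached ({ratio:.0%})")
        elif ratio >= WARN_RATIO:
            warnings.append(f"{name}: {ratio:.0%} of limit used")
    return warnings
```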

SQLite-vec Backend (storage/sqlite_vec.py)

  • Lightweight, fast local storage (5ms read latency)
  • Native SQLite with vec0 extension for vector similarity
  • ONNX Runtime embeddings (no PyTorch dependency)
  • Minimal memory footprint and dependencies
  • Use cases: Development, single-device deployments, or as primary in Hybrid backend

HTTP Client Backend (storage/http_client.py)

  • Remote storage via HTTP API for distributed architectures
  • Enables client-server deployments with centralized memory
  • Bearer token authentication with API key support
  • Automatic retry logic with exponential backoff
  • Use cases: Multi-client shared memory, remote MCP servers, load balancing
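Retry with exponential backoff doubles the wait after each failed attempt. A minimal sketch, assuming delays of `base_delay * 2**attempt` capped at `max_delay` and treating only connection errors and timeouts as retryable; the function name and defaults are hypothetical, and storage/http_client.py may use different parameters. The `sleep` argument is injectable so the delays can be observed without waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    request: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call `request`, retrying transient failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return await request()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... never more than max_delay
            await sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_attempts must be at least 1")
```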

ChromaDB Backend (storage/chroma.py) ⚠️ DEPRECATED

  • Status: Deprecated since v5.x, removal planned for v6.0.0
  • Migration path: Switch to Hybrid backend for production
  • Original vector database backend with sentence transformer embeddings
  • Heavy dependencies (PyTorch, sentence-transformers, ~2GB download)
  • Slower performance (15ms vs 5ms for SQLite-vec)
  • Higher memory footprint and complexity
  • Why deprecated: Hybrid backend provides better performance with cloud sync
  • Historical only: Not recommended for new deployments

3. Models Layer (src/mcp_memory_service/models/)

Data structures and validation:

@dataclass
class Memory:
    id: str
    content: str
    content_hash: str
    memory_type: str
    tags: List[str]
    metadata: MemoryMetadata
    created_at: datetime
    updated_at: datetime

@dataclass
class MemoryMetadata:
    source: Optional[str]
    client_id: Optional[str]
    session_id: Optional[str]
    parent_memory_id: Optional[str]
    child_memory_ids: List[str]

4. Web Interface (src/mcp_memory_service/web/)

Modern web dashboard for memory management:

  • Frontend: Responsive React-based UI
  • API Routes: RESTful endpoints for memory operations
  • WebSocket Support: Real-time updates
  • Authentication: API key-based authentication
  • Health Monitoring: System status and metrics

5. Configuration Management (src/mcp_memory_service/config.py)

Environment-based configuration with sensible defaults:

  • Storage backend selection
  • Model selection and caching
  • Platform-specific optimizations
  • Hardware acceleration detection (CUDA, MPS, DirectML, ROCm)
  • Network configuration (HTTP, HTTPS, mDNS)
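Environment-based configuration with defaults can be sketched as below. The variable names `MCP_MEMORY_STORAGE_BACKEND` and `MCP_HTTP_ENABLED` appear in this document's setup instructions; the `Settings` class, the accepted truthy strings, and the fallback value `"hybrid"` are assumptions for illustration, and the deprecated ChromaDB option is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Backend names from the setup commands in this document; the fallback is an assumption.
SUPPORTED_BACKENDS = {"hybrid", "cloudflare", "sqlite_vec"}
DEFAULT_BACKEND = "hybrid"


@dataclass(frozen=True)
class Settings:
    storage_backend: str
    http_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment (or a given mapping), falling back to defaults."""
    env = os.environ if env is None else env
    backend = env.get("MCP_MEMORY_STORAGE_BACKEND", DEFAULT_BACKEND).strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported storage backend: {backend!r}")
    http_enabled = env.get("MCP_HTTP_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
    return Settings(storage_backend=backend, http_enabled=http_enabled)
```

Passing an explicit mapping keeps the function easy to test without touching the real environment.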

Key Design Patterns

Async/Await Pattern

All I/O operations use Python's async/await for non-blocking execution:

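async def store_memory(self, memory: Memory) -> Memory:
    # Awaiting the backend frees the event loop to serve other requests meanwhile;
    # store() returns (success, message) as defined by the MemoryStorage interface.
    success, message = await self.storage.store(memory)
    if not success:
        raise RuntimeError(f"Failed to store memory: {message}")
    return memory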
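The practical benefit of async I/O is that independent operations overlap. In this self-contained sketch a hypothetical `fake_embed` coroutine stands in for an I/O-bound embedding call; three 0.1 s calls awaited together with `asyncio.gather` finish in roughly 0.1 s rather than 0.3 s.

```python
import asyncio
import time
from typing import List


async def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for an I/O-bound embedding request
    await asyncio.sleep(0.1)
    return [float(len(text))]


async def embed_all(texts: List[str]) -> List[List[float]]:
    # gather() runs the coroutines concurrently and returns results in input order
    return list(await asyncio.gather(*(fake_embed(t) for t in texts)))


start = time.perf_counter()
vectors = asyncio.run(embed_all(["a", "bb", "ccc"]))
elapsed = time.perf_counter() - start
```

This overlap only applies to work that actually awaits I/O; CPU-bound local model inference would need to run in a thread or process pool to avoid blocking the event loop.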

Lazy Initialization

Resources are initialized only when first needed:

async def _ensure_storage_initialized(self):
    if self.storage is None:
        self.storage = await create_storage_backend()
    return self.storage
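One caveat with a plain check-then-create: two concurrent requests can both see that storage is missing and create two backends. A sketch that guards initialization with an `asyncio.Lock` and re-checks inside it; `LazyStorage` and the `factory` callable are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional


class LazyStorage:
    """Create the storage backend once, on first use, even under concurrent calls."""

    def __init__(self, factory: Callable[[], Awaitable[Any]]):
        self._factory = factory  # hypothetical async backend factory
        self._storage: Optional[Any] = None
        self._lock = asyncio.Lock()

    async def get(self) -> Any:
        if self._storage is None:            # fast path once initialized
            async with self._lock:
                if self._storage is None:    # re-check: another task may have won the race
                    self._storage = await self._factory()
        return self._storage


async def demo():
    created: List[object] = []

    async def factory() -> object:
        await asyncio.sleep(0.01)  # simulate slow initialization
        created.append(object())
        return created[-1]

    lazy = LazyStorage(factory)
    results = await asyncio.gather(*(lazy.get() for _ in range(5)))
    return created, results


created, results = asyncio.run(demo())
```

Five concurrent callers share a single backend instance because only the first one to take the lock runs the factory.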

Global Caching Strategy

Model and embedding caches are shared globally to reduce memory usage:

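from cachetools import LRUCache  # assumed: LRUCache from the third-party cachetools package

_MODEL_CACHE = {}                            # model name -> loaded model instance
_EMBEDDING_CACHE = LRUCache(maxsize=1000)    # text -> embedding, least recently used evicted first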

Platform Detection and Optimization

Automatic detection and optimization for different platforms:

  • macOS: MPS acceleration for Apple Silicon
  • Windows: CUDA or DirectML
  • Linux: CUDA, ROCm, or CPU
  • Fallback: ONNX Runtime for compatibility
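The per-platform preference order above can be expressed as a lookup table plus a fallback. This sketch only shows the selection logic: the set of available accelerators is passed in rather than detected (real detection would query the ML runtime), and the names `select_accelerator` and `"onnx"` are hypothetical.

```python
import platform
from typing import Optional, Set

# Preference order per OS, following the list above.
PREFERENCES = {
    "Darwin": ["mps"],
    "Windows": ["cuda", "directml"],
    "Linux": ["cuda", "rocm"],
}


def select_accelerator(available: Set[str], system: Optional[str] = None) -> str:
    """Pick the first supported accelerator for this OS, else fall back to ONNX Runtime on CPU."""
    system = system or platform.system()
    for device in PREFERENCES.get(system, []):
        if device in available:
            return device
    return "onnx"
```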

MCP Protocol Operations

Core Memory Operations

| Operation | Description | Parameters |
|---|---|---|
| store_memory | Store new memory with tags | content, tags, metadata |
| retrieve_memory | Semantic search | query, n_results |
| recall_memory | Time-based retrieval | time_expression, n_results |
| search_by_tag | Tag-based search | tags[] |
| delete_memory | Delete by hash | content_hash |
| delete_by_tags | Bulk deletion | tags[] |

Utility Operations

| Operation | Description | Parameters |
|---|---|---|
| check_database_health | Health status | none |
| optimize_db | Database optimization | none |
| export_memories | Export to JSON | output_path |
| import_memories | Import from JSON | input_path |
| get_memory_stats | Usage statistics | none |

Debug Operations

| Operation | Description | Parameters |
|---|---|---|
| debug_retrieve | Detailed similarity scores | query, n_results |
| exact_match_retrieve | Exact content matching | query |
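Clients invoke these operations through MCP's JSON-RPC 2.0 `tools/call` method, passing the tool name and its arguments. A sketch of the request a client might send for `store_memory`; the argument names follow the Parameters column above, while the request id, content, and tag values are made up.

```python
import json
from typing import Any, Dict


def tools_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Build an MCP tools/call JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


request = tools_call(
    1,
    "store_memory",
    {
        "content": "Deployments use a blue-green strategy",
        "tags": ["deploy", "ops"],
        "metadata": {"source": "chat"},
    },
)
```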

Data Flow

Memory Storage Flow

1. Client sends store_memory request
2. Server validates and enriches metadata
3. Content is hashed for deduplication
4. Text is embedded using sentence transformers
5. Memory is stored in vector database
6. Confirmation returned to client
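Step 3 (hashing for deduplication) can be sketched as below. The assumption here is a SHA-256 digest over whitespace-normalized content; the service's exact hashing inputs may differ (for example, it may also hash metadata), and `content_hash` / `store_if_new` are illustrative names.

```python
import hashlib
from typing import Dict


def content_hash(content: str) -> str:
    # Assumption: collapse whitespace so trivially different copies hash the same
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def store_if_new(store: Dict[str, str], content: str) -> bool:
    """Insert content keyed by its hash; return False if it is a duplicate."""
    key = content_hash(content)
    if key in store:
        return False
    store[key] = content
    return True


memories: Dict[str, str] = {}
first = store_if_new(memories, "Use pytest for tests")
duplicate = store_if_new(memories, "Use  pytest for tests\n")
```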

Memory Retrieval Flow

1. Client sends retrieve_memory request
2. Query is embedded to vector representation
3. Vector similarity search performed
4. Results ranked by similarity score
5. Metadata-enriched results returned
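Steps 3 and 4 amount to scoring every stored vector against the query vector and keeping the top matches. The sketch below uses brute-force cosine similarity over an in-memory list with toy 2-D vectors; the real backends delegate this to their vector index (sqlite-vec or Vectorize), so treat it as an illustration of the ranking, not of their internals.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             memories: List[Tuple[str, Sequence[float]]],
             n_results: int) -> List[Tuple[str, float]]:
    """Score every memory against the query and return the top n by similarity."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in memories]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n_results]


memories = [("cats", [1.0, 0.0]), ("dogs", [0.8, 0.6]), ("taxes", [0.0, 1.0])]
top = retrieve([1.0, 0.1], memories, n_results=2)
```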

Time-Based Recall Flow

1. Client sends recall_memory with time expression
2. Time parser extracts temporal boundaries
3. Semantic query combined with time filter
4. Filtered results returned chronologically
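Steps 2 and 4 can be sketched with a toy time parser that understands only "today", "yesterday" and "last N days" (the service's own parser handles more expressions), followed by a window filter and a chronological sort. The semantic ranking of step 3 is omitted, and `parse_time_expression` and `recall` are hypothetical names; `now` is passed in so results are deterministic.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple


def parse_time_expression(expr: str, now: datetime) -> Tuple[datetime, datetime]:
    """Map a tiny subset of natural-language expressions to a [start, end) window."""
    expr = expr.strip().lower()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if expr == "today":
        return midnight, now
    if expr == "yesterday":
        return midnight - timedelta(days=1), midnight
    match = re.fullmatch(r"last (\d+) days", expr)
    if match:
        return now - timedelta(days=int(match.group(1))), now
    raise ValueError(f"Unsupported time expression: {expr!r}")


def recall(memories: List[Tuple[datetime, str]], expr: str, now: datetime) -> List[str]:
    """Keep memories inside the time window and return them oldest first."""
    start, end = parse_time_expression(expr, now)
    hits = [(ts, text) for ts, text in memories if start <= ts < end]
    return [text for ts, text in sorted(hits)]
```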

Performance Optimizations

Model Caching

  • Sentence transformer models cached globally
  • Single model instance shared across requests
  • Lazy loading on first use

Embedding Cache

  • LRU cache for frequently used embeddings
  • Configurable cache size
  • Cache hit tracking for optimization
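An LRU embedding cache with hit tracking needs only the standard library. In this sketch `EmbeddingCache` is a hypothetical class built on `OrderedDict`, and `embed` is any function that turns text into a vector; the service's actual cache class and size setting may differ.

```python
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """LRU cache for embeddings that also counts hits and misses."""

    def __init__(self, maxsize: int = 1000):
        self.maxsize = maxsize
        self._data: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, text: str, embed: Callable[[str], List[float]]) -> List[float]:
        if text in self._data:
            self.hits += 1
            self._data.move_to_end(text)  # mark as most recently used
            return self._data[text]
        self.misses += 1
        vector = embed(text)
        self._data[text] = vector
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return vector

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The hit rate is the number to watch when tuning the cache size: a low rate means the cache is too small for the working set, or queries rarely repeat.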

Query Optimization

  • Batch processing for multiple operations
  • Connection pooling for database access
  • Async I/O for non-blocking operations

Platform-Specific Optimizations

  • Hardware acceleration auto-detection
  • Optimized tensor operations per platform
  • Fallback strategies for compatibility

Security Considerations

Authentication

  • API key-based authentication for HTTP endpoints
  • Bearer token support
  • Per-client authentication in multi-client mode
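Bearer-token authentication reduces to parsing the `Authorization` header and comparing the token with the configured API key. A sketch using `hmac.compare_digest` for a constant-time comparison; `is_authorized` is a hypothetical helper, and how the service loads its key is not shown here.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], api_key: str) -> bool:
    """Accept 'Authorization: Bearer <key>' only when <key> matches the configured API key."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking, via timing, how many leading characters matched
    return hmac.compare_digest(token.strip().encode(), api_key.encode())
```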

Data Privacy

  • Content hashing for deduplication
  • Optional encryption at rest
  • Client isolation in shared deployments

Network Security

  • HTTPS support with SSL/TLS
  • CORS configuration for web access
  • Rate limiting for API endpoints
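This document does not say which rate-limiting algorithm the HTTP server uses; a token bucket is one common choice and is shown here purely as an illustration. Each request spends one token, tokens refill at a fixed rate, and short bursts up to the bucket's capacity are allowed. The `clock` parameter is injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```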

Deployment Architectures

Production (Hybrid Backend) ⭐ RECOMMENDED

  • Local performance: SQLite-vec for 5ms read latency
  • Cloud persistence: Cloudflare for multi-device sync and backup
  • Background sync: Zero user-facing latency, async operation queue
  • Offline capability: Full functionality without internet, syncs when available
  • Multi-device: Access the same memories across desktop, laptop, and mobile
  • Use cases: Individual users, teams with personal instances, production deployments
  • Setup: python install.py --storage-backend hybrid or set MCP_MEMORY_STORAGE_BACKEND=hybrid

Cloud-Only (Cloudflare Backend)

  • Serverless deployment: No local storage, pure cloud architecture
  • Global edge: Cloudflare's worldwide network for low latency
  • Automatic scaling: Handles traffic spikes without configuration
  • Use cases: Serverless environments, ephemeral containers, CI/CD systems
  • Limits: 10GB D1 database, 5M vectors in Vectorize
  • Setup: python install.py --storage-backend cloudflare or set MCP_MEMORY_STORAGE_BACKEND=cloudflare

Development (SQLite-vec Backend)

  • Lightweight: Minimal dependencies, fast startup
  • Local-only: No cloud connectivity required
  • Fast iteration: 5ms read latency, no sync overhead
  • Use cases: Development, testing, single-device prototypes
  • Setup: python install.py --storage-backend sqlite_vec or set MCP_MEMORY_STORAGE_BACKEND=sqlite_vec

Multi-Client Shared (HTTP Server)

  • Centralized HTTP server with shared memory pool
  • Multiple clients connect via API (Claude Desktop, VS Code, custom apps)
  • Authentication: API key-based access control
  • Use cases: Team collaboration, shared organizational memory
  • Setup: Enable HTTP server with MCP_HTTP_ENABLED=true, clients use HTTP Client backend

Legacy (ChromaDB Backend) ⚠️ NOT RECOMMENDED

  • Deprecated: Removal planned for v6.0.0
  • Migration required: Switch to Hybrid backend
  • Heavy dependencies, slower performance (15ms vs 5ms)
  • Only for existing deployments with migration path to Hybrid

Extension Points

Custom Storage Backends

Implement the MemoryStorage abstract base class:

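class CustomStorage(MemoryStorage):
    async def store(self, memory: Memory) -> Tuple[bool, str]:
        # Custom implementation: persist `memory` and report (success, message)
        raise NotImplementedError

    # Override the remaining abstract methods (initialize, retrieve, search_by_tag,
    # delete, recall_memory) the same way before instantiating the class.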
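A fuller, self-contained sketch of a custom backend: a dict-backed store that could be handy in tests. To run on its own it re-declares trimmed stand-ins for `Memory` and `MemoryStorage` covering only `initialize`, `store`, `search_by_tag` and `delete`; a real backend would import the package's own classes and implement every method of the interface. Matching a memory when it carries any of the requested tags is an assumption here.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Memory:  # trimmed stand-in for the models package's Memory
    content: str
    content_hash: str
    tags: List[str] = field(default_factory=list)


class MemoryStorage(ABC):  # trimmed stand-in for storage/base.py
    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    async def store(self, memory: Memory) -> Tuple[bool, str]: ...

    @abstractmethod
    async def search_by_tag(self, tags: List[str]) -> List[Memory]: ...

    @abstractmethod
    async def delete(self, content_hash: str) -> Tuple[bool, str]: ...


class InMemoryStorage(MemoryStorage):
    """Dict-backed backend; nothing survives a restart."""

    async def initialize(self) -> None:
        self._items: Dict[str, Memory] = {}

    async def store(self, memory: Memory) -> Tuple[bool, str]:
        if memory.content_hash in self._items:
            return False, "Duplicate content"
        self._items[memory.content_hash] = memory
        return True, "Stored"

    async def search_by_tag(self, tags: List[str]) -> List[Memory]:
        wanted = set(tags)
        return [m for m in self._items.values() if wanted & set(m.tags)]

    async def delete(self, content_hash: str) -> Tuple[bool, str]:
        if self._items.pop(content_hash, None) is None:
            return False, "Not found"
        return True, "Deleted"


async def demo():
    storage = InMemoryStorage()
    await storage.initialize()
    first = await storage.store(Memory("Prefer ruff for linting", "h1", ["python"]))
    again = await storage.store(Memory("Prefer ruff for linting", "h1", ["python"]))
    found = await storage.search_by_tag(["python", "rust"])
    removed = await storage.delete("h1")
    return first, again, found, removed


first, again, found, removed = asyncio.run(demo())
```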

Custom Embedding Models

Replace the default sentence transformer:

EMBEDDING_MODEL = "your-model/name"

Protocol Extensions

Add new operations via tool registration:

from mcp import types

types.Tool(
    name="custom_operation",
    description="Custom memory operation",
    inputSchema={
        "type": "object",
        "properties": {
            "param1": {
                "type": "string",
                "description": "First parameter"
            },
            "param2": {
                "type": "integer",
                "description": "Second parameter",
                "default": 0
            }
        },
        "required": ["param1"],
        "additionalProperties": False
    }
)
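A registered tool also needs its calls routed to a handler. The sketch below keeps a hypothetical registry mapping tool names to (schema, handler) pairs, rejects calls missing `required` arguments, and fills in schema `default` values before invoking the handler. It is independent of the MCP SDK: in a real server this logic would sit behind the server's call-tool handler, and the `tool`/`dispatch` names are illustrative.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

Handler = Callable[[Dict[str, Any]], Awaitable[str]]
TOOLS: Dict[str, Tuple[Dict[str, Any], Handler]] = {}


def tool(name: str, schema: Dict[str, Any]) -> Callable[[Handler], Handler]:
    """Register an async handler together with its input schema."""
    def register(fn: Handler) -> Handler:
        TOOLS[name] = (schema, fn)
        return fn
    return register


@tool("custom_operation", {
    "properties": {"param1": {"type": "string"}, "param2": {"type": "integer", "default": 0}},
    "required": ["param1"],
})
async def custom_operation(args: Dict[str, Any]) -> str:
    return f"{args['param1']} x{args['param2']}"


async def dispatch(name: str, arguments: Dict[str, Any]) -> str:
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    schema, handler = TOOLS[name]
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    # Apply schema defaults for optional parameters the client omitted
    defaults = {k: spec["default"] for k, spec in schema.get("properties", {}).items() if "default" in spec}
    return await handler({**defaults, **arguments})


result = asyncio.run(dispatch("custom_operation", {"param1": "hi"}))
```

Only `required` and `default` are honored here; a full JSON Schema validator (for example the `jsonschema` package) would also enforce types and `additionalProperties`.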

Future Enhancements

Planned Features (See Issue #91)

  • WFGY Semantic Firewall - Enhanced memory reliability through detection of and recovery from 16 failure modes
  • Ontology Foundation Layer (Phase 0) - Controlled vocabulary, taxonomy, knowledge graph
  • Automatic memory consolidation
  • Semantic clustering
  • Memory importance scoring
  • Cross-conversation threading

Under Consideration

  • Agentic RAG for intelligent retrieval (see Discussion #86)
  • Graph-based memory relationships (ontology pipeline integration)
  • Memory compression strategies
  • Federated learning from memories
  • Real-time collaboration features
  • Advanced visualization tools

References