Skip to content

Refactor LTM to use SQLite with graph memory and vector capabilities #95

@prefrontalsys

Description

@prefrontalsys

Current State

Short-term memory (STM):

  • Supports both JSONL (default) and SQLite backends (v0.7.5+)
  • Configurable via CORTEXGRAPH_STORAGE_BACKEND=sqlite

Long-term memory (LTM):

  • Markdown files in Obsidian vault (human-readable, git-friendly)
  • Indexed via lightweight JSONL index (ltm_index.py)
  • No graph or vector search capabilities

The Readability Tradeoff

Moving entirely to SQLite would sacrifice human inspectability - a core feature of cortexgraph. Being able to cat memories.jsonl or open markdown files in Obsidian makes the system trustworthy and debuggable.

Proposed Architecture: Hybrid Approach

Keep JSONL/Markdown as source of truth, enhance SQLite as a derived index layer.

┌─────────────────────────────────────┐
│   Source of Truth                   │
│   - JSONL for STM (human-readable) │
│   - Markdown for LTM (Obsidian)    │
│   - Git version control             │
└──────────────┬──────────────────────┘
               │
               │ Indexed by
               ↓
┌─────────────────────────────────────┐
│   SQLite Index (Derived)            │
│   - Graph queries (CTEs)            │
│   - Vector search (sqlite-vec)      │
│   - FTS5 full-text search           │
│   - Rebuilable from source          │
└─────────────────────────────────────┘

This is an enhancement of the existing ltm_index.py pattern, not a replacement.

Implementation Plan

1. Enhanced SQLite Index Schema

-- Nodes (memories)
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    content TEXT,
    embedding BLOB,  -- sqlite-vec extension
    created_at INTEGER,
    last_used INTEGER,
    score REAL,
    source_file TEXT  -- Path to JSONL/markdown source
);

-- Full-text search
CREATE VIRTUAL TABLE memories_fts USING fts5(content);

-- Edges (relations)
CREATE TABLE relations (
    id TEXT PRIMARY KEY,
    from_id TEXT,
    to_id TEXT,
    type TEXT,
    strength REAL,
    FOREIGN KEY(from_id) REFERENCES memories(id),
    FOREIGN KEY(to_id) REFERENCES memories(id)
);

-- Vector index (sqlite-vec)
CREATE VIRTUAL TABLE vec_index USING vec0(
    embedding FLOAT[384]  -- all-MiniLM-L6-v2 dimension
);

2. Vector Search with sqlite-vec

  • Use sqlite-vec extension (by Alex Garcia, 2024)
  • Modern, well-maintained, designed for embedding search
  • Cosine similarity queries: vec_distance_cosine(embedding, query_vector)

3. Graph Queries with Recursive CTEs

SQLite already supports graph traversal via Common Table Expressions:

-- Find all related memories (2 hops)
WITH RECURSIVE related(id, depth) AS (
    VALUES('start-id', 0)
    UNION
    SELECT r.to_id, depth + 1 
    FROM relations r
    JOIN related ON r.from_id = related.id
    WHERE depth < 2
)
SELECT m.* FROM memories m JOIN related ON m.id = related.id;

4. Automatic Sync

  • Update index on save_memory, promote_memory
  • Background sync process for batch updates
  • Maintain source_file reference for rebuilding

5. Rebuild Command

cortexgraph-maintenance rebuild-index
# Scans JSONL/Markdown source files
# Rebuilds SQLite index from scratch
# Index is always derived, never canonical

Benefits of Hybrid Approach

Human readability preserved - JSONL/Markdown remains primary storage
Git-friendly - Text diffs, version control, backup
Obsidian integration - LTM stays in vault
Performance - Complex queries use SQLite index
Graph traversal - Recursive CTEs for relationships
Vector search - sqlite-vec for semantic similarity
Disaster recovery - Rebuild index from source anytime
Trustworthy - Inspect source files directly, no black box

What Changes

File organization:

~/.config/cortexgraph/
├── jsonl/                    # STM source (unchanged)
│   ├── memories.jsonl
│   └── relations.jsonl
├── index.db                  # Enhanced SQLite index
└── ltm_vault/                # LTM source (unchanged)
    └── *.md files

Query routing:

  • Simple lookups: Read JSONL/Markdown directly
  • Graph queries: Use SQLite CTEs
  • Vector search: Use sqlite-vec index
  • Full-text search: Use SQLite FTS5

Principle: Index is derived, source is canonical.

Dependencies

  • sqlite-vec - Vector similarity extension (https://github.com/asg017/sqlite-vec)
  • FTS5 - Built into SQLite for full-text search
  • Recursive CTEs - Built into SQLite for graph queries

Migration Path

  1. Enhance existing ltm_index.py with new schema
  2. Add sqlite-vec extension loading
  3. Populate index from existing JSONL/Markdown
  4. Add sync hooks to save/promote operations
  5. Expose graph/vector query tools
  6. Maintain backward compatibility

Related Work

  • Current LTM index: src/cortexgraph/storage/ltm_index.py
  • STM SQLite implementation: src/cortexgraph/storage/sqlite_storage.py
  • Graph features: docs/graph_features.md

Priority

Medium - Enhancement to existing working system, preserves core architectural principles

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requeststorageJSONL storage and indexing

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions