
feat(timmy): dual-index RAG with external embedding ingestion, query decomposition, and cross-encoder reranking #241

@ericfitz

Description

Summary

Evolve Timmy from a single-embedding-model, single-vector-index RAG system to a dual-index architecture with external embedding ingestion, LLM-driven query decomposition, and cross-encoder reranking.

Motivation

External tools (e.g., webhook-enabled source code scanners) should be able to add embeddings for source code related to a threat model, so that Timmy can answer questions about source code as well as textual threat model elements. This offloads CPU/memory-intensive embedding work from the TMI process and enables specialized code embedding models.

Design

Two Indexes

| Index | Embedding Model | Embedded By | Entity Types |
|---|---|---|---|
| Text | Text embedding model | TMI (internal: assets, threats, diagrams, notes) + external (documents) | asset, threat, diagram, note, document |
| Code | Code embedding model | External only (repositories) | repository |
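The routing above is simple: repositories go to the code index, everything else to the text index. A minimal sketch of what the index-type constants and routing might look like (names are illustrative; the actual identifiers live in api/timmy_index_types.go and may differ):

```go
package main

import "fmt"

// IndexType distinguishes the two vector indexes. Hypothetical names;
// the real constants in api/timmy_index_types.go may differ.
type IndexType string

const (
	IndexTypeText IndexType = "text"
	IndexTypeCode IndexType = "code"
)

// indexTypeForEntity routes an entity type to the index that stores it,
// per the table above: only repositories land in the code index.
func indexTypeForEntity(entityType string) IndexType {
	if entityType == "repository" {
		return IndexTypeCode
	}
	return IndexTypeText
}

func main() {
	fmt.Println(indexTypeForEntity("repository")) // code
	fmt.Println(indexTypeForEntity("threat"))     // text
}
```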

Key Features

  1. Dual vector indexes (text + code) with separate embedding models per index
  2. External embedding automation — documents and repositories embedded by external tools, offloading CPU/memory from TMI
  3. Config-sharing API — `GET /threat_models/{id}/embeddings/config` shares embedding model config (including API keys) with authenticated automation
  4. Embedding ingestion API — `POST /threat_models/{id}/embeddings` accepts pre-computed embeddings with an `index_type` field
  5. embedding-automation built-in group — new authorization group (UUID 00000000-0000-0000-0000-000000000005) for embedding automation service accounts
  6. LLM-driven query decomposition — inference model breaks user queries into sub-queries per index, choosing parallel or sequential execution strategy
  7. Cross-encoder reranking — merged results from both indexes rescored by a cross-encoder model before synthesis
  8. Backward compatible — existing single-model deployments continue working; code index and reranker are optional

New Configuration

```bash
# Text embedding (backward compatible with existing TMI_TIMMY_EMBEDDING_* vars)
TMI_TIMMY_TEXT_EMBEDDING_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_TEXT_RETRIEVAL_TOP_K

# Code embedding (used by TMI at query time; by external tools for content embedding)
TMI_TIMMY_CODE_EMBEDDING_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_CODE_RETRIEVAL_TOP_K

# Cross-encoder reranking
TMI_TIMMY_RERANK_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_RERANK_TOP_K
```

Automation Workflow

  1. Admin creates automation account, adds to embedding-automation group, grants client credentials
  2. Automation authenticates via client credentials grant
  3. Automation calls GET /threat_models/{id}/embeddings/config to discover embedding model config
  4. Automation subscribes to repository.created/document.created webhook events
  5. On event: clone repo / fetch document, perform semantic analysis, chunk, embed using configured model
  6. Push embeddings via POST /threat_models/{id}/embeddings
  7. TMI invalidates in-memory index, next Timmy query picks up new embeddings

Query Flow

```
User question
  → Query Decomposition (inference LLM splits into text_query + code_query, picks strategy)
  → Embed sub-queries (text model for text index, code model for code index)
  → Search both indexes (parallel or sequential per strategy)
  → Merge candidates
  → Cross-encoder reranking (if configured)
  → Build context with source attribution
  → Inference LLM synthesizes answer
  → Stream response to user
```
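The decompose-then-search steps can be sketched as below. The struct fields and strategy flag are assumptions about the `QueryDecomposer` output, not its actual types; the parallel branch just fans out one goroutine per index and merges:

```go
package main

import (
	"fmt"
	"sync"
)

// Decomposition sketches the query-decomposition output described above;
// field names are assumptions, not the real QueryDecomposer types.
type Decomposition struct {
	TextQuery string
	CodeQuery string
	Parallel  bool // execution strategy chosen by the inference LLM
}

type result struct {
	Source string
	Score  float64
}

// searchBoth runs the per-index searches in parallel or sequentially per
// the chosen strategy, then merges candidates for reranking.
func searchBoth(d Decomposition, searchText, searchCode func(string) []result) []result {
	if !d.Parallel {
		return append(searchText(d.TextQuery), searchCode(d.CodeQuery)...)
	}
	var wg sync.WaitGroup
	var textRes, codeRes []result
	wg.Add(2)
	go func() { defer wg.Done(); textRes = searchText(d.TextQuery) }()
	go func() { defer wg.Done(); codeRes = searchCode(d.CodeQuery) }()
	wg.Wait()
	return append(textRes, codeRes...)
}

func main() {
	d := Decomposition{TextQuery: "auth threats", CodeQuery: "jwt validation", Parallel: true}
	merged := searchBoth(d,
		func(q string) []result { return []result{{Source: "threat", Score: 0.8}} },
		func(q string) []result { return []result{{Source: "repository", Score: 0.7}} })
	fmt.Println(len(merged)) // 2
}
```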

Implementation Phases

  1. Configuration & Data Layer — extend config for dual models + reranker, add ListByThreatModelAndIndexType to embedding store
  2. Dual Vector Index Manager — composite key (threatModelID, indexType), InvalidateIndex method, shared memory budget
  3. Authorization & Embedding APIs — new embedding-automation group, config-sharing endpoint, ingestion endpoint, bulk delete
  4. Cross-Encoder Reranking — `Reranker` interface, API-based cross-encoder client, cosine-similarity fallback
  5. Query Decomposition — `QueryDecomposer` with structured LLM prompt, parallel/sequential strategy support
  6. Integrated HandleMessage Flow — wire decomposition → dual search → rerank → context → LLM
  7. OpenAPI Spec & Tests — new schemas/paths, unit + integration tests

Each phase is independently deployable. Phases 1-2 are internal refactors with no behavior change.
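For phase 4, the cosine-similarity fallback (used when no cross-encoder model is configured) might look like this sketch. The `Candidate` type and function names are illustrative, not the actual api/timmy_reranker.go definitions:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Candidate is a hypothetical merged search result awaiting rescoring.
type Candidate struct {
	Text      string
	Embedding []float64
	Score     float64
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rerankByCosine rescores merged candidates against the query embedding
// and keeps the top k, mirroring the fallback path named in phase 4.
func rerankByCosine(query []float64, cands []Candidate, k int) []Candidate {
	for i := range cands {
		cands[i].Score = cosine(query, cands[i].Embedding)
	}
	sort.Slice(cands, func(i, j int) bool { return cands[i].Score > cands[j].Score })
	if k < len(cands) {
		cands = cands[:k]
	}
	return cands
}

func main() {
	q := []float64{1, 0}
	top := rerankByCosine(q, []Candidate{
		{Text: "orthogonal", Embedding: []float64{0, 1}},
		{Text: "aligned", Embedding: []float64{2, 0}},
	}, 1)
	fmt.Println(top[0].Text) // aligned
}
```

An API-based cross-encoder would replace `cosine` with a model call that scores each (query, candidate) text pair directly; the sort-and-truncate shape stays the same.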

Plan File

Full implementation plan: .claude/plans/warm-munching-avalanche.md

Critical Files

| File | Change |
|---|---|
| `internal/config/timmy.go` | Dual model + reranker config, backward compat |
| `api/timmy_index_types.go` | NEW: index type constants |
| `api/timmy_embedding_store.go` | New interface method |
| `api/timmy_embedding_store_gorm.go` | Implementation |
| `api/timmy_vector_manager.go` | Composite key, `InvalidateIndex` |
| `api/timmy_session_manager.go` | Skip docs/repos, new query flow |
| `api/validation/validators.go` | New group UUID |
| `api/auth_utils.go` | New group constants |
| `api/group_membership.go` | New `BuiltInGroup` var |
| `api/seed/seed.go` | Seed new group |
| `api/timmy_embedding_handlers.go` | NEW: config + ingestion + delete handlers |
| `api/timmy_reranker.go` | NEW: cross-encoder reranker |
| `api/timmy_query_decomposer.go` | NEW: LLM query decomposition |
| `api/timmy_llm_service.go` | Dual embedder support |
| `api/timmy_context_builder.go` | Ranked results formatting |
| `api-schema/tmi-openapi.json` | New schemas + endpoints |

Metadata

Labels: enhancement (New feature or request)
Status: In Progress