Summary
Evolve Timmy from a single-embedding-model, single-vector-index RAG system to a dual-index architecture with external embedding ingestion, LLM-driven query decomposition, and cross-encoder reranking.
Motivation
External tools (e.g., webhook-enabled source code scanners) should be able to add embeddings for source code related to a threat model, so that Timmy can answer questions about source code as well as textual threat model elements. This offloads CPU/memory-intensive embedding work from the TMI process and enables specialized code embedding models.
Design
Two Indexes
| Index | Embedding Model | Embedded By | Entity Types |
|---|---|---|---|
| Text | Text embedding model | TMI (internal: assets, threats, diagrams, notes) + external (documents) | asset, threat, diagram, note, document |
| Code | Code embedding model | External only (repositories) | repository |
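The table fixes a mapping from entity type to index, and records which entities TMI embeds itself. Below is a minimal Go sketch of what the index type constants (planned for api/timmy_index_types.go) might look like. The IndexType name, the "text"/"code" string values, and both helper functions are hypothetical illustrations, not the actual TMI code.

```go
package main

import "fmt"

// IndexType names one of the two vector indexes. The type name and the
// string values "text" and "code" are illustrative assumptions.
type IndexType string

const (
	IndexTypeText IndexType = "text"
	IndexTypeCode IndexType = "code"
)

// entityIndex maps each entity type in the Two Indexes table to its index.
var entityIndex = map[string]IndexType{
	"asset":      IndexTypeText,
	"threat":     IndexTypeText,
	"diagram":    IndexTypeText,
	"note":       IndexTypeText,
	"document":   IndexTypeText,
	"repository": IndexTypeCode,
}

// IndexForEntity returns the index that stores embeddings for entityType.
func IndexForEntity(entityType string) (IndexType, error) {
	idx, ok := entityIndex[entityType]
	if !ok {
		return "", fmt.Errorf("unknown entity type %q", entityType)
	}
	return idx, nil
}

// EmbeddedByTMI reports whether TMI computes the embedding in-process.
// Documents and repositories are embedded by external tools instead.
func EmbeddedByTMI(entityType string) bool {
	switch entityType {
	case "asset", "threat", "diagram", "note":
		return true
	default:
		return false
	}
}
```

Keeping the mapping in one table-driven place means the session manager can skip documents and repositories when embedding internally, as the Critical Files table describes for api/timmy_session_manager.go.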
Key Features
- Dual vector indexes (text + code) with separate embedding models per index
- External embedding automation: documents and repositories are embedded by external tools, offloading CPU/memory from TMI
- Config-sharing API: GET /threat_models/{id}/embeddings/config shares embedding model config (including API keys) with authenticated automation
- Embedding ingestion API: POST /threat_models/{id}/embeddings accepts pre-computed embeddings with an index_type field
- embedding-automation built-in group: new authorization group (UUID 00000000-0000-0000-0000-000000000005) for embedding automation service accounts
- LLM-driven query decomposition: the inference model breaks user queries into sub-queries per index, choosing a parallel or sequential execution strategy
- Cross-encoder reranking: merged results from both indexes are rescored by a cross-encoder model before synthesis
- Backward compatible: existing single-model deployments continue working; the code index and reranker are optional
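Because the reranker is optional, something has to order merged candidates when no cross-encoder is configured; the Implementation Phases list a cosine-similarity fallback for that. A minimal Go sketch of such a fallback behind a Reranker interface follows. The Candidate fields and the Rerank signature are assumptions for illustration; a real cross-encoder client would score (query text, chunk text) pairs rather than vectors.

```go
package main

import (
	"math"
	"sort"
)

// Candidate is a retrieved chunk from either index (hypothetical fields).
type Candidate struct {
	ID        string
	IndexType string    // "text" or "code"
	Vector    []float32 // stored embedding for this chunk
	Score     float64
}

// Reranker rescores merged candidates and keeps the best topK.
// This signature is an assumption, not TMI's actual interface.
type Reranker interface {
	Rerank(queryVecs map[string][]float32, candidates []Candidate, topK int) []Candidate
}

// CosineReranker is a fallback for when no cross-encoder is configured.
type CosineReranker struct{}

// Compile-time check that CosineReranker satisfies Reranker.
var _ Reranker = CosineReranker{}

// cosine returns the cosine similarity of a and b, or 0 when undefined.
func cosine(a, b []float32) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Rerank scores each candidate against the query vector produced by the
// same embedding model as the candidate's index, then sorts descending.
func (CosineReranker) Rerank(queryVecs map[string][]float32, candidates []Candidate, topK int) []Candidate {
	out := make([]Candidate, len(candidates))
	copy(out, candidates)
	for i := range out {
		out[i].Score = cosine(queryVecs[out[i].IndexType], out[i].Vector)
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if topK > 0 && topK < len(out) {
		out = out[:topK]
	}
	return out
}
```

Each candidate must be compared with the sub-query embedded by its own index's model, since text and code vectors live in different spaces. Even then, cosine scores from two models are not on a common scale, so interleaving them is only approximate; a cross-encoder avoids this by scoring query and passage text jointly.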
New Configuration
# Text embedding (backward compatible with existing TMI_TIMMY_EMBEDDING_* vars)
TMI_TIMMY_TEXT_EMBEDDING_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_TEXT_RETRIEVAL_TOP_K
# Code embedding (TMI uses this model to embed queries; external tools use it to embed code content)
TMI_TIMMY_CODE_EMBEDDING_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_CODE_RETRIEVAL_TOP_K
# Cross-encoder reranking
TMI_TIMMY_RERANK_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_RERANK_TOP_K
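Backward compatibility hinges on the text model falling back to the existing TMI_TIMMY_EMBEDDING_* variables. A minimal Go sketch of that loading logic, assuming per-field fallback and treating an unset provider or model as "feature disabled". The ModelConfig, Lookup, loadModel, and intOr names are hypothetical, not the contents of internal/config/timmy.go.

```go
package main

import (
	"os"
	"strconv"
)

// ModelConfig holds the four settings shared by each model group.
type ModelConfig struct {
	Provider, Model, APIKey, BaseURL string
}

// Enabled reports whether the model is usable. Treating a missing provider
// or model as "disabled" is an assumption for the optional code index/reranker.
func (c ModelConfig) Enabled() bool { return c.Provider != "" && c.Model != "" }

// Lookup matches os.LookupEnv so tests can inject a fake environment.
type Lookup func(key string) (string, bool)

// firstSet returns the first non-empty value among keys, or "".
func firstSet(lookup Lookup, keys ...string) string {
	for _, k := range keys {
		if v, ok := lookup(k); ok && v != "" {
			return v
		}
	}
	return ""
}

// loadModel reads PROVIDER/MODEL/API_KEY/BASE_URL, trying each prefix in
// order independently for every field.
func loadModel(lookup Lookup, prefixes ...string) ModelConfig {
	get := func(suffix string) string {
		keys := make([]string, len(prefixes))
		for i, p := range prefixes {
			keys[i] = p + suffix
		}
		return firstSet(lookup, keys...)
	}
	return ModelConfig{
		Provider: get("PROVIDER"),
		Model:    get("MODEL"),
		APIKey:   get("API_KEY"),
		BaseURL:  get("BASE_URL"),
	}
}

// intOr parses a positive integer variable such as *_RETRIEVAL_TOP_K,
// returning def when it is unset or invalid.
func intOr(lookup Lookup, key string, def int) int {
	if v, ok := lookup(key); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return def
}

// LoadFromEnv wires the three model groups from the process environment.
func LoadFromEnv() (text, code, rerank ModelConfig) {
	env := Lookup(os.LookupEnv)
	text = loadModel(env, "TMI_TIMMY_TEXT_EMBEDDING_", "TMI_TIMMY_EMBEDDING_")
	code = loadModel(env, "TMI_TIMMY_CODE_EMBEDDING_")
	rerank = loadModel(env, "TMI_TIMMY_RERANK_")
	return
}
```

Per-field fallback lets an operator override just the model while keeping a legacy API key, but it can also mix a new provider with an old key; an all-or-nothing rule per prefix is the stricter alternative.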
Automation Workflow
- Admin creates an automation account, adds it to the embedding-automation group, and grants it client credentials
- Automation authenticates via the client credentials grant
- Automation calls GET /threat_models/{id}/embeddings/config to discover the embedding model config
- Automation subscribes to repository.created/document.created webhook events
- On each event: clone the repo or fetch the document, perform semantic analysis, chunk, and embed using the configured model
- Push embeddings via POST /threat_models/{id}/embeddings
- TMI invalidates the in-memory index; the next Timmy query picks up the new embeddings
Query Flow
User question
→ Query Decomposition (inference LLM splits into text_query + code_query, picks strategy)
→ Embed sub-queries (text model for text index, code model for code index)
→ Search both indexes (parallel or sequential per strategy)
→ Merge candidates
→ Cross-encoder reranking (if configured)
→ Build context with source attribution
→ Inference LLM synthesizes answer
→ Stream response to user
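The decomposition and search steps can be sketched in Go as parsing the LLM's structured output and then running one search per non-empty sub-query. The JSON schema (text_query, code_query, strategy) mirrors the flow above, but the exact field names, the SearchFunc type, and the helper names are assumptions. A real sequential strategy would likely feed the first search's results into the second query; this sketch only orders the calls.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Decomposition is a hypothetical schema for the inference LLM's output.
type Decomposition struct {
	TextQuery string `json:"text_query"`
	CodeQuery string `json:"code_query"`
	Strategy  string `json:"strategy"` // "parallel" or "sequential"
}

// parseDecomposition decodes and validates the LLM's JSON response.
func parseDecomposition(raw string) (Decomposition, error) {
	var d Decomposition
	if err := json.Unmarshal([]byte(raw), &d); err != nil {
		return d, fmt.Errorf("decode decomposition: %w", err)
	}
	switch d.Strategy {
	case "parallel", "sequential":
		return d, nil
	default:
		return d, fmt.Errorf("unknown strategy %q", d.Strategy)
	}
}

// SearchFunc embeds a sub-query with its index's model and searches that
// index, returning chunk IDs (hypothetical).
type SearchFunc func(query string) []string

// runSearches executes the searches per strategy. An empty sub-query skips
// that index, e.g. when the question has nothing to do with code.
func runSearches(d Decomposition, searchText, searchCode SearchFunc) (text, code []string) {
	if d.Strategy == "sequential" {
		if d.TextQuery != "" {
			text = searchText(d.TextQuery)
		}
		if d.CodeQuery != "" {
			code = searchCode(d.CodeQuery)
		}
		return text, code
	}
	// Parallel: each goroutine writes a distinct variable, and wg.Wait
	// guarantees both writes are visible before returning.
	var wg sync.WaitGroup
	if d.TextQuery != "" {
		wg.Add(1)
		go func() { defer wg.Done(); text = searchText(d.TextQuery) }()
	}
	if d.CodeQuery != "" {
		wg.Add(1)
		go func() { defer wg.Done(); code = searchCode(d.CodeQuery) }()
	}
	wg.Wait()
	return text, code
}
```

Validating the strategy up front gives the caller a clear point to fall back (for example, to a plain text-index search) when the LLM returns malformed output.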
Implementation Phases
1. Configuration & Data Layer: extend config for dual models + reranker, add ListByThreatModelAndIndexType to the embedding store
2. Dual Vector Index Manager: composite key (threatModelID, indexType), InvalidateIndex method, shared memory budget
3. Authorization & Embedding APIs: new embedding-automation group, config-sharing endpoint, ingestion endpoint, bulk delete
4. Cross-Encoder Reranking: Reranker interface, API-based cross-encoder client, cosine-similarity fallback
5. Query Decomposition: QueryDecomposer with a structured LLM prompt, parallel/sequential strategy support
6. Integrated HandleMessage Flow: wire decomposition → dual search → rerank → context → LLM
7. OpenAPI Spec & Tests: new schemas/paths, unit + integration tests
Each phase is independently deployable. Phases 1-2 are internal refactors with no behavior change.
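The Dual Vector Index Manager phase keys cached indexes by (threatModelID, indexType) under one memory budget. A minimal Go sketch of that shape follows. The indexKey, vectorIndex, VectorManager, and loader names are hypothetical, and rejecting a load that would exceed the budget stands in for whatever eviction policy the real manager uses.

```go
package main

import (
	"fmt"
	"sync"
)

// indexKey is the composite cache key: one index per threat model per type.
type indexKey struct {
	ThreatModelID string
	IndexType     string
}

// vectorIndex is a stand-in for a loaded in-memory index (hypothetical).
type vectorIndex struct {
	vectors  [][]float32
	memBytes int64 // estimated memory footprint
}

// VectorManager caches indexes under a shared memory budget.
type VectorManager struct {
	mu      sync.Mutex
	indexes map[indexKey]*vectorIndex
	used    int64
	budget  int64
	load    func(key indexKey) *vectorIndex // hypothetical loader, e.g. from the embedding store
}

func NewVectorManager(budget int64, load func(indexKey) *vectorIndex) *VectorManager {
	return &VectorManager{indexes: make(map[indexKey]*vectorIndex), budget: budget, load: load}
}

// GetIndex returns the cached index or loads it. Loading under the mutex
// keeps the sketch simple but serializes loads across threat models.
func (m *VectorManager) GetIndex(tmID, indexType string) (*vectorIndex, error) {
	key := indexKey{ThreatModelID: tmID, IndexType: indexType}
	m.mu.Lock()
	defer m.mu.Unlock()
	if idx, ok := m.indexes[key]; ok {
		return idx, nil
	}
	idx := m.load(key)
	if m.used+idx.memBytes > m.budget {
		return nil, fmt.Errorf("loading %s/%s would exceed the memory budget", tmID, indexType)
	}
	m.indexes[key] = idx
	m.used += idx.memBytes
	return idx, nil
}

// InvalidateIndex drops one cached index (e.g. after new embeddings are
// ingested) so the next query reloads it; other index types are untouched.
func (m *VectorManager) InvalidateIndex(tmID, indexType string) {
	key := indexKey{ThreatModelID: tmID, IndexType: indexType}
	m.mu.Lock()
	defer m.mu.Unlock()
	if idx, ok := m.indexes[key]; ok {
		m.used -= idx.memBytes
		delete(m.indexes, key)
	}
}
```

Invalidating per (threat model, index type) means a repository push rebuilds only the code index for that threat model, leaving the text index warm.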
Plan File
Full implementation plan: .claude/plans/warm-munching-avalanche.md
Critical Files
| File | Change |
|---|---|
| internal/config/timmy.go | Dual model + reranker config, backward compat |
| api/timmy_index_types.go | NEW: index type constants |
| api/timmy_embedding_store.go | New interface method |
| api/timmy_embedding_store_gorm.go | Implementation |
| api/timmy_vector_manager.go | Composite key, InvalidateIndex |
| api/timmy_session_manager.go | Skip docs/repos, new query flow |
| api/validation/validators.go | New group UUID |
| api/auth_utils.go | New group constants |
| api/group_membership.go | New BuiltInGroup var |
| api/seed/seed.go | Seed new group |
| api/timmy_embedding_handlers.go | NEW: config + ingestion + delete handlers |
| api/timmy_reranker.go | NEW: cross-encoder reranker |
| api/timmy_query_decomposer.go | NEW: LLM query decomposition |
| api/timmy_llm_service.go | Dual embedder support |
| api/timmy_context_builder.go | Ranked results formatting |
| api-schema/tmi-openapi.json | New schemas + endpoints |