
feat(timmy): dual-index RAG with external embedding ingestion, query decomposition, and cross-encoder reranking #241

@ericfitz

Description

Summary

Evolve Timmy from a single-embedding-model, single-vector-index RAG system to a dual-index architecture with external embedding ingestion, LLM-driven query decomposition, and cross-encoder reranking.

Motivation

External tools (e.g., webhook-enabled source code scanners) should be able to add embeddings for source code related to a threat model, so that Timmy can answer questions about source code as well as textual threat model elements. This offloads CPU/memory-intensive embedding work from the TMI process and enables specialized code embedding models.

Design

Two Indexes

| Index | Embedding Model | Embedded By | Entity Types |
|---|---|---|---|
| Text | Text embedding model | TMI (internal: assets, threats, diagrams, notes) + external (documents) | asset, threat, diagram, note, document |
| Code | Code embedding model | External only (repositories) | repository |
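The routing above is simple: repositories go to the code index, everything else to the text index. A minimal sketch of what the index-type constants and routing might look like (names are illustrative; the actual identifiers live in api/timmy_index_types.go and may differ):

```go
package main

import "fmt"

// IndexType distinguishes the two vector indexes. Hypothetical names;
// the real constants in api/timmy_index_types.go may differ.
type IndexType string

const (
	IndexTypeText IndexType = "text"
	IndexTypeCode IndexType = "code"
)

// indexTypeForEntity routes an entity type to the index that stores it,
// per the table above: only repositories land in the code index.
func indexTypeForEntity(entityType string) IndexType {
	if entityType == "repository" {
		return IndexTypeCode
	}
	return IndexTypeText
}

func main() {
	fmt.Println(indexTypeForEntity("repository")) // code
	fmt.Println(indexTypeForEntity("threat"))     // text
}
```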

Key Features

  1. Dual vector indexes (text + code) with separate embedding models per index
  2. External embedding automation — documents and repositories embedded by external tools, offloading CPU/memory from TMI
  3. Config-sharing API — `GET /threat_models/{id}/embeddings/config` shares embedding model config (including API keys) with authenticated automation
  4. Embedding ingestion API — `POST /threat_models/{id}/embeddings` accepts pre-computed embeddings with an `index_type` field
  5. embedding-automation built-in group — new authorization group (UUID 00000000-0000-0000-0000-000000000005) for embedding automation service accounts
  6. LLM-driven query decomposition — inference model breaks user queries into sub-queries per index, choosing parallel or sequential execution strategy
  7. Cross-encoder reranking — merged results from both indexes rescored by a cross-encoder model before synthesis
  8. Backward compatible — existing single-model deployments continue working; code index and reranker are optional

New Configuration

```bash
# Text embedding (backward compatible with existing TMI_TIMMY_EMBEDDING_* vars)
TMI_TIMMY_TEXT_EMBEDDING_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_TEXT_RETRIEVAL_TOP_K

# Code embedding (used by TMI at query time; by external tools for content embedding)
TMI_TIMMY_CODE_EMBEDDING_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_CODE_RETRIEVAL_TOP_K

# Cross-encoder reranking
TMI_TIMMY_RERANK_PROVIDER / MODEL / API_KEY / BASE_URL
TMI_TIMMY_RERANK_TOP_K
```

Automation Workflow

  1. Admin creates automation account, adds to embedding-automation group, grants client credentials
  2. Automation authenticates via client credentials grant
  3. Automation calls GET /threat_models/{id}/embeddings/config to discover embedding model config
  4. Automation subscribes to repository.created/document.created webhook events
  5. On event: clone repo / fetch document, perform semantic analysis, chunk, embed using configured model
  6. Push embeddings via POST /threat_models/{id}/embeddings
  7. TMI invalidates in-memory index, next Timmy query picks up new embeddings

Query Flow

```
User question
  → Query Decomposition (inference LLM splits into text_query + code_query, picks strategy)
  → Embed sub-queries (text model for text index, code model for code index)
  → Search both indexes (parallel or sequential per strategy)
  → Merge candidates
  → Cross-encoder reranking (if configured)
  → Build context with source attribution
  → Inference LLM synthesizes answer
  → Stream response to user
```
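The decompose-then-search steps can be sketched as below. The struct fields and strategy flag are assumptions about the `QueryDecomposer` output, not its actual types; the parallel branch just fans out one goroutine per index and merges:

```go
package main

import (
	"fmt"
	"sync"
)

// Decomposition sketches the query-decomposition output described above;
// field names are assumptions, not the real QueryDecomposer types.
type Decomposition struct {
	TextQuery string
	CodeQuery string
	Parallel  bool // execution strategy chosen by the inference LLM
}

type result struct {
	Source string
	Score  float64
}

// searchBoth runs the per-index searches in parallel or sequentially per
// the chosen strategy, then merges candidates for reranking.
func searchBoth(d Decomposition, searchText, searchCode func(string) []result) []result {
	if !d.Parallel {
		return append(searchText(d.TextQuery), searchCode(d.CodeQuery)...)
	}
	var wg sync.WaitGroup
	var textRes, codeRes []result
	wg.Add(2)
	go func() { defer wg.Done(); textRes = searchText(d.TextQuery) }()
	go func() { defer wg.Done(); codeRes = searchCode(d.CodeQuery) }()
	wg.Wait()
	return append(textRes, codeRes...)
}

func main() {
	d := Decomposition{TextQuery: "auth threats", CodeQuery: "jwt validation", Parallel: true}
	merged := searchBoth(d,
		func(q string) []result { return []result{{Source: "threat", Score: 0.8}} },
		func(q string) []result { return []result{{Source: "repository", Score: 0.7}} })
	fmt.Println(len(merged)) // 2
}
```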

Implementation Phases

  1. Configuration & Data Layer — extend config for dual models + reranker, add ListByThreatModelAndIndexType to embedding store
  2. Dual Vector Index Manager — composite key (threatModelID, indexType), InvalidateIndex method, shared memory budget
  3. Authorization & Embedding APIs — new embedding-automation group, config-sharing endpoint, ingestion endpoint, bulk delete
  4. Cross-Encoder Reranking — `Reranker` interface, API-based cross-encoder client, cosine-similarity fallback
  5. Query Decomposition — `QueryDecomposer` with structured LLM prompt, parallel/sequential strategy support
  6. Integrated HandleMessage Flow — wire decomposition → dual search → rerank → context → LLM
  7. OpenAPI Spec & Tests — new schemas/paths, unit + integration tests

Each phase is independently deployable. Phases 1-2 are internal refactors with no behavior change.
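For phase 4, the cosine-similarity fallback (used when no cross-encoder model is configured) might look like this sketch. The `Candidate` type and function names are illustrative, not the actual api/timmy_reranker.go definitions:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Candidate is a hypothetical merged search result awaiting rescoring.
type Candidate struct {
	Text      string
	Embedding []float64
	Score     float64
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rerankByCosine rescores merged candidates against the query embedding
// and keeps the top k, mirroring the fallback path named in phase 4.
func rerankByCosine(query []float64, cands []Candidate, k int) []Candidate {
	for i := range cands {
		cands[i].Score = cosine(query, cands[i].Embedding)
	}
	sort.Slice(cands, func(i, j int) bool { return cands[i].Score > cands[j].Score })
	if k < len(cands) {
		cands = cands[:k]
	}
	return cands
}

func main() {
	q := []float64{1, 0}
	top := rerankByCosine(q, []Candidate{
		{Text: "orthogonal", Embedding: []float64{0, 1}},
		{Text: "aligned", Embedding: []float64{2, 0}},
	}, 1)
	fmt.Println(top[0].Text) // aligned
}
```

An API-based cross-encoder would replace `cosine` with a model call that scores each (query, candidate) text pair directly; the sort-and-truncate shape stays the same.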

Plan File

Full implementation plan: .claude/plans/warm-munching-avalanche.md

Critical Files

| File | Change |
|---|---|
| `internal/config/timmy.go` | Dual model + reranker config, backward compat |
| `api/timmy_index_types.go` | NEW: index type constants |
| `api/timmy_embedding_store.go` | New interface method |
| `api/timmy_embedding_store_gorm.go` | Implementation |
| `api/timmy_vector_manager.go` | Composite key, `InvalidateIndex` |
| `api/timmy_session_manager.go` | Skip docs/repos, new query flow |
| `api/validation/validators.go` | New group UUID |
| `api/auth_utils.go` | New group constants |
| `api/group_membership.go` | New `BuiltInGroup` var |
| `api/seed/seed.go` | Seed new group |
| `api/timmy_embedding_handlers.go` | NEW: config + ingestion + delete handlers |
| `api/timmy_reranker.go` | NEW: cross-encoder reranker |
| `api/timmy_query_decomposer.go` | NEW: LLM query decomposition |
| `api/timmy_llm_service.go` | Dual embedder support |
| `api/timmy_context_builder.go` | Ranked results formatting |
| `api-schema/tmi-openapi.json` | New schemas + endpoints |

Metadata

Labels: enhancement (New feature or request)
Status: In Progress