feat(vector): reranker optimization — metadata-aware scoring, cross-encoder, temporal search #35
Description
Summary
Our vector search returns semantically similar chunks but ranks them purely by embedding cosine similarity. Now that we're collecting rich structured metadata across GitHub, YouTube, Reddit, and sessions, we can build a significantly smarter reranker that boosts results based on recency, authority signals (stars, upvotes, comment counts), content type, and source trustworthiness. Also investigate a proper cross-encoder reranker for precision-critical queries.
Current State
axon query / axon ask pipeline today:
- TEI embedding of query → cosine similarity search in Qdrant
- Top-K results returned as-is (ranked by vector similarity only)
- No reranking, no metadata weighting, no temporal decay
We now have the metadata to do better — it's just not being used in ranking.
Part 1: Metadata-Aware Reranker
Available Signals by Source
GitHub (gh_* payload fields):
- gh_stars, gh_forks — repo authority/popularity
- gh_state (open/closed) — relevance for issues (open = active problem)
- gh_is_pr — distinguish issue vs PR results
- gh_labels — bug/enhancement/docs classification
- gh_pushed_at, gh_updated_at — recency of activity
- gh_comment_count — engagement signal (more comments = more important issue)
- gh_is_archived, gh_is_fork — deprioritize archived/forked content
YouTube (yt_* payload fields — audit what exists):
- View count, like count, upload date — popularity + recency
Reddit (reddit_* payload fields — audit what exists):
- Score (upvotes - downvotes), comment count, subreddit — community signal
Sessions (once #33 lands):
- session_date — temporal relevance
- session_project — boost results from the current active project
Reranking Formula (starting point)
final_score = vector_similarity
× recency_decay(updated_at, half_life=180days)
× authority_boost(stars, forks, upvotes) // log-scaled
× state_boost(is_open=1.2, is_closed=0.8) // for issues
× type_weight(content_type) // configurable per query
This is a simple multiplicative heuristic to start; it can evolve into a learned ranker later.
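The formula can be sketched as runnable Rust. The exponential half-life decay and log-scaled authority boost follow the shape described above; the 0.1 log coefficient and the 1.2/0.8 state boosts are placeholder weights to be tuned, not final values:

```rust
/// Exponential half-life decay: weight halves every `half_life_days`.
fn recency_decay(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days / half_life_days)
}

/// Log-scaled authority boost so a 10k-star repo doesn't drown out everything.
/// The 0.1 coefficient is a placeholder to tune.
fn authority_boost(stars: u64, forks: u64, comments: u64) -> f64 {
    1.0 + 0.1 * ((1 + stars + forks + comments) as f64).ln()
}

fn final_score(
    similarity: f64,
    age_days: f64,
    is_open: bool,
    stars: u64,
    forks: u64,
    comments: u64,
) -> f64 {
    let state_boost = if is_open { 1.2 } else { 0.8 };
    similarity
        * recency_decay(age_days, 180.0) // 180-day half-life starting point
        * authority_boost(stars, forks, comments)
        * state_boost
}

fn main() {
    // A fresh, popular, open issue vs. an old, quiet, closed one at equal similarity.
    let fresh = final_score(0.80, 10.0, true, 500, 40, 25);
    let stale = final_score(0.80, 400.0, false, 3, 0, 1);
    assert!(fresh > stale);
    println!("fresh={fresh:.3} stale={stale:.3}");
}
```

Note the decay multiplies rather than filters, so an old but extremely similar chunk can still outrank a fresh marginal one.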
Implementation
- New crates/vector/ops/rerank/ module:
  - metadata_rerank(chunks: Vec<ScoredChunk>, query_ctx: &RerankContext) -> Vec<ScoredChunk>
  - recency_decay(updated_at: Option<DateTime>, half_life_days: f64) -> f64
  - authority_score(stars: u64, forks: u64, comments: u64) -> f64
- RerankContext carries: current date, active project, query intent (issue/code/docs/general)
- Applied after Qdrant retrieval, before LLM context assembly in the ask pipeline
- --rerank false flag to disable for debugging / latency comparison
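A self-contained sketch of how metadata_rerank could score and re-sort a candidate set. The struct fields here are hypothetical stand-ins for the real payload types, and the weights mirror the placeholder formula above:

```rust
use std::cmp::Ordering;

// Hypothetical stand-ins for the real chunk/context types.
struct ScoredChunk {
    similarity: f64, // cosine similarity from Qdrant
    stars: u64,      // gh_stars, 0 for non-GitHub sources
    age_days: f64,   // days since updated_at
    score: f64,      // final reranked score, filled in below
}

struct RerankContext {
    half_life_days: f64,
}

fn metadata_rerank(mut chunks: Vec<ScoredChunk>, ctx: &RerankContext) -> Vec<ScoredChunk> {
    for c in &mut chunks {
        let decay = 0.5_f64.powf(c.age_days / ctx.half_life_days);
        let authority = 1.0 + 0.1 * ((1 + c.stars) as f64).ln();
        c.score = c.similarity * decay * authority;
    }
    // Highest final score first; fall back to Equal on NaN.
    chunks.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));
    chunks
}

fn main() {
    let ranked = metadata_rerank(
        vec![
            ScoredChunk { similarity: 0.81, stars: 2, age_days: 500.0, score: 0.0 },
            ScoredChunk { similarity: 0.78, stars: 900, age_days: 5.0, score: 0.0 },
        ],
        &RerankContext { half_life_days: 180.0 },
    );
    // The fresher, higher-authority chunk wins despite slightly lower raw similarity.
    assert_eq!(ranked[0].stars, 900);
}
```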
Part 2: Cross-Encoder Reranker
What It Is
A cross-encoder takes the full (query, document) pair as input and produces a relevance score — unlike bi-encoders (TEI) which encode query and document independently. Cross-encoders are slower but significantly more accurate for reranking a small candidate set (top-20 from vector search).
Investigation
- Evaluate self-hosted cross-encoder options compatible with TEI or a sidecar:
  - TEI cross-encoder support: TEI supports re-rank endpoints — investigate whether our TEI instance supports it
  - Candidate models: cross-encoder/ms-marco-MiniLM-L-6-v2, BAAI/bge-reranker-v2-m3
  - Jina Reranker: jinaai/jina-reranker-v2-base-multilingual
- Pipeline: Qdrant top-50 (vector) → cross-encoder rerank top-50 → return top-10 to LLM
- Latency budget: cross-encoder rerank of 50 docs should be <200ms on local hardware
TEI Re-rank Endpoint
If our TEI instance supports it:
POST http://tei-host:52000/rerank
{
"query": "how to handle async errors in rust",
"texts": ["doc1 content", "doc2 content", ...],
"truncate": true
}

Add tei_rerank() to crates/vector/ops/tei.rs alongside tei_embed().
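A real tei_rerank() would serialize the request with serde and send it over the existing HTTP client; as a dependency-free sketch of just the payload shape (fields taken from the example above, escaping deliberately minimal):

```rust
// Minimal JSON string escaping (quotes, backslashes, newlines only);
// a real implementation would use serde_json instead.
fn json_escape(s: &str) -> String {
    s.chars()
        .flat_map(|c| match c {
            '"' => vec!['\\', '"'],
            '\\' => vec!['\\', '\\'],
            '\n' => vec!['\\', 'n'],
            other => vec![other],
        })
        .collect()
}

/// Build the body for POST /rerank with `truncate` enabled.
fn rerank_body(query: &str, texts: &[&str]) -> String {
    let docs: Vec<String> = texts
        .iter()
        .map(|t| format!("\"{}\"", json_escape(t)))
        .collect();
    format!(
        "{{\"query\":\"{}\",\"texts\":[{}],\"truncate\":true}}",
        json_escape(query),
        docs.join(",")
    )
}

fn main() {
    let body = rerank_body(
        "how to handle async errors in rust",
        &["doc1 content", "doc2 content"],
    );
    println!("{body}");
}
```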
Part 3: Temporal Search
--since / --until Filters
axon query "memory leak fix" --since 30d
axon query "breaking change" --since 2025-01-01 --until 2025-06-01
axon ask "what changed in v2?" --since 90d --filter gh_is_pr=true

Translates to a Qdrant payload filter on updated_at, gh_pushed_at, session_date, gh_created_at, etc.
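Parsing the flag value could look something like this; parse_since and the Since enum are hypothetical helpers, not the real CLI code, and only the two forms shown above (Nd durations and ISO dates) are handled:

```rust
/// A --since value is either a relative duration like "30d"/"90d",
/// or an absolute ISO date "YYYY-MM-DD". Hypothetical parser sketch.
#[derive(Debug, PartialEq)]
enum Since {
    DaysAgo(u64),
    Date { year: i32, month: u32, day: u32 },
}

fn parse_since(s: &str) -> Option<Since> {
    // Relative form: digits followed by a 'd' suffix.
    if let Some(n) = s.strip_suffix('d') {
        return n.parse().ok().map(Since::DaysAgo);
    }
    // Absolute form: YYYY-MM-DD (no calendar validation here).
    let mut parts = s.splitn(3, '-');
    let (y, m, d) = (parts.next()?, parts.next()?, parts.next()?);
    Some(Since::Date {
        year: y.parse().ok()?,
        month: m.parse().ok()?,
        day: d.parse().ok()?,
    })
}

fn main() {
    assert_eq!(parse_since("30d"), Some(Since::DaysAgo(30)));
    assert_eq!(
        parse_since("2025-01-01"),
        Some(Since::Date { year: 2025, month: 1, day: 1 })
    );
    assert_eq!(parse_since("soon"), None);
}
```

Whichever form is parsed, both end up as the same thing downstream: a cutoff timestamp in a Qdrant range condition on the relevant date field.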
Temporal Decay in Rankings
Even without explicit --since, recent content should naturally score higher for queries that imply recency ("latest", "current", "now", "v2", "2025"):
- Detect recency intent in query (simple keyword heuristic to start)
- Apply stronger recency decay weight when intent detected
- Configurable half-life per source type (code changes faster than docs)
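The keyword heuristic for the first bullet can start very small. This sketch uses the hint words listed above; the token list is a placeholder to grow over time:

```rust
/// Crude recency-intent detector: tokenizes on non-alphanumeric
/// characters and checks against a small hint list.
fn has_recency_intent(query: &str) -> bool {
    const HINTS: &[&str] = &["latest", "current", "now", "v2", "2025"];
    query
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .any(|tok| HINTS.contains(&tok))
}

fn main() {
    assert!(has_recency_intent("what changed in v2?"));
    assert!(has_recency_intent("Latest breaking changes"));
    // No recency hints here, so the default decay weight applies.
    assert!(!has_recency_intent("how to handle async errors in rust"));
}
```

Tokenizing (rather than substring matching) avoids false positives like "now" inside "known".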
Files
| File | Action |
|---|---|
| crates/vector/ops/rerank/ | New module: metadata reranker, recency decay, authority scoring |
| crates/vector/ops/tei.rs | Add tei_rerank() if TEI supports cross-encoder endpoint |
| crates/vector/ops/commands/query.rs | Wire reranker; add --since, --until, --rerank flags |
| crates/vector/ops/commands/ask.rs | Wire reranker into RAG context assembly |
| crates/core/config/types/config.rs | Add reranker config fields (enable, model, half-life) |
| docs/PERFORMANCE.md | Document reranker latency benchmarks |
Acceptance Criteria
- Metadata reranker applied after Qdrant retrieval in query and ask pipelines
- GitHub results boosted by stars/forks/comment count (log-scaled)
- Open issues ranked above closed for issue-related queries
- Recency decay applied — recently updated content scores higher
- --rerank false disables reranker for debugging/latency comparison
- TEI cross-encoder endpoint investigated and documented (supported or not)
- If supported: tei_rerank() implemented and wired as opt-in (--rerank cross-encoder)
- --since <duration|date> and --until <date> flags on axon query and axon ask
- Temporal filters translate to Qdrant payload filters on date fields
- Benchmark: reranker adds <50ms latency on typical top-20 candidate set
- cargo clippy clean, all tests pass