feat(vector): reranker optimization — metadata-aware scoring, cross-encoder, temporal search #35
Description
Summary
Our vector search returns semantically similar chunks but ranks them purely by embedding cosine similarity. Now that we're collecting rich structured metadata across GitHub, YouTube, Reddit, and sessions, we can build a significantly smarter reranker that boosts results based on recency, authority signals (stars, upvotes, comment counts), content type, and source trustworthiness. Also investigate a proper cross-encoder reranker for precision-critical queries.
Current State
axon query / axon ask pipeline today:
- TEI embedding of query → cosine similarity search in Qdrant
- Top-K results returned as-is (ranked by vector similarity only)
- No reranking, no metadata weighting, no temporal decay
We now have the metadata to do better — it's just not being used in ranking.
Part 1: Metadata-Aware Reranker
Available Signals by Source
GitHub (gh_* payload fields):
- gh_stars, gh_forks — repo authority/popularity
- gh_state (open/closed) — relevance for issues (open = active problem)
- gh_is_pr — distinguish issue vs PR results
- gh_labels — bug/enhancement/docs classification
- gh_pushed_at, gh_updated_at — recency of activity
- gh_comment_count — engagement signal (more comments = more important issue)
- gh_is_archived, gh_is_fork — deprioritize archived/forked content
YouTube (yt_* payload fields — audit what exists):
- View count, like count, upload date — popularity + recency
Reddit (reddit_* payload fields — audit what exists):
- Score (upvotes - downvotes), comment count, subreddit — community signal
Sessions (once #33 lands):
- session_date — temporal relevance
- session_project — boost results from the current active project
Reranking Formula (starting point)
final_score = vector_similarity
× recency_decay(updated_at, half_life=180days)
× authority_boost(stars, forks, upvotes) // log-scaled
× state_boost(is_open=1.2, is_closed=0.8) // for issues
× type_weight(content_type) // configurable per query
This is a simple multiplicative heuristic to start; it can evolve into a learned ranker later.
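The formula can be sketched as runnable Rust. The exponential half-life decay and log-scaled authority boost follow the shape described above; the 0.1 log coefficient and the 1.2/0.8 state boosts are placeholder weights to be tuned, not final values:

```rust
/// Exponential half-life decay: weight halves every `half_life_days`.
fn recency_decay(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days / half_life_days)
}

/// Log-scaled authority boost so a 10k-star repo doesn't drown out everything.
/// The 0.1 coefficient is a placeholder to tune.
fn authority_boost(stars: u64, forks: u64, comments: u64) -> f64 {
    1.0 + 0.1 * ((1 + stars + forks + comments) as f64).ln()
}

fn final_score(
    similarity: f64,
    age_days: f64,
    is_open: bool,
    stars: u64,
    forks: u64,
    comments: u64,
) -> f64 {
    let state_boost = if is_open { 1.2 } else { 0.8 };
    similarity
        * recency_decay(age_days, 180.0) // 180-day half-life starting point
        * authority_boost(stars, forks, comments)
        * state_boost
}

fn main() {
    // A fresh, popular, open issue vs. an old, quiet, closed one at equal similarity.
    let fresh = final_score(0.80, 10.0, true, 500, 40, 25);
    let stale = final_score(0.80, 400.0, false, 3, 0, 1);
    assert!(fresh > stale);
    println!("fresh={fresh:.3} stale={stale:.3}");
}
```

Note the decay multiplies rather than filters, so an old but extremely similar chunk can still outrank a fresh marginal one.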
Implementation
- New crates/vector/ops/rerank/ module:
  - metadata_rerank(chunks: Vec<ScoredChunk>, query_ctx: &RerankContext) -> Vec<ScoredChunk>
  - recency_decay(updated_at: Option<DateTime>, half_life_days: f64) -> f64
  - authority_score(stars: u64, forks: u64, comments: u64) -> f64
- RerankContext carries: current date, active project, query intent (issue/code/docs/general)
- Applied after Qdrant retrieval, before LLM context assembly in the ask pipeline
- --rerank false flag to disable for debugging / latency comparison
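A self-contained sketch of how metadata_rerank could score and re-sort a candidate set. The struct fields here are hypothetical stand-ins for the real payload types, and the weights mirror the placeholder formula above:

```rust
use std::cmp::Ordering;

// Hypothetical stand-ins for the real chunk/context types.
struct ScoredChunk {
    similarity: f64, // cosine similarity from Qdrant
    stars: u64,      // gh_stars, 0 for non-GitHub sources
    age_days: f64,   // days since updated_at
    score: f64,      // final reranked score, filled in below
}

struct RerankContext {
    half_life_days: f64,
}

fn metadata_rerank(mut chunks: Vec<ScoredChunk>, ctx: &RerankContext) -> Vec<ScoredChunk> {
    for c in &mut chunks {
        let decay = 0.5_f64.powf(c.age_days / ctx.half_life_days);
        let authority = 1.0 + 0.1 * ((1 + c.stars) as f64).ln();
        c.score = c.similarity * decay * authority;
    }
    // Highest final score first; fall back to Equal on NaN.
    chunks.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));
    chunks
}

fn main() {
    let ranked = metadata_rerank(
        vec![
            ScoredChunk { similarity: 0.81, stars: 2, age_days: 500.0, score: 0.0 },
            ScoredChunk { similarity: 0.78, stars: 900, age_days: 5.0, score: 0.0 },
        ],
        &RerankContext { half_life_days: 180.0 },
    );
    // The fresher, higher-authority chunk wins despite slightly lower raw similarity.
    assert_eq!(ranked[0].stars, 900);
}
```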
Part 2: Cross-Encoder Reranker
What It Is
A cross-encoder takes the full (query, document) pair as input and produces a relevance score — unlike bi-encoders (TEI) which encode query and document independently. Cross-encoders are slower but significantly more accurate for reranking a small candidate set (top-20 from vector search).
Investigation
- Evaluate self-hosted cross-encoder options compatible with TEI or a sidecar:
  - TEI cross-encoder support: TEI supports re-rank endpoints — investigate whether our TEI instance supports it
  - Candidate models: cross-encoder/ms-marco-MiniLM-L-6-v2, BAAI/bge-reranker-v2-m3
  - Jina Reranker: jinaai/jina-reranker-v2-base-multilingual
- Pipeline: Qdrant top-50 (vector) → cross-encoder rerank top-50 → return top-10 to LLM
- Latency budget: cross-encoder rerank of 50 docs should be <200ms on local hardware
TEI Re-rank Endpoint
If our TEI instance supports it:
POST http://tei-host:52000/rerank
{
"query": "how to handle async errors in rust",
"texts": ["doc1 content", "doc2 content", ...],
"truncate": true
}

Add tei_rerank() to crates/vector/ops/tei.rs alongside tei_embed().
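A real tei_rerank() would serialize the request with serde and send it over the existing HTTP client; as a dependency-free sketch of just the payload shape (fields taken from the example above, escaping deliberately minimal):

```rust
// Minimal JSON string escaping (quotes, backslashes, newlines only);
// a real implementation would use serde_json instead.
fn json_escape(s: &str) -> String {
    s.chars()
        .flat_map(|c| match c {
            '"' => vec!['\\', '"'],
            '\\' => vec!['\\', '\\'],
            '\n' => vec!['\\', 'n'],
            other => vec![other],
        })
        .collect()
}

/// Build the body for POST /rerank with `truncate` enabled.
fn rerank_body(query: &str, texts: &[&str]) -> String {
    let docs: Vec<String> = texts
        .iter()
        .map(|t| format!("\"{}\"", json_escape(t)))
        .collect();
    format!(
        "{{\"query\":\"{}\",\"texts\":[{}],\"truncate\":true}}",
        json_escape(query),
        docs.join(",")
    )
}

fn main() {
    let body = rerank_body(
        "how to handle async errors in rust",
        &["doc1 content", "doc2 content"],
    );
    println!("{body}");
}
```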
Part 3: Temporal Search
--since / --until Filters
axon query "memory leak fix" --since 30d
axon query "breaking change" --since 2025-01-01 --until 2025-06-01
axon ask "what changed in v2?" --since 90d --filter gh_is_pr=true

Translates to a Qdrant payload filter on updated_at, gh_pushed_at, session_date, gh_created_at, etc.
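Parsing the flag value could look something like this; parse_since and the Since enum are hypothetical helpers, not the real CLI code, and only the two forms shown above (Nd durations and ISO dates) are handled:

```rust
/// A --since value is either a relative duration like "30d"/"90d",
/// or an absolute ISO date "YYYY-MM-DD". Hypothetical parser sketch.
#[derive(Debug, PartialEq)]
enum Since {
    DaysAgo(u64),
    Date { year: i32, month: u32, day: u32 },
}

fn parse_since(s: &str) -> Option<Since> {
    // Relative form: digits followed by a 'd' suffix.
    if let Some(n) = s.strip_suffix('d') {
        return n.parse().ok().map(Since::DaysAgo);
    }
    // Absolute form: YYYY-MM-DD (no calendar validation here).
    let mut parts = s.splitn(3, '-');
    let (y, m, d) = (parts.next()?, parts.next()?, parts.next()?);
    Some(Since::Date {
        year: y.parse().ok()?,
        month: m.parse().ok()?,
        day: d.parse().ok()?,
    })
}

fn main() {
    assert_eq!(parse_since("30d"), Some(Since::DaysAgo(30)));
    assert_eq!(
        parse_since("2025-01-01"),
        Some(Since::Date { year: 2025, month: 1, day: 1 })
    );
    assert_eq!(parse_since("soon"), None);
}
```

Whichever form is parsed, both end up as the same thing downstream: a cutoff timestamp in a Qdrant range condition on the relevant date field.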
Temporal Decay in Rankings
Even without explicit --since, recent content should naturally score higher for queries that imply recency ("latest", "current", "now", "v2", "2025"):
- Detect recency intent in query (simple keyword heuristic to start)
- Apply stronger recency decay weight when intent detected
- Configurable half-life per source type (code changes faster than docs)
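The keyword heuristic for the first bullet can start very small. This sketch uses the hint words listed above; the token list is a placeholder to grow over time:

```rust
/// Crude recency-intent detector: tokenizes on non-alphanumeric
/// characters and checks against a small hint list.
fn has_recency_intent(query: &str) -> bool {
    const HINTS: &[&str] = &["latest", "current", "now", "v2", "2025"];
    query
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .any(|tok| HINTS.contains(&tok))
}

fn main() {
    assert!(has_recency_intent("what changed in v2?"));
    assert!(has_recency_intent("Latest breaking changes"));
    // No recency hints here, so the default decay weight applies.
    assert!(!has_recency_intent("how to handle async errors in rust"));
}
```

Tokenizing (rather than substring matching) avoids false positives like "now" inside "known".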
Files
| File | Action |
|---|---|
| crates/vector/ops/rerank/ | New module: metadata reranker, recency decay, authority scoring |
| crates/vector/ops/tei.rs | Add tei_rerank() if TEI supports cross-encoder endpoint |
| crates/vector/ops/commands/query.rs | Wire reranker; add --since, --until, --rerank flags |
| crates/vector/ops/commands/ask.rs | Wire reranker into RAG context assembly |
| crates/core/config/types/config.rs | Add reranker config fields (enable, model, half-life) |
| docs/PERFORMANCE.md | Document reranker latency benchmarks |
Acceptance Criteria
- Metadata reranker applied after Qdrant retrieval in query and ask pipelines
- GitHub results boosted by stars/forks/comment count (log-scaled)
- Open issues ranked above closed for issue-related queries
- Recency decay applied — recently updated content scores higher
- --rerank false disables reranker for debugging/latency comparison
- TEI cross-encoder endpoint investigated and documented (supported or not)
- If supported: tei_rerank() implemented and wired as opt-in (--rerank cross-encoder)
- --since <duration|date> and --until <date> flags on axon query and axon ask
- Temporal filters translate to Qdrant payload filters on date fields
- Benchmark: reranker adds <50ms latency on typical top-20 candidate set
- cargo clippy clean, all tests pass