
feat(vector): reranker optimization — metadata-aware scoring, cross-encoder, temporal search #35

@jmagar


Summary

Our vector search returns semantically similar chunks but ranks them purely by embedding cosine similarity. Now that we collect rich structured metadata across GitHub, YouTube, Reddit, and sessions, we can build a significantly smarter reranker that boosts results based on recency, authority signals (stars, upvotes, comment counts), content type, and source trustworthiness. We should also investigate a proper cross-encoder reranker for precision-critical queries.

Current State

axon query / axon ask pipeline today:

  1. TEI embedding of query → cosine similarity search in Qdrant
  2. Top-K results returned as-is (ranked by vector similarity only)
  3. No reranking, no metadata weighting, no temporal decay

We now have the metadata to do better — it's just not being used in ranking.

Part 1: Metadata-Aware Reranker

Available Signals by Source

GitHub (gh_* payload fields):

  • gh_stars, gh_forks — repo authority/popularity
  • gh_state (open/closed) — relevance for issues (open = active problem)
  • gh_is_pr — distinguish issue vs PR results
  • gh_labels — bug/enhancement/docs classification
  • gh_pushed_at, gh_updated_at — recency of activity
  • gh_comment_count — engagement signal (more comments = more important issue)
  • gh_is_archived, gh_is_fork — deprioritize archived/forked content

YouTube (yt_* payload fields — audit what exists):

  • View count, like count, upload date — popularity + recency

Reddit (reddit_* payload fields — audit what exists):

  • Score (upvotes - downvotes), comment count, subreddit — community signal

Sessions (once #33 lands):

  • session_date — temporal relevance
  • session_project — boost results from the current active project
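
The GitHub signal set above could be mirrored as a plain payload struct (hypothetical shape, not an existing type; every field is optional because older points may predate the richer metadata):

```rust
// Hypothetical mirror of the gh_* payload fields; the real payload
// schema lives in Qdrant and should be audited before coding this.
#[derive(Debug, Default)]
struct GithubSignals {
    gh_stars: Option<u64>,
    gh_forks: Option<u64>,
    gh_state: Option<String>, // "open" | "closed"
    gh_is_pr: Option<bool>,
    gh_labels: Vec<String>,   // bug / enhancement / docs ...
    gh_pushed_at: Option<i64>,  // unix seconds
    gh_updated_at: Option<i64>, // unix seconds
    gh_comment_count: Option<u64>,
    gh_is_archived: Option<bool>,
    gh_is_fork: Option<bool>,
}
```

Defaulting everything to `None`/empty keeps the reranker total: a chunk with no metadata simply keeps its raw vector score.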

Reranking Formula (starting point)

final_score = vector_similarity
    × recency_decay(updated_at, half_life=180days)
    × authority_boost(stars, forks, upvotes)   // log-scaled
    × state_boost(is_open=1.2, is_closed=0.8)  // for issues
    × type_weight(content_type)                // configurable per query

This is a simple multiplicative combination to start; it can evolve into a learned ranker later.
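
As a sketch, the formula above could look like this in Rust. All constants (half-life, log divisor, state multipliers) are placeholder values to tune, and the function names follow the formula, not an existing API:

```rust
fn recency_decay(age_days: f64, half_life_days: f64) -> f64 {
    // Exponential half-life: the boost halves every `half_life_days`.
    0.5_f64.powf(age_days / half_life_days)
}

fn authority_boost(stars: u64, forks: u64, upvotes: u64) -> f64 {
    // Log-scaled so a 50k-star repo doesn't drown out smaller sources.
    1.0 + ((stars + forks + upvotes) as f64).ln_1p() / 10.0
}

fn state_boost(is_open: Option<bool>) -> f64 {
    match is_open {
        Some(true) => 1.2,  // open issue: active problem
        Some(false) => 0.8, // closed issue
        None => 1.0,        // not an issue/PR
    }
}

fn final_score(
    vector_similarity: f64,
    age_days: f64,
    stars: u64,
    forks: u64,
    upvotes: u64,
    is_open: Option<bool>,
    type_weight: f64,
) -> f64 {
    vector_similarity
        * recency_decay(age_days, 180.0) // 180-day half-life to start
        * authority_boost(stars, forks, upvotes)
        * state_boost(is_open)
        * type_weight
}
```

A learned ranker could later replace `final_score` internally while keeping the same signature.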

Implementation

  • New crates/vector/ops/rerank/ module:
    • metadata_rerank(chunks: Vec<ScoredChunk>, query_ctx: &RerankContext) -> Vec<ScoredChunk>
    • recency_decay(updated_at: Option<DateTime>, half_life_days: f64) -> f64
    • authority_score(stars: u64, forks: u64, comments: u64) -> f64
  • RerankContext carries: current date, active project, query intent (issue/code/docs/general)
  • Applied after Qdrant retrieval, before LLM context assembly in ask pipeline
  • --rerank false flag to disable for debugging / latency comparison
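
The module entry point could then be a small boost-and-sort pass; the shapes below are hypothetical (the real `ScoredChunk`/`RerankContext` will carry more fields), but they show where the `--rerank false` bypass fits:

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone)]
struct ScoredChunk {
    text: String,
    score: f64, // cosine similarity from Qdrant
    boost: f64, // product of metadata multipliers, precomputed per chunk
}

#[derive(Debug)]
struct RerankContext {
    enabled: bool, // --rerank false sets this to false
}

fn metadata_rerank(mut chunks: Vec<ScoredChunk>, ctx: &RerankContext) -> Vec<ScoredChunk> {
    if !ctx.enabled {
        return chunks; // bypass for debugging / latency comparison
    }
    for c in &mut chunks {
        c.score *= c.boost;
    }
    // Sort descending by boosted score; NaN-safe via partial_cmp fallback.
    chunks.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));
    chunks
}
```

Keeping the multipliers precomputed on the chunk keeps this pass O(n log n) and trivially under the latency budget for a top-20 set.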

Part 2: Cross-Encoder Reranker

What It Is

A cross-encoder takes the full (query, document) pair as input and produces a relevance score — unlike bi-encoders (TEI) which encode query and document independently. Cross-encoders are slower but significantly more accurate for reranking a small candidate set (top-20 from vector search).

Investigation

  • Evaluate self-hosted cross-encoder options compatible with TEI or a sidecar:
    • TEI cross-encoder support: TEI exposes a /rerank endpoint when serving a re-ranker model; check whether our deployed TEI instance and model support it
    • Candidate models: cross-encoder/ms-marco-MiniLM-L-6-v2, BAAI/bge-reranker-v2-m3
    • Jina Reranker: jinaai/jina-reranker-v2-base-multilingual
  • Pipeline: Qdrant top-50 (vector) → cross-encoder rerank top-50 → return top-10 to LLM
  • Latency budget: cross-encoder rerank of 50 docs should be <200ms on local hardware

TEI Re-rank Endpoint

If our TEI instance supports it:

POST http://tei-host:52000/rerank
{
  "query": "how to handle async errors in rust",
  "texts": ["doc1 content", "doc2 content", ...],
  "truncate": true
}

Add tei_rerank() to crates/vector/ops/tei.rs alongside tei_embed().
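
Per TEI's documented API, the /rerank response is a JSON array of {index, score} pairs pointing back into the submitted texts (verify against our instance). The HTTP call itself would live in tei_rerank(); the pure part, folding that response back onto candidates, might look like this sketch:

```rust
// Hypothetical helper for tei_rerank(): map TEI's {index, score} hits
// back onto the submitted texts and keep the top_k by score.
#[derive(Debug, Clone)]
struct RerankHit {
    index: usize, // position in the `texts` array sent to /rerank
    score: f64,   // cross-encoder relevance score
}

fn apply_rerank(texts: &[String], mut hits: Vec<RerankHit>, top_k: usize) -> Vec<(String, f64)> {
    hits.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
    hits.into_iter()
        .take(top_k)
        .filter_map(|h| texts.get(h.index).map(|t| (t.clone(), h.score)))
        .collect()
}
```

Separating this from the HTTP call makes the top-50 → top-10 truncation unit-testable without a live TEI instance.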

Part 3: Temporal Search

--since / --until Filters

axon query "memory leak fix" --since 30d
axon query "breaking change" --since 2025-01-01 --until 2025-06-01
axon ask "what changed in v2?" --since 90d --filter gh_is_pr=true

These translate to Qdrant payload filters on the relevant date fields (updated_at, gh_pushed_at, gh_created_at, session_date, etc.).
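
A hypothetical parser for the --since value, accepting either a relative duration ("30d", "12h") or an absolute YYYY-MM-DD date, and returning a Unix-timestamp cutoff for the Qdrant range filter (a real implementation would likely lean on chrono; days_from_civil follows Howard Hinnant's civil-calendar algorithm):

```rust
fn parse_since(value: &str, now_unix: i64) -> Option<i64> {
    // Relative forms: "<n>d" days ago, "<n>h" hours ago.
    if let Some(days) = value.strip_suffix('d').and_then(|n| n.parse::<i64>().ok()) {
        return Some(now_unix - days * 86_400);
    }
    if let Some(hours) = value.strip_suffix('h').and_then(|n| n.parse::<i64>().ok()) {
        return Some(now_unix - hours * 3_600);
    }
    // Absolute form: YYYY-MM-DD at midnight UTC.
    let mut parts = value.split('-');
    let y = parts.next()?.parse::<i64>().ok()?;
    let m = parts.next()?.parse::<i64>().ok()?;
    let d = parts.next()?.parse::<i64>().ok()?;
    days_from_civil(y, m, d).map(|days| days * 86_400)
}

// Days since 1970-01-01 in the proleptic Gregorian calendar
// (Howard Hinnant's days_from_civil; validation is deliberately minimal).
fn days_from_civil(y: i64, m: i64, d: i64) -> Option<i64> {
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let mp = (m + 9) % 12;
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    Some(era * 146_097 + doe - 719_468)
}
```

The returned cutoff would feed a Qdrant `Range { gte: cutoff }` condition on the chosen date field; --until works the same way with `lte`.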

Temporal Decay in Rankings

Even without explicit --since, recent content should naturally score higher for queries that imply recency ("latest", "current", "now", "v2", "2025"):

  • Detect recency intent in query (simple keyword heuristic to start)
  • Apply stronger recency decay weight when intent detected
  • Configurable half-life per source type (code changes faster than docs)
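
The keyword heuristic for recency intent can start as a trivial token scan; the hint list below is illustrative, and version-style tokens like "v2" would need fancier handling than this sketch attempts:

```rust
// Minimal recency-intent detector: matches known hint words or a
// bare 4-digit year in the 2000s ("2025"). Hint list is a placeholder.
fn has_recency_intent(query: &str) -> bool {
    const HINTS: &[&str] = &["latest", "current", "now", "recent", "newest"];
    query.to_lowercase().split_whitespace().any(|w| {
        HINTS.contains(&w)
            || (w.len() == 4 && w.starts_with("20") && w.chars().all(|c| c.is_ascii_digit()))
    })
}
```

When this returns true, the reranker would switch to a shorter half-life (e.g. 30 days instead of 180) for the recency_decay term.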

Files

  • crates/vector/ops/rerank/ (new): metadata reranker, recency decay, authority scoring
  • crates/vector/ops/tei.rs: add tei_rerank() if TEI supports the cross-encoder endpoint
  • crates/vector/ops/commands/query.rs: wire in reranker; add --since, --until, --rerank flags
  • crates/vector/ops/commands/ask.rs: wire reranker into RAG context assembly
  • crates/core/config/types/config.rs: add reranker config fields (enable, model, half-life)
  • docs/PERFORMANCE.md: document reranker latency benchmarks

Acceptance Criteria

  • Metadata reranker applied after Qdrant retrieval in query and ask pipelines
  • GitHub results boosted by stars/forks/comment count (log-scaled)
  • Open issues ranked above closed for issue-related queries
  • Recency decay applied — recently updated content scores higher
  • --rerank false disables reranker for debugging/latency comparison
  • TEI cross-encoder endpoint investigated and documented (supported or not)
  • If supported: tei_rerank() implemented and wired as opt-in (--rerank cross-encoder)
  • --since <duration|date> and --until <date> flags on axon query and axon ask
  • Temporal filters translate to Qdrant payload filters on date fields
  • Benchmark: reranker adds <50ms latency on typical top-20 candidate set
  • cargo clippy clean, all tests pass

Metadata

Labels: enhancement (New feature or request)