Semantic Search Next

A full-stack RAG (Retrieval Augmented Generation) application with hybrid search, cross-encoder reranking, citation-verified AI answers, and LLM-as-Judge evaluation. Supports multiple AI providers including OpenAI, Anthropic, and Ollama for fully local operation.

Pipeline Overview

When you search, here's what happens behind the scenes:

1. 📄 Document - Upload & organize into collections
2. ✂️ Chunk - Split into searchable pieces
3. 🧮 Embed - Convert to vector embeddings
4. 🔀 Search - Hybrid BM25 + semantic retrieval
5. 🏆 Rerank - Cross-encoder refinement
6. 💬 Answer - RAG-powered response
7. 📊 Eval - LLM-as-Judge quality scoring

Pipeline Options

| Step | What It Does | Provider Options | Key Parameters |
|---|---|---|---|
| Chunk | Splits documents into searchable pieces | Built-in | chunk_size (default: 1000), chunk_overlap (default: 200) |
| Embed | Converts text to vectors for semantic search | Cloud: OpenAI, Voyage, Cohere, Jina; Local: Ollama | embedding_model |
| Search | Finds relevant chunks using hybrid retrieval | BM25 (keywords) + ChromaDB (semantic) | alpha (0 = keywords, 1 = semantic), top_k, preset |
| Rerank | AI scores each result for precise ranking | Cloud: Cohere; Local: Jina | reranker_provider (auto/jina/cohere/none) |
| Answer | Generates a response from retrieved context | Cloud: OpenAI, Anthropic; Local: Ollama | answer_provider, answer_model |
| Eval | Measures retrieval & answer quality | Cloud: OpenAI, Anthropic; Local: Ollama | eval_judge_provider, eval_judge_model |

All settings are configurable via the Settings page (/settings). For fully local operation, use Ollama + Jina; no API keys are required.
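
For illustration, here is a minimal sketch of the overlap chunking described above, using the chunk_size and chunk_overlap defaults from the table. The helper is hypothetical, not the repository's actual implementation:

def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop once the final window reaches the end of the text.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]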

How Search Works

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              SEARCH FLOW                                        │
└─────────────────────────────────────────────────────────────────────────────────┘

  ┌─────────────┐
  │  Your Query │
  │ "How does   │
  │  auth work?"│
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐     ┌────────────────────────────────────────────────────────┐
  │   EMBED     │     │  Convert query to 3072-dimensional vector using AI     │
  │   QUERY     │────▶│  (OpenAI text-embedding-3-large)                       │
  └──────┬──────┘     └────────────────────────────────────────────────────────┘
         │
         ▼
  ┌─────────────────────────────────────────┐
  │         PARALLEL RETRIEVAL              │
  │  ┌─────────────┐    ┌─────────────┐     │
  │  │  SEMANTIC   │    │    BM25     │     │
  │  │   SEARCH    │    │  KEYWORDS   │     │
  │  │ (ChromaDB)  │    │ (In-memory) │     │
  │  │             │    │             │     │
  │  │ Finds by    │    │ Finds by    │     │
  │  │  meaning    │    │ exact terms │     │
  │  └──────┬──────┘    └──────┬──────┘     │
  │         │                  │            │
  └─────────┼──────────────────┼────────────┘
            │                  │
            ▼                  ▼
  ┌─────────────────────────────────────────┐
  │    RECIPROCAL RANK FUSION (RRF)         │
  │                                         │
  │  Intelligently merge both result sets   │
  │  α=0.5 → 50% semantic + 50% keywords    │
  └──────────────────┬──────────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────────┐
  │    CROSS-ENCODER RERANKING              │
  │                                         │
  │  AI model scores each query-document    │
  │  pair for precise relevance (0-100%)    │
  └──────────────────┬──────────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────────┐
  │    CONFIDENCE FILTERING                 │
  │                                         │
  │  ┌─────────────┐    ┌─────────────┐     │
  │  │    HIGH     │    │     LOW     │     │
  │  │ CONFIDENCE  │    │ CONFIDENCE  │     │
  │  │  (≥30%)     │    │  (<30%)     │     │
  │  │  Shown      │    │  Hidden     │     │
  │  └─────────────┘    └─────────────┘     │
  └──────────────────┬──────────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────────┐
  │    OPTIONAL: AI ANSWER + CITATIONS      │
  │                                         │
  │  RAG-powered answer with verification   │
  │  Each claim linked to source document   │
  └─────────────────────────────────────────┘
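
The EMBED QUERY step above corresponds to a single embeddings call. A minimal sketch with the official OpenAI Python client (assumes OPENAI_API_KEY is set in your environment):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(model="text-embedding-3-large", input="How does auth work?")
vector = resp.data[0].embedding
print(len(vector))  # 3072 for text-embedding-3-large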

Want to learn more? The app includes an interactive "How It Works" page with detailed explanations of each concept, search quality progression, and settings guidance. See the screenshots below or explore it yourself when running the app.

Features

| Feature | Description |
|---|---|
| Hybrid Retrieval | Combines BM25 keyword search with semantic embeddings using Reciprocal Rank Fusion (RRF) |
| AI Answer Generation | RAG-powered answers with citation verification and hallucination detection |
| RAG Evaluations | LLM-as-Judge evaluation with retrieval & answer quality metrics |
| AI Reranking | Uses Jina cross-encoder (local) or Cohere API to rerank results for relevance |
| Confidence Filtering | Separates high-confidence from low-confidence results based on a configurable threshold |
| Answer Verification | Extracts claims from AI answers and verifies them against source documents |
| Search Analytics | Dashboard with search history, latency trends, and usage statistics |
| Document Preview | View full document content with chunk navigation |
| Collection Scoping | Search across all documents or within specific collections |
| Retrieval Presets | High Precision / Balanced / High Recall modes |
| Score Transparency | View semantic, BM25, rerank, and final scores on results |
| Multiple Providers | Support for OpenAI, Anthropic, Ollama (local), Jina, Cohere, and Voyage AI |
| Dark Mode | Full theme support with system preference detection |

Screenshots

Semantic Search with AI Answer

[screenshot: semantic search result with AI answer]

Detailed Relevance Scores

[screenshot: search result scores]

LLM-as-Judge Evaluation

Run evaluations to measure search quality with configurable judge models.

[screenshots: run evaluation and evaluation output]

More Screenshots

| Feature | Description |
|---|---|
| Evaluation Details | Detailed breakdown of evaluation metrics and scores |
| Collections | Organize documents into searchable collections |
| Documents | View and manage documents within collections |
| Analytics | Track search history, latency trends, and query patterns |
| How It Works | Interactive documentation explaining search technology |
| Settings | Configure providers, models, and search parameters |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         FRONTEND                                │
│                    Next.js 15 (App Router)                      │
│              Shadcn/ui + Tailwind + TypeScript                  │
└─────────────────────────┬───────────────────────────────────────┘
                          │ HTTP/REST
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                         BACKEND                                 │
│                      FastAPI (Python)                           │
│  ┌─────────────┬─────────────┬─────────────┬─────────────────┐  │
│  │ Collections │  Documents  │   Search    │    Settings     │  │
│  │   API       │    API      │    API      │      API        │  │
│  └──────┬──────┴──────┬──────┴──────┬──────┴────────┬────────┘  │
│         │             │             │               │           │
│  ┌──────▼─────────────▼─────────────▼───────────────▼────────┐  │
│  │                    CORE SERVICES                          │  │
│  │  HybridSearchService │ Reranker │ VectorStore │ BM25Cache │  │
│  └──────┬───────────────┴──────────┴─────────────┬───────────┘  │
└─────────┼────────────────────────────────────────┼──────────────┘
          │                                        │
          ▼                                        ▼
┌─────────────────────┐                 ┌─────────────────────────┐
│     PostgreSQL      │                 │       ChromaDB          │
│  (Metadata + Config)│                 │    (Vector Store)       │
└─────────────────────┘                 └─────────────────────────┘

Search Flow

  1. Query Embedding - Generate embedding via OpenAI text-embedding-3-large
  2. Parallel Retrieval:
    • Semantic search via ChromaDB (cosine similarity)
    • BM25 keyword search (in-memory, per-collection cache with auto-invalidation)
  3. Reciprocal Rank Fusion (RRF) - Merge results with configurable alpha (see the sketch after this list)
  4. Reranking - Jina cross-encoder (local) or Cohere API
  5. Confidence Filtering - Split results by min_score_threshold (default: 30%)
  6. Response - High-confidence results + hidden low-confidence results
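
A minimal sketch of the alpha-weighted fusion in step 3. The code is illustrative, not the repository's implementation; k=60 is the conventional RRF smoothing constant and may differ from the repo's value:

def rrf_merge(semantic_ids: list[str], bm25_ids: list[str],
              alpha: float = 0.5, k: int = 60) -> list[str]:
    """Merge two ranked ID lists with alpha-weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for rank, doc_id in enumerate(semantic_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    for rank, doc_id in enumerate(bm25_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)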

Tech Stack

Backend

  • FastAPI - Python web framework with async support
  • PostgreSQL - Relational database (metadata, settings, search history)
  • ChromaDB - Vector database for semantic search
  • OpenAI - Embeddings (text-embedding-3-large)
  • Jina/Cohere - Cross-encoder reranking
  • BM25 - Keyword search via rank_bm25

Frontend

  • Next.js 15 - React framework with App Router
  • TypeScript - Type safety
  • Tailwind CSS - Utility-first styling
  • Shadcn/ui - Component library
  • Lucide - Icons

Prerequisites

  • Node.js 18+
  • Python 3.11+
  • Docker & Docker Compose

Detailed Setup Guide: See INFRASTRUCTURE.md for comprehensive setup instructions including:

  • PostgreSQL & ChromaDB configuration
  • Local AI providers (Ollama, Jina reranker)
  • Cloud provider setup (OpenAI, Cohere, Voyage AI)
  • Troubleshooting guide

Local AI with Ollama (Optional)

Run the entire pipeline locally without API keys using Ollama:

Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows - Download from https://ollama.com/download

Pull Required Models

# Embedding model (choose one)
ollama pull nomic-embed-text      # Fast, good quality (recommended)
ollama pull mxbai-embed-large     # Higher quality, slower

# LLM for answers & evaluation (choose one)
ollama pull llama3.2              # Fast, 3B params (recommended)
ollama pull llama3.1:8b           # Better quality, 8B params
ollama pull mistral               # Good balance

Start Ollama Server

ollama serve  # Runs on http://localhost:11434

Configure in App

  1. Start the app and go to Settings (/settings)
  2. Configure Ollama models:
    • Embedding Model: ollama:nomic-embed-text
    • Answer Provider: ollama → Model: llama3.2
    • Eval Provider: ollama → Model: llama3.1:8b

Note: Ollama runs locally - no API keys required. First request may be slow as models load into memory.
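
To sanity-check the setup from Python, you can call Ollama's embeddings endpoint directly. A sketch using the requests library (assumes ollama serve is running and nomic-embed-text has been pulled):

import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello world"},
    timeout=60,
)
resp.raise_for_status()
print(len(resp.json()["embedding"]))  # 768 for nomic-embed-text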

Quick Start

1. Clone and setup environment

git clone https://github.com/shrimpy8/semantic-search-next.git
cd semantic-search-next

# Copy environment files
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env.local
# Edit with your API keys

2. Start Docker Services

# Start PostgreSQL + pgAdmin
docker-compose up -d

# Start ChromaDB (separate container)
docker run -d --name chromadb -p 8000:8000 chromadb/chroma

Services started:

  • PostgreSQL: localhost:5432
  • ChromaDB: localhost:8000
  • pgAdmin: http://localhost:3001 (login: admin@local.dev / admin)

3. Backend Setup

cd backend

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -e ".[dev]"

# Run FastAPI server
uvicorn app.main:app --reload --port 8080

  • API: http://localhost:8080
  • Swagger docs: http://localhost:8080/docs

4. Frontend Setup

cd frontend

# Install dependencies
npm install

# Run development server
npm run dev

Frontend: http://localhost:3000

Environment Variables

Copy the .env.example files in the backend/ and frontend/ directories; see each .env.example for comprehensive documentation of every variable.

Backend (backend/.env)

# Debug Mode
DEBUG=false                          # Set true for verbose logging

# OpenAI (required for default config)
OPENAI_API_KEY=sk-...
EMBEDDING_MODEL=text-embedding-3-large
LLM_MODEL=gpt-4o-mini

# Alternative Embedding Providers (optional)
OLLAMA_BASE_URL=http://localhost:11434  # Local, no API key needed
JINA_API_KEY=...                         # Free tier: 1M tokens/mo
COHERE_API_KEY=...                       # Also used for reranking
VOYAGE_API_KEY=...                       # RAG optimized

# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=semantic_search
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

# ChromaDB
CHROMA_HOST=localhost
CHROMA_PORT=8000

# Reranking
RERANKER_PROVIDER=auto               # auto | jina | cohere
USE_RERANKING=true

Frontend (frontend/.env.local)

NEXT_PUBLIC_API_URL=http://localhost:8080/api/v1
NEXT_PUBLIC_DEBUG=false              # Set true for console logging

Project Structure

semantic-search-next/
├── docker-compose.yml           # PostgreSQL + ChromaDB + pgAdmin
├── docs/
│   ├── ARCHITECTURE.md          # Detailed system design
│   └── INFRASTRUCTURE.md        # Setup guide for all services
├── backend/
│   ├── .env.example             # Backend environment template
│   ├── app/
│   │   ├── main.py              # FastAPI entry
│   │   ├── config.py            # Settings
│   │   ├── api/v1/              # REST endpoints
│   │   │   ├── collections.py   # Collection CRUD
│   │   │   ├── documents.py     # Document upload/delete
│   │   │   ├── search.py        # Search with AI answers
│   │   │   ├── analytics.py     # Search analytics
│   │   │   ├── settings.py      # App settings
│   │   │   └── health.py        # Health check
│   │   ├── core/                # Business logic
│   │   │   ├── hybrid_retriever.py  # RRF fusion
│   │   │   ├── reranker.py      # Jina/Cohere reranking
│   │   │   ├── qa_chain.py      # RAG answer generation
│   │   │   ├── answer_verifier.py   # Citation verification
│   │   │   └── embeddings.py    # Multi-provider embeddings
│   │   ├── prompts/             # Externalized LLM prompts
│   │   │   ├── qa.yaml          # QA generation prompts
│   │   │   └── verification.yaml    # Verification prompts
│   │   ├── services/
│   │   │   └── retrieval.py     # HybridSearchService + BM25 cache
│   │   ├── db/
│   │   │   └── models.py        # SQLAlchemy models
│   │   └── api/
│   │       └── schemas.py       # Pydantic schemas
│   └── pyproject.toml
├── frontend/
│   ├── .env.example             # Frontend environment template
│   ├── src/
│   │   ├── app/                 # Next.js App Router
│   │   │   ├── page.tsx         # Main search page
│   │   │   ├── analytics/       # Search analytics dashboard
│   │   │   ├── documents/[id]/  # Document preview
│   │   │   ├── collections/     # Collection management
│   │   │   └── settings/        # Settings page
│   │   ├── components/
│   │   │   ├── ui/              # Shadcn components
│   │   │   ├── layout/          # Header, sidebar
│   │   │   ├── search/          # Search components
│   │   │   ├── analytics/       # Analytics charts
│   │   │   └── documents/       # Document viewer
│   │   ├── lib/
│   │   │   ├── api/             # API client & types
│   │   │   └── debug.ts         # Debug logging utility
│   │   └── hooks/               # TanStack Query hooks
│   ├── package.json
│   └── tsconfig.json
└── README.md

API Endpoints

Collections

POST   /api/v1/collections              Create collection
GET    /api/v1/collections              List collections
GET    /api/v1/collections/{id}         Get collection
PATCH  /api/v1/collections/{id}         Update collection
DELETE /api/v1/collections/{id}         Delete collection

Documents

POST   /api/v1/collections/{id}/documents   Upload document (invalidates BM25 cache)
GET    /api/v1/collections/{id}/documents   List documents
GET    /api/v1/documents/{id}               Get document
DELETE /api/v1/documents/{id}               Delete document (invalidates BM25 cache)

Search

POST   /api/v1/search                   Execute search with optional AI answer

Request:

{
  "query": "machine learning",
  "preset": "balanced",
  "top_k": 10,
  "collection_id": "optional-uuid",
  "generate_answer": true
}

Response:

{
  "query": "machine learning",
  "results": [...],
  "low_confidence_results": [...],
  "low_confidence_count": 3,
  "min_score_threshold": 0.35,
  "answer": "Machine learning is...",
  "answer_verification": {
    "confidence": "high",
    "citations": [...],
    "verified_claims": 3,
    "total_claims": 3,
    "coverage_percent": 100
  },
  "latency_ms": 245,
  "retrieval_method": "balanced"
}
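
A minimal Python sketch of calling the endpoint with the request shape above (assumes the backend is running locally and the requests library is installed):

import requests

payload = {
    "query": "machine learning",
    "preset": "balanced",
    "top_k": 10,
    "generate_answer": True,
}
resp = requests.post("http://localhost:8080/api/v1/search", json=payload, timeout=120)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
for hit in data["results"]:
    print(hit["scores"]["final_score"])  # per-result scores object, documented below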

Analytics

GET    /api/v1/analytics/searches       Search history (paginated)
GET    /api/v1/analytics/stats          Aggregate statistics
GET    /api/v1/analytics/trends         Time-series data

Evaluations

POST   /api/v1/evals/evaluate           Run LLM-as-Judge evaluation
GET    /api/v1/evals/results            List evaluation results
GET    /api/v1/evals/results/{id}       Get single evaluation
GET    /api/v1/evals/stats              Aggregate evaluation stats
GET    /api/v1/evals/providers          List available judge providers

Settings

GET    /api/v1/settings                 Get current settings
PATCH  /api/v1/settings                 Update settings
POST   /api/v1/settings/reset           Reset to defaults

Key Settings:

| Setting | Type | Default | Description |
|---|---|---|---|
| default_preset | string | balanced | Retrieval preset |
| default_alpha | float | 0.5 | Semantic vs BM25 weight |
| default_use_reranker | bool | true | Enable reranking |
| default_top_k | int | 10 | Results to return |
| min_score_threshold | float | 0.30 | Low-confidence cutoff |
| default_generate_answer | bool | false | Enable AI answer generation |
| default_context_window | int | 1 | Chunks before/after for context |
| show_scores | bool | true | Display score breakdown |
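
For example, raising the low-confidence cutoff through the documented PATCH endpoint (a sketch with the requests library; the payload shape is an assumption based on the table above):

import requests

resp = requests.patch(
    "http://localhost:8080/api/v1/settings",
    json={"min_score_threshold": 0.40},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())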

Health

GET    /api/v1/health                   Health check

Search Result Scores

Each result includes a scores object:

{
  "scores": {
    "semantic_score": 0.85,    // Normalized 0-1 (cosine similarity)
    "bm25_score": 0.72,        // Normalized 0-1 (keyword match)
    "rerank_score": 0.92,      // Cross-encoder 0-1 (when enabled)
    "final_score": 0.92,       // Used for ranking/filtering
    "relevance_percent": 92    // Display value (0-100%)
  }
}
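
The confidence filtering described earlier is a straightforward partition on final_score. A hypothetical sketch:

def split_by_confidence(results: list[dict], min_score_threshold: float = 0.30):
    """Partition results into shown (high-confidence) and hidden (low-confidence) sets."""
    high = [r for r in results if r["scores"]["final_score"] >= min_score_threshold]
    low = [r for r in results if r["scores"]["final_score"] < min_score_threshold]
    return high, low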

Development

Backend

cd backend
source .venv/bin/activate

# Lint
ruff check .

# Format
ruff format .

# Type check
mypy app

# Test
pytest

Frontend

cd frontend

# Lint
npm run lint

# Format
npm run format

# Build
npm run build

Test Queries

# High-confidence query
curl -s -X POST "http://localhost:8080/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "preset": "balanced", "top_k": 5}'

# Low-confidence query (unrelated to docs)
curl -s -X POST "http://localhost:8080/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "quantum entanglement physics", "preset": "balanced", "top_k": 10}'

# Check settings
curl -s http://localhost:8080/api/v1/settings

# Health check
curl -s http://localhost:8080/api/v1/health

Retrieval Presets

| Preset | Alpha | Use Reranker | Description |
|---|---|---|---|
| high_precision | 0.8 | true | Emphasizes semantic similarity; best for specific queries |
| balanced | 0.5 | true | Equal weight to semantic and keyword; a good default |
| high_recall | 0.3 | true | Emphasizes keyword matching; better for exploratory search |
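
Conceptually, a preset is just a named bundle of retrieval parameters. A hypothetical sketch mirroring the table above:

RETRIEVAL_PRESETS = {
    "high_precision": {"alpha": 0.8, "use_reranker": True},
    "balanced": {"alpha": 0.5, "use_reranker": True},
    "high_recall": {"alpha": 0.3, "use_reranker": True},
}

def resolve_preset(name: str) -> dict:
    # Fall back to the balanced preset for unknown names.
    return RETRIEVAL_PRESETS.get(name, RETRIEVAL_PRESETS["balanced"])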

Known Considerations

  • BM25 Cache: Automatically invalidated when documents are uploaded/deleted (see the sketch after this list)
  • Confidence Threshold: Adjustable via Settings API (min_score_threshold)
  • Reranking: Falls back to Jina local model if Cohere unavailable
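
A minimal sketch of a per-collection BM25 cache with invalidation, using the rank_bm25 library named in the tech stack. This is a hypothetical structure, not the repository's actual service:

from rank_bm25 import BM25Okapi

class BM25Cache:
    """Cache one BM25 index per collection; rebuild lazily after invalidation."""

    def __init__(self) -> None:
        self._indexes: dict[str, BM25Okapi] = {}

    def get(self, collection_id: str, corpus: list[str]) -> BM25Okapi:
        if collection_id not in self._indexes:
            tokenized = [doc.lower().split() for doc in corpus]
            self._indexes[collection_id] = BM25Okapi(tokenized)
        return self._indexes[collection_id]

    def invalidate(self, collection_id: str) -> None:
        # Called after a document upload or delete in this collection.
        self._indexes.pop(collection_id, None)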

RAG Evaluations

Measure and improve your search quality with LLM-as-Judge evaluation. The system evaluates both retrieval quality (finding the right chunks) and answer quality (generating accurate responses).

Evaluation Metrics

| Category | Metric | Description |
|---|---|---|
| Retrieval | Context Relevance | How relevant are the retrieved chunks? |
| Retrieval | Context Precision | Are irrelevant chunks filtered out? |
| Retrieval | Context Coverage | Is all needed information present? |
| Answer | Faithfulness | Is the answer grounded in the chunks? |
| Answer | Answer Relevance | Does it answer the question? |
| Answer | Completeness | Is anything missing? |

Score Interpretation

| Score Range | Quality | Action |
|---|---|---|
| > 0.8 | Excellent | System working well |
| 0.6 - 0.8 | Good | Minor improvements possible |
| 0.4 - 0.6 | Moderate | Review retrieval/generation settings |
| < 0.4 | Poor | Significant tuning needed |
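
To illustrate the LLM-as-Judge pattern, here is a hedged sketch of scoring faithfulness with the OpenAI Python client. The prompt and parsing are hypothetical; the repository keeps its actual prompts externalized in backend/app/prompts/:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_faithfulness(question: str, answer: str, chunks: list[str]) -> float:
    """Ask an LLM judge for a 0-1 faithfulness score (hypothetical prompt)."""
    prompt = (
        "Rate from 0.0 to 1.0 how faithfully the answer is grounded in the context. "
        "Reply with only the number.\n"
        f"Question: {question}\nAnswer: {answer}\nContext:\n" + "\n---\n".join(chunks)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return float(resp.choices[0].message.content.strip())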

Judge Providers

Configure the evaluation LLM in Settings (/settings):

  • OpenAI - GPT-4o-mini (fast), GPT-4o (best quality)
  • Anthropic - Claude Sonnet 4, Claude Opus 4
  • Ollama - Llama 3.2, Llama 3.1 (local, free)

Learn More

Visit /learn-evals in the app for an interactive guide explaining evaluation concepts, when to use them, and how to act on results.

License

MIT License
