Retrieval-augmented generation stack purpose-built for Gilles Deleuze's corpus. The system extracts long-form passages from the PDF books, indexes them in Qdrant, plans dense search queries, and answers questions through Claude while exposing the full trace (terms, passages, quotes, thinking) in a simple web UI.
- macOS / Linux with `uv` installed (`pip install uv` once, then reuse)
- Python 3.11+ (managed automatically by `uv`)
- Qdrant (cloud URL/API key or local persistence)
- Anthropic API key for Claude Opus (planner + answer)
- Hugging Face Inference endpoint(s) for the BGE embeddings and reranker, or a local GPU with `sentence-transformers`
```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
cp config/example.env .env  # fill in the secrets
```

`config/example.env` lists the minimal variables:
- LLMs: `ANTHROPIC_API_KEY`, `ANSWER_PROVIDER`, `ANSWER_MODEL`, `ANSWER_MAX_TOKENS`
- Planner: `QUERY_PLANNER_PROVIDER`, `QUERY_PLANNER_MODEL`, `QUERY_PLANNER_USE_LEXICON`, `PLANNER_VERBOSE`
- Embeddings/Reranker: `HF_API_KEY`, `HF_EMBEDDING_ENDPOINT`, `HF_RERANKER_ENDPOINT`, `EMBED_MODEL_NAME`, `EMBED_DIM`, `RERANKER_MODEL`, `ENABLE_LOCAL_RERANKER`
- Vector store: `QDRANT_URL`, `QDRANT_API_KEY`, `QDRANT_COLLECTION`
- Retrieval tuning: `INITIAL_CANDIDATE_COUNT`, `RERANK_TOP_K`, `RETRIEVER_MQ_TERMS`, `PRF_ENABLE`, `PRF_TOP_P`, `PRF_TERMS`, `PASSAGE_TOKENS`, `PASSAGE_MAX_TOKENS`, `RETRIEVAL_CACHE_TTL`, `SESSIONS_DIR`
- Observability: `LOG_LEVEL`
Set the optional keys (`OPENAI_API_KEY`, `KIMI_API_KEY`) if you plan to swap providers; otherwise they can be omitted.
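As an illustration, the variables above could be loaded with a small stdlib-only helper. This is a hypothetical sketch (the `load_env` name is invented; the project may well use `python-dotenv` or a settings library instead):

```python
import os

def load_env(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file, skipping comments and blanks."""
    values: dict[str, str] = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip().strip('"')
    except FileNotFoundError:
        pass  # fall back to the process environment only
    # The process environment wins over file values.
    values.update(os.environ)
    return values
```

Missing keys then surface as `KeyError` at startup rather than deep inside a request.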
1. Extract structured text from PDFs

   ```bash
   uv run python -m src.pipeline.pdf_extractor
   ```

   PDFs are read from `data/raw/pdf_books/` and snapshots are written to `data/processed/` by default. Override with `--pdf-dir` and `--output-dir` if needed.

2. Embed and index in Qdrant

   ```bash
   uv run python -m src.pipeline.embed_corpus --snapshot data/processed/deleuze_corpus_<date>.jsonl
   ```

   Use `--recreate` to rebuild the collection and `--max-records` for smoke tests. Embedding uses Apple Silicon (`torch.mps`) if available.
Snapshots and manifest files are stored under `data/processed/`; the Qdrant local persistence lives in `data/qdrant/` when no remote URL is provided.
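For intuition, a passage builder governed by a `PASSAGE_TOKENS` target and `PASSAGE_MAX_TOKENS` cap might greedily pack sentences like the sketch below. It is illustrative only: whitespace counts stand in for real tokenizer counts, and the function name is invented.

```python
def build_passages(text: str, target_tokens: int = 400, max_tokens: int = 600) -> list[str]:
    """Greedily pack sentences into passages near the target token budget.

    Tokens are approximated by whitespace splitting; the real pipeline
    presumably counts with the embedding model's tokenizer.
    """
    passages: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in text.replace("\n", " ").split(". "):
        n = len(sentence.split())
        # Flush before exceeding the hard cap.
        if current and count + n > max_tokens:
            passages.append(". ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
        # Flush once the soft target is reached.
        if count >= target_tokens:
            passages.append(". ".join(current))
            current, count = [], 0
    if current:
        passages.append(". ".join(current))
    return passages
```

Long-form passages like these are what the retriever returns and the answer model cites.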
```bash
uv run uvicorn src.main:app --host 0.0.0.0 --port 8000
```

- `POST /upload-pdf/` ingests a new PDF into the index
- `POST /plan/` returns the latest query-plan terms
- `POST /ask/` returns the answer, citations, thinking trace, passages, and raw search terms
- `GET /` serves the chat interface from `static/chat.html`
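A minimal stdlib-only client sketch for the `/ask/` endpoint; note the `question` field name is an assumption about the request schema, not confirmed by the source:

```python
import json
import urllib.request

def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST /ask/ request with a JSON body."""
    body = json.dumps({"question": question}).encode()
    return urllib.request.Request(
        f"{base_url}/ask/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_ask_request("..."))` and decode the JSON response.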
Console logs include the planner terms, PRF terms, reranker activity, and Claude's full analysis (search terms + thinking).
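The PRF terms in those logs come from pseudo-relevance feedback. A classic heuristic (not necessarily the project's exact logic) picks the most frequent content words from the top-ranked passages that are not already in the query:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "as", "for", "any", "no", "or", "has"}

def prf_terms(passages: list[str], query_terms: set[str], top_n: int = 5) -> list[str]:
    """Pick expansion terms: frequent content words absent from the query."""
    counts: Counter = Counter()
    for passage in passages:
        for word in re.findall(r"[a-z']+", passage.lower()):
            if len(word) > 2 and word not in STOPWORDS and word not in query_terms:
                counts[word] += 1
    return [term for term, _ in counts.most_common(top_n)]
```

The selected terms are appended to the dense query for a second retrieval pass, which is what `PRF_TERMS` and `PRF_TOP_P` would tune.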
`src/evaluation/retrieval_eval.py` measures hit rate / MRR over `eval/dataset.jsonl`. Run with:
```bash
uv run python -m src.evaluation.retrieval_eval
```

- `src/main.py` – FastAPI entrypoint and web routes
- `src/rag/` – ingestion, hybrid retrieval, passage builder, vector store
- `src/llm/` – planner & answer clients (Anthropic/Kimi/OpenAI)
- `src/pipeline/` – PDF extraction and embedding utilities
- `src/observability/` – logging and tracing helpers
- `static/` – web UI assets
- `tests/` – smoke tests for retrieval evaluation
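The hit rate / MRR numbers that the evaluation reports can be sketched as follows (the input shapes are assumptions about `eval/dataset.jsonl`, not its actual schema):

```python
def hit_rate_and_mrr(results: list[list[str]], golds: list[str], k: int = 10) -> tuple[float, float]:
    """Compute hit rate@k and MRR over ranked result lists.

    results[i] holds the ranked passage ids returned for query i;
    golds[i] is the expected passage id for that query.
    """
    hits = 0
    rr_sum = 0.0
    for ranked, gold in zip(results, golds):
        top_k = ranked[:k]
        if gold in top_k:
            hits += 1
            rr_sum += 1.0 / (top_k.index(gold) + 1)  # reciprocal rank, 1-indexed
    n = len(golds)
    return hits / n, rr_sum / n
```

Hit rate answers "did the gold passage appear at all?", while MRR also rewards ranking it near the top.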
With unused training scripts and legacy vector stores removed, only the modules above are required for the current RAG flow.