Make the most of your pantry with a click of a button.
A chatbot-style recipe finder powered by hybrid search (BM25 + Semantic) with cross-encoder reranking. Ask naturally about what you want to cook!
Three-phase RAG system with rigorous RAGAS evaluation:
- Ingestion: PDFs → text extraction → chunking → local embeddings → ChromaDB
- Query Expansion: User query → LLM adds synonyms/keywords → enriched search query
- Hybrid Retrieval: BM25 (keyword) + Semantic (embeddings) → RRF fusion → cross-encoder reranking
- Generation: Context → Gemini LLM → formatted recipe with dietary adaptation
- Query Expansion: LLM-based keyword extraction and synonym generation
- Embeddings: HuggingFace
all-MiniLM-L6-v2(local, 384-dim) - Hybrid Search: BM25 + Semantic with Reciprocal Rank Fusion (RRF)
- Reranking: Cross-encoder
ms-marco-MiniLM-L-6-v2(+6.1% answer quality) - Vector Store: ChromaDB (local SQLite persistence)
- LLM: Google Gemini 2.5 Flash
- Evaluation: RAGAS framework (Groq llama-3.1-8b for metrics)
- Frontend: Streamlit with Space Grotesk font and Soft Sage theme
- Query Expansion: LLM expands user queries with synonyms and related terms for better matching
- Hybrid Search: Combines keyword matching (BM25) with semantic understanding
- Cross-Encoder Reranking: Improves answer relevancy from 53.4% → 59.5%
- Batch Ingestion: Process entire folders of PDFs with duplicate detection
- Modular Architecture: Shared
core/module with centralized config - RAGAS Evaluation: 4 metrics (Context Precision, Context Recall, Faithfulness, Answer Relevancy)
- Unit Tests: pytest suite for core retrieval algorithms
- Perfect Recall: 100% - users won't miss relevant recipes
- Debug Mode: View raw retrieval results without LLM generation
- Python 3.10+ (3.9 has compatibility issues with
sentence-transformers) - Google AI API key (https://ai.google.dev/) - for recipe generation
- Groq API key (https://console.groq.com/) - optional, for RAGAS evaluation only
A sample cookbook is included: Good and Cheap by Leanne Brown (Creative Commons BY-NC-SA). No additional data needed.
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
echo "GOOGLE_API_KEY=<your-key>" > .env
echo "GROQ_API_KEY=<your-key>" >> .env # Optional: for evaluation only
# Ingest the included cookbook (or add your own PDFs to data/)
python ingest.py # Default: Good and Cheap cookbook
python ingest.py data/cookbook.pdf # Single file
python ingest.py data/ # All PDFs in folder (batch mode)
# Run Streamlit app
streamlit run app.py
# Or run REST API
uvicorn api:app --reload- Navigate to http://localhost:8501
- Describe what you want to cook naturally (e.g., "I have chicken, rice, and some vegetables")
- Optionally specify dietary restrictions (e.g., "gluten-free, vegetarian")
- Click "Find recipes"
- Receive a formatted recipe with:
- Ingredients list with measurements
- Step-by-step instructions
- Chef's notes with substitutions and "See also" alternatives
- Source citations from the cookbook
Debug mode: Add ?debug=true to URL to view raw retrieval results without LLM generation.
Start the API server:
uvicorn api:app --reloadEndpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /health |
Health check — shows component status |
| POST | /search |
Search recipes, returns raw chunks |
| POST | /generate |
Search + LLM generation, returns formatted recipe |
Interactive docs: http://localhost:8000/docs (Swagger UI)
Example request:
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "I have chicken and rice, what can I make?", "use_reranking": true}'# Requires Groq API key in .env
python evaluation.py # Run with default output
python evaluation.py --verbose # Show debug output
python evaluation.py --quiet # Only show errors
# Generates:
# - evaluation_results.csv (raw scores)
# - evaluation_results_report.txt (summary)Evaluates 15 test cases across 5 categories (ingredients, dietary, budget, cuisine, meal type) against 4 retrieval methods.
smart-pantry/
├── core/ # Shared utilities module
│ ├── __init__.py # Package exports
│ ├── config.py # Centralized configuration (single source of truth)
│ ├── embeddings.py # Embedding model + vectorstore initialization
│ ├── retrieval.py # RRF fusion, cross-encoder reranking (cached), search
│ └── llm.py # LLM initialization, recipe generation, prompt template
├── tests/ # pytest test suite (26 tests)
│ ├── conftest.py # Shared fixtures (mock recipe data)
│ ├── test_retrieval.py # Tests for RRF and format_context
│ ├── test_ingest.py # Tests for PDF discovery and document splitting
│ └── test_search.py # Tests for hybrid search pipeline (mocked)
├── .streamlit/config.toml # Soft Sage theme configuration
├── app.py # PantryPal Streamlit UI (Space Grotesk font, centered layout)
├── ingest.py # PDF processing + batch ingestion + deduplication
├── evaluation.py # RAGAS evaluation framework (imports from core only)
├── requirements.txt # Python dependencies
├── pytest.ini # Test configuration
├── .env # API credentials (gitignored)
├── data/ # Source PDFs (gitignored)
├── chroma_db/ # Vector database (gitignored)
├── CLAUDE.md # Development documentation
├── EVALUATION_RESULTS.md # RAGAS evaluation findings
└── README.md # This file
All configuration is centralized in core/config.py (single source of truth):
EMBEDDING_MODEL: "all-MiniLM-L6-v2" (shared across all modules)CHUNK_SIZE: 2000 /CHUNK_OVERLAP: 200CHUNK_SEPARATORS:["\n\n", "Title:", "Ingredients:"]
NUM_RESULTS: 5 (final results returned)RERANK_TOP_K: 10 (candidates for reranking)CROSS_ENCODER_MODEL: "cross-encoder/ms-marco-MiniLM-L-6-v2"
GEMINI_MODEL: "gemini-flash-latest"GROQ_MODEL: "llama-3.1-8b-instant"- Evaluates 4 methods: Semantic Only, BM25 Only, Hybrid (RRF), Hybrid + Reranking
# Run all tests
python -m pytest tests/ -v
# Run specific test class
python -m pytest tests/test_retrieval.py::TestReciprocalRankFusion -v26 unit tests covering RRF fusion, context formatting, PDF ingestion, and hybrid search.
| Error | Solution |
|---|---|
Vector database not found |
Run python ingest.py |
GOOGLE_API_KEY not found |
Create .env file |
404 models/gemini-* not found |
Use gemini-flash-latest |
Cannot copy out of meta tensor |
pip install sentence-transformers==3.0.1 |
| Python 3.9 import errors | Upgrade to Python 3.10+ |
- Ingestion: ~2s for 90-page PDF
- Query latency: 2-5s (search + generation), +200ms with reranking
- Storage: ~3MB per cookbook
- Answer Relevancy: 59.5% (with reranking, +6.1% improvement)
- Context Recall: 100% (perfect - won't miss recipes)
- Faithfulness: 88.4% (high answer grounding)
- Context Precision: 66.8% (acceptable noise/signal ratio)
See EVALUATION_RESULTS.md for detailed analysis.
- Embeddings: $0 (local)
- Vector DB: $0 (local)
- Generation: ~$0.0001/query (Gemini free tier: 1500 req/day)
- Total: Free for typical usage
Google's Embedding API hit rate limits (429 errors) immediately on the free tier. Since embeddings are a bulk operation during ingestion, local HuggingFace models (all-MiniLM-L6-v2, 384-dim) provide unlimited free embedding at the cost of slightly lower dimensionality vs Google's 768-dim — a worthwhile tradeoff for recipe matching.
Semantic search understands meaning ("poultry" matches "chicken") but can miss exact ingredient names. BM25 keyword search catches exact terms ("2 cups flour") but misses synonyms. Combining both with Reciprocal Rank Fusion (RRF) achieves 100% recall — users never miss relevant recipes — while cross-encoder reranking on top brings answer relevancy to its highest at 59.5%.
Without measurement, "better" is just a feeling. The RAGAS evaluation framework compares 4 retrieval methods across 4 metrics, providing data-backed evidence that hybrid + reranking is the best approach. This caught a non-obvious finding: pure semantic search has higher precision (79.6% vs 66.8%) but lower final answer quality (58.7% vs 59.5%) — the LLM filters noisy context effectively.
MIT