Cloud-hostable RAG server with Google Gemini, LangChain 1.1, and FastAPI
A RAG (Retrieval-Augmented Generation) server designed for easy cloud deployment with comprehensive observability and professional tooling.
Features • Quick Start • Deployment • API Docs • Contributing
- Modern LangChain 1.1 LCEL - Clean, composable chains using LangChain Expression Language
- Google Gemini Integration - Powered by Gemini 2.0 Flash and text-embedding-004
- Persistent Vector Storage - ChromaDB with persistent storage (survives restarts)
- Advanced Retrieval - MMR search with score thresholds and configurable k
- Source Attribution - Responses include source documents with metadata
- Comprehensive Observability - Structured JSON logging + Prometheus metrics
- Health Checks - /health and /ready endpoints for orchestration
- Rate Limiting - Configurable per-IP rate limiting (60 req/min default)
- Error Handling - Global error handling with graceful degradation
- Configuration Management - Environment-based config with pydantic-settings
- Input Validation - Pydantic models with length limits and sanitization
- FastAPI + LangServe - Automatic /invoke, /batch, and /stream endpoints
- Interactive API Docs - Auto-generated OpenAPI documentation at /docs
- Structured Logging - JSON logs with request tracing and correlation IDs
- Metrics - Prometheus-compatible metrics endpoint
- Testing Suite - Comprehensive test script for all endpoints
- GCP Deployment - Automated deployment scripts for Google Cloud Platform
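The Advanced Retrieval item above refers to MMR (maximal marginal relevance), which trades relevance to the query against redundancy among the results already chosen. The following is a minimal pure-Python sketch of that idea, not the code ChromaDB or LangChain actually run; the names `mmr_select`, `lambda_mult`, and `score_threshold` are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5, score_threshold=0.0):
    """Return indices of up to k documents chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    Hypothetical helper for illustration only.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    # Drop candidates whose similarity to the query is below the threshold.
    candidates = [i for i, r in enumerate(relevance) if r >= score_threshold]
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate documents and one distinct document, a balanced `lambda_mult` tends to skip the duplicate in favour of the distinct one, while `lambda_mult=1.0` falls back to plain similarity ranking.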
- Python 3.13+
- Google Gemini API key (Get one here)
# Clone the repository
git clone https://github.com/andynicholson/rag-and-bone.git
cd rag-and-bone
# Set up virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp env.example .env
# Edit .env and add your GOOGLE_API_KEY
# Run the server
python3 app.py
Your server is now running at http://localhost:8000
Visit http://localhost:8000/docs for the interactive API documentation.
# Run comprehensive tests
./test-api.sh local
# Or test individual endpoints:
# Query with source attribution
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"input": "What is the answer to this question?", "include_sources": true}'
# Health check
curl "http://localhost:8000/health?detailed=true"
# Prometheus metrics
curl http://localhost:8000/metrics
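The same query can be sent from Python. This is a small standard-library sketch that mirrors the curl call to /query above; the helper names `build_query_request` and `ask` are hypothetical, and the response is returned as decoded JSON without assuming its fields, since the response schema is not shown here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local address used in the Quick Start


def build_query_request(question, include_sources=True, base_url=BASE_URL):
    """Build a POST request for /query with the same JSON body as the curl example."""
    payload = json.dumps(
        {"input": question, "include_sources": include_sources}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, **kwargs):
    """Send the question and return the decoded JSON body (shape is server-defined)."""
    request = build_query_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("What is the answer to this question?")` requires the server to be running; `build_query_request` can be inspected offline.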
- Google Cloud account with billing enabled
- gcloud CLI installed and configured
- SSH key configured
# 1. Edit deployment configuration
vim deploy-config.sh # Set VM_NAME, ZONE, etc.
# 2. Create VM (first time only)
./recreate-vm.sh
# 3. Deploy application
./deploy.sh
# 4. Test remote deployment
./test-api.sh remote
# Real-time logs
gcloud compute ssh rag-server --zone=australia-southeast1-a \
-- "sudo journalctl -u rag-server -f"
./debug.sh # Comprehensive diagnostics
See GCP_DEPLOYMENT_GUIDE.md for detailed instructions.
Docker support (coming soon)
# Build image
docker build -t rag-and-bone .
# Run container
docker run -p 8000:8000 --env-file .env rag-and-bone
rag-and-bone/
├── app.py                   # Main FastAPI application
├── config.py                # Configuration management
├── logging_config.py        # Structured logging
├── prompts.py               # Versioned prompt templates
├── metrics.py               # Prometheus metrics
├── middleware.py            # Rate limiting & error handling
├── ingest.py                # Document ingestion CLI
├── load_sample_data.py      # Load sample docs for testing
├── inspect_chroma.py        # ChromaDB inspection tool
├── requirements.txt         # Python dependencies
├── env.example              # Environment variables template
├── deploy.sh                # Deployment automation
├── recreate-vm.sh           # VM creation script
├── debug.sh                 # Debugging helper
├── test-api.sh              # API testing script
├── startup-script.sh        # GCP VM initialization
├── rag-server.service       # Systemd service
├── GCP_DEPLOYMENT_GUIDE.md
└── LICENSE
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Document   │ --> │  Embedding   │ --> │   ChromaDB    │
│   Loader    │     │ (text-embed) │     │ Vector Store  │
└─────────────┘     └──────────────┘     └───────────────┘
                                                 │
                                                 ▼
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Response   │ <-- │  Gemini 2.0  │ <-- │   Retriever   │
│   (JSON)    │     │    Flash     │     │ (MMR Search)  │
└─────────────┘     └──────────────┘     └───────────────┘
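The retriever stage in the diagram returns a list of documents, while the prompt needs a single context string; in the chain that follows, the `format_docs` step bridges the two. Its definition is not shown in this README, so the version below is an assumption based on the common LangChain idiom of joining each document's `page_content`. `Doc` is a hypothetical stand-in for a retrieved document, and `collect_sources` is an illustrative helper for the source attribution feature.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (hypothetical; mirrors page_content/metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def collect_sources(docs):
    """Gather distinct 'source' metadata values in first-seen order."""
    seen = []
    for doc in docs:
        source = doc.metadata.get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen
```

Keeping the formatting step as a plain function is what lets it sit directly after the retriever in a piped chain: LCEL coerces ordinary callables into runnable steps.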
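# retriever, format_docs, prompt, and llm are defined elsewhere in the application
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Root endpoint with API information |
| /health | GET | Health check (add ?detailed=true for stats) |
| /metrics | GET | Prometheus metrics |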
| Endpoint | Method | Description |
|---|---|---|
| /query | POST | Enhanced query with source attribution |
| /rag/invoke | POST | Single query (LangServe) |
| /rag/batch | POST | Multiple queries (LangServe) |
| /rag/stream | POST | Streaming response (LangServe) |
| Endpoint | Method | Description |
|---|---|---|
| /ingest | POST | Ingest a document |
| Endpoint | Method | Description |
|---|---|---|
| /docs | GET | Interactive OpenAPI documentation |
| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_KEY | Google Gemini API key | Required |
| ENVIRONMENT | Environment name | development |
| LOG_LEVEL | Logging level | INFO |
| RATE_LIMIT_PER_MINUTE | Rate limit per IP | 60 |
| CHROMA_PERSIST_DIRECTORY | ChromaDB storage path | ./chroma_db |
See env.example for all configuration options.
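RATE_LIMIT_PER_MINUTE above caps requests per client IP. One common way to enforce such a cap is a sliding window of request timestamps per address. The sketch below illustrates that technique only; it is not taken from middleware.py, and the class name `SlidingWindowLimiter` and its parameters are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most limit_per_minute requests per client within a rolling window.

    Hypothetical illustration; the project's own middleware may work differently.
    """

    def __init__(self, limit_per_minute=60, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit_per_minute
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, client_ip):
        """Record the request and return True, or return False if over the limit."""
        now = self.clock()
        hits = self._hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

This keeps state in process memory, so it only suits a single server process; several workers or VMs would need a shared store.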
- Embeddings: models/text-embedding-004
- LLM: gemini-2.0-flash
# Format code
black app.py
# Sort imports
isort app.py
VS Code/Cursor will auto-format on save (configured in pyproject.toml).
Load Sample Data (for testing):
# Load sample documents
python3 load_sample_data.py
# Reset collection and load sample data
python3 load_sample_data.py --reset
Using CLI:
python3 ingest.py /path/to/documents --name "My Docs"
Using API:
curl -X POST http://localhost:8000/ingest \
-H "Content-Type: application/json" \
-d '{"content": "Document text", "metadata": {"source": "manual"}}'
# Show statistics
python3 inspect_chroma.py stats
# List all documents
python3 inspect_chroma.py list
# Search for documents
python3 inspect_chroma.py search "your query"
# Filter by metadata
python3 inspect_chroma.py filter --field category --value benefits
CORS allows all origins by default (CORS_ORIGINS=["*"]). For production:
CORS_ORIGINS=["https://yourdomain.com","https://app.yourdomain.com"]
API endpoints enforce maximum input lengths:
- Query input: 2,000 characters
- Document ingestion: 50,000 characters
Adjust these limits in app.py based on your requirements.
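The project enforces these limits with Pydantic models, which are not reproduced here. As a rough standard-library illustration of the same check, the sketch below uses the two limits listed above; the constant names, the `check_length` helper, and the whitespace stripping and empty-input rejection are all assumptions rather than the behaviour of app.py.

```python
MAX_QUERY_CHARS = 2_000  # query input limit listed above
MAX_DOCUMENT_CHARS = 50_000  # document ingestion limit listed above


def check_length(text, limit, label):
    """Return the stripped text, or raise ValueError if it is empty or too long.

    Hypothetical helper: stripping and the empty check are illustrative choices.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError(f"{label} must not be empty")
    if len(cleaned) > limit:
        raise ValueError(f"{label} exceeds {limit} characters")
    return cleaned


def validate_query(text):
    """Apply the query limit."""
    return check_length(text, MAX_QUERY_CHARS, "query")


def validate_document(text):
    """Apply the ingestion limit."""
    return check_length(text, MAX_DOCUMENT_CHARS, "document")
```

Rejecting oversized input before it reaches the embedding model or the LLM keeps request cost and latency bounded.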
| Instance Type | Monthly Cost | Recommended For |
|---|---|---|
| e2-medium | ~$24 | Production workloads |
| e2-small | ~$12 | Development/testing |
- Free tier available with rate limits
- Pay-per-use after free tier
- See Google AI Pricing
Common Issues
# Check if port is already in use
lsof -i :8000
# Check logs
tail -f logs/rag-server.log
# Inspect database
python3 inspect_chroma.py stats
# Reset database (WARNING: deletes all data)
rm -rf ./chroma_db
# Optionally load sample data for testing
python3 load_sample_data.py
# Or reset and load sample data
python3 load_sample_data.py --reset
- Check your API key is set correctly in .env
- Check Google AI Status
See GCP_DEPLOYMENT_GUIDE.md for comprehensive troubleshooting.
This project is dual-licensed:
- Open Source: GNU General Public License v3.0 - Free for open source projects
- Commercial: Contact intothemist@gmail.com for commercial licensing options
Copyright (C) 2026 A P Nicholson intothemist@gmail.com
Contributions are welcome! Here's how:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow Black code style
- Add tests for new features
- Update documentation as needed
- Use conventional commits (feat:, fix:, docs:, etc.)
Built with:
- LangChain - RAG framework
- FastAPI - Web framework
- Google Gemini - LLM and embeddings
- ChromaDB - Vector database
A P Nicholson - intothemist@gmail.com
Project Link: https://github.com/andynicholson/rag-and-bone
Made with ❤️ and ☕