
A cloud hostable RAG (Retrieval-Augmented Generation) server built with LangChain 1.1, Google Gemini, and FastAPI.

Rag and Bone

Cloud hostable RAG server with Google Gemini, LangChain 1.1, and FastAPI

License: GPL v3 | Python 3.13+ | FastAPI | LangChain | Code style: black

A RAG (Retrieval-Augmented Generation) server designed for easy cloud deployment with comprehensive observability and professional tooling.

Features • Quick Start • Deployment • API Docs • Contributing


✨ Features

🧠 Core RAG Capabilities

  • Modern LangChain 1.1 LCEL - Clean, composable chains using LangChain Expression Language
  • Google Gemini Integration - Powered by Gemini 2.0 Flash and text-embedding-004
  • Persistent Vector Storage - ChromaDB with persistent storage (survives restarts)
  • Advanced Retrieval - MMR search with score thresholds and configurable k
  • Source Attribution - Responses include source documents with metadata
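
To make the MMR bullet concrete, here is a minimal pure-Python sketch of maximal marginal relevance selection. It is not the project's retriever (that comes from LangChain and ChromaDB); the names `cosine` and `mmr_select`, the `lambda_mult` weighting, and the toy vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=2, lambda_mult=0.5) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Reward similarity to the query, penalise similarity to picks so far.
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D embeddings: the first two are near-duplicates.
docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
picked = mmr_select([1.0, 0.2], docs, k=2, lambda_mult=0.3)
```

With a low `lambda_mult` the second pick skips the near-duplicate in favour of a more diverse document; setting `lambda_mult=1.0` reduces the selection to plain similarity ranking.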

🚀 Server Features

  • Comprehensive Observability - Structured JSON logging + Prometheus metrics
  • Health Checks - /health and /ready endpoints for orchestration
  • Rate Limiting - Configurable per-IP rate limiting (60 req/min default)
  • Error Handling - Global error handling with graceful degradation
  • Configuration Management - Environment-based config with pydantic-settings
  • Input Validation - Pydantic models with length limits and sanitization

🛠️ Developer Experience

  • FastAPI + LangServe - Automatic /invoke, /batch, and /stream endpoints
  • Interactive API Docs - Auto-generated OpenAPI documentation at /docs
  • Structured Logging - JSON logs with request tracing and correlation IDs
  • Metrics - Prometheus-compatible metrics endpoint
  • Testing Suite - Comprehensive test script for all endpoints
  • GCP Deployment - Automated deployment scripts for Google Cloud Platform
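
As an illustration of the structured logging bullet, the following sketch emits one JSON object per log line with a correlation ID, using only the standard library. The project's own logging_config.py is not shown here, so `JsonFormatter`, `make_logger`, and the field names are assumptions rather than the actual implementation.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # correlation_id is attached via `extra=`; None on other records.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def make_logger(name: str = "rag-demo") -> logging.Logger:
    """Create a logger whose only handler writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = make_logger()
request_id = str(uuid.uuid4())  # one ID per incoming request
logger.info("query received", extra={"correlation_id": request_id})
```

Attaching the same `correlation_id` to every log line of a request is what makes it possible to trace that request across log entries.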

🚀 Quick Start

Prerequisites

  • Python 3.13+
  • A Google Gemini API key (set as GOOGLE_API_KEY)

Local Development

# Clone the repository
git clone https://github.com/andynicholson/rag-and-bone.git
cd rag-and-bone

# Set up virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env and add your GOOGLE_API_KEY

# Run the server
python3 app.py

🎉 Your server is now running at http://localhost:8000

Visit http://localhost:8000/docs for the interactive API documentation.

🧪 Test the API

# Run comprehensive tests
./test-api.sh local

# Or test individual endpoints:

# Query with source attribution
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is the answer to this question?", "include_sources": true}'

# Health check
curl "http://localhost:8000/health?detailed=true"

# Prometheus metrics
curl http://localhost:8000/metrics
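
The same query can be issued from Python. This is a minimal sketch using only the standard library; it mirrors the curl payload above, and the helper name `build_query_request` is hypothetical. The response schema is not documented here, so the sketch only prints whatever JSON comes back.

```python
import json
import urllib.request


def build_query_request(
    base_url: str, question: str, include_sources: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    body = json.dumps(
        {"input": question, "include_sources": include_sources}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request(
    "http://localhost:8000", "What is the answer to this question?"
)

# To send it against a running server, uncomment:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read()))
```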

☁️ Deployment

GCP Deployment


Prerequisites

  • Google Cloud account with billing enabled
  • gcloud CLI installed and configured
  • SSH key configured

Deploy Steps

# 1. Edit deployment configuration
vim deploy-config.sh  # Set VM_NAME, ZONE, etc.

# 2. Create VM (first time only)
./recreate-vm.sh

# 3. Deploy application
./deploy.sh

# 4. Test remote deployment
./test-api.sh remote

View Logs

# Real-time logs
gcloud compute ssh rag-server --zone=australia-southeast1-a \
  -- "sudo journalctl -u rag-server -f"

Debug Issues

./debug.sh  # Comprehensive diagnostics

See GCP_DEPLOYMENT_GUIDE.md for detailed instructions.

Docker Deployment

Docker support (coming soon):

# Build image
docker build -t rag-and-bone .

# Run container
docker run -p 8000:8000 --env-file .env rag-and-bone

📁 Project Structure

rag-and-bone/
├── 📄 app.py                 # Main FastAPI application
├── ⚙️  config.py              # Configuration management
├── 📝 logging_config.py      # Structured logging
├── 🎯 prompts.py             # Versioned prompt templates
├── 📊 metrics.py             # Prometheus metrics
├── 🔧 middleware.py          # Rate limiting & error handling
├── 📥 ingest.py              # Document ingestion CLI
├── 🧪 load_sample_data.py   # Load sample docs for testing
├── 🔍 inspect_chroma.py     # ChromaDB inspection tool
├── 📋 requirements.txt       # Python dependencies
├── 🌍 env.example            # Environment variables template
├── 🚀 deploy.sh             # Deployment automation
├── 🛠️  recreate-vm.sh        # VM creation script
├── 🐛 debug.sh              # Debugging helper
├── 🧪 test-api.sh           # API testing script
├── ⚡ startup-script.sh     # GCP VM initialization
├── 🔧 rag-server.service    # Systemd service
├── 📖 GCP_DEPLOYMENT_GUIDE.md
└── 📜 LICENSE

🏗️ Architecture

RAG Pipeline Flow

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Document   │ --> │   Embedding  │ --> │   ChromaDB    │
│   Loader    │     │ (text-embed) │     │ Vector Store  │
└─────────────┘     └──────────────┘     └───────────────┘
                                                  │
                                                  ↓
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Response   │ <-- │  Gemini 2.0  │ <-- │   Retriever   │
│   (JSON)    │     │    Flash     │     │  (MMR Search) │
└─────────────┘     └──────────────┘     └───────────────┘

LangChain LCEL Chain

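from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# retriever, format_docs, prompt, and llm are defined elsewhere in the application
rag_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()  # return the model reply as a plain string
)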
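
The chain above does not show `format_docs`. A minimal sketch of what such a helper and the overall data flow might look like follows, using plain functions in place of LangChain runnables. `Doc`, `run_chain`, the prompt wording, and the stand-in retriever and model are all hypothetical; they only illustrate the order of operations, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join retrieved chunks into one context string for the prompt."""
    return "\n\n".join(d.page_content for d in docs)


def run_chain(
    question: str,
    retrieve: Callable[[str], List[Doc]],
    llm: Callable[[str], str],
) -> str:
    """Mirror the chain: retrieve -> format -> prompt -> model -> string."""
    context = format_docs(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt).strip()


# Toy stand-ins so the sketch runs without an API key or vector store.
def fake_retriever(question: str) -> List[Doc]:
    return [Doc("Rag and Bone is a RAG server."), Doc("It uses ChromaDB.")]


def fake_llm(prompt: str) -> str:
    # Echo the final prompt line instead of calling a real model.
    return prompt.splitlines()[-1]


answer = run_chain("What is it?", fake_retriever, fake_llm)
```

In the real chain, the dictionary step plays the role of the first two lines of `run_chain`: it runs the retriever and formatter for `context` while passing the question through unchanged as `input`.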

🌐 API Endpoints

Health & Monitoring

| Endpoint | Method | Description |
|----------|--------|-------------|
| / | GET | Root endpoint with API information |
| /health | GET | Health check (add ?detailed=true for stats) |
| /metrics | GET | Prometheus metrics |

Query Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /query | POST | Enhanced query with source attribution |
| /rag/invoke | POST | Single query (LangServe) |
| /rag/batch | POST | Multiple queries (LangServe) |
| /rag/stream | POST | Streaming response (LangServe) |
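
The LangServe routes take a different request body from /query. As a sketch, assuming the standard LangServe conventions (a single `input` for invoke and stream, a list under `inputs` for batch) and that the chain accepts a plain question string, the bodies could be built like this. The helper `langserve_request` is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def langserve_request(
    mode: str, questions: List[str], prefix: str = "/rag"
) -> Tuple[str, str]:
    """Return (path, JSON body) for a LangServe-style invoke, batch, or stream call."""
    if not questions:
        raise ValueError("at least one question is required")
    if mode == "batch":
        body: Dict[str, object] = {"inputs": questions}  # list of inputs
    elif mode in ("invoke", "stream"):
        body = {"input": questions[0]}  # single input
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f"{prefix}/{mode}", json.dumps(body)


path, body = langserve_request("batch", ["What is RAG?", "What is MMR?"])
```

The interactive documentation at /docs is the authoritative reference for the exact schemas exposed by a running server.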

Document Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| /ingest | POST | Ingest a document |

Documentation

| Endpoint | Method | Description |
|----------|--------|-------------|
| /docs | GET | Interactive OpenAPI documentation |

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| GOOGLE_API_KEY | Google Gemini API key | Required |
| ENVIRONMENT | Environment name | development |
| LOG_LEVEL | Logging level | INFO |
| RATE_LIMIT_PER_MINUTE | Rate limit per IP | 60 |
| CHROMA_PERSIST_DIRECTORY | ChromaDB storage path | ./chroma_db |

See env.example for all configuration options.
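
To show how the variables and defaults above fit together, here is a standard-library sketch of environment-based settings. The project itself uses pydantic-settings in config.py, which is not reproduced here; the `Settings` dataclass and `load_settings` function are stand-ins written for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Typed view of the documented environment variables."""
    google_api_key: str
    environment: str = "development"
    log_level: str = "INFO"
    rate_limit_per_minute: int = 60
    chroma_persist_directory: str = "./chroma_db"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        environment=env.get("ENVIRONMENT", "development"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
    )


# A plain dict stands in for os.environ so the example needs no real key.
settings = load_settings({"GOOGLE_API_KEY": "demo-key", "RATE_LIMIT_PER_MINUTE": "120"})
```

Failing fast on a missing GOOGLE_API_KEY surfaces configuration mistakes at startup rather than on the first query.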

Models

  • Embeddings: models/text-embedding-004
  • LLM: gemini-2.0-flash

🛠️ Development

Code Formatting

# Format code
black app.py

# Sort imports
isort app.py

VS Code/Cursor will auto-format on save (configured in pyproject.toml).

Adding Documents

Load Sample Data (for testing):

# Load sample documents
python3 load_sample_data.py

# Reset collection and load sample data
python3 load_sample_data.py --reset

Using CLI:

python3 ingest.py /path/to/documents --name "My Docs"

Using API:

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "Document text", "metadata": {"source": "manual"}}'

Inspecting ChromaDB

# Show statistics
python3 inspect_chroma.py stats

# List all documents
python3 inspect_chroma.py list

# Search for documents
python3 inspect_chroma.py search "your query"

# Filter by metadata
python3 inspect_chroma.py filter --field category --value benefits

🔒 Security Considerations

Rate Limiting

⚠️ The server includes in-memory rate limiting (60 requests/minute per IP by default). Because the counters live in process memory, they reset on server restart and are not shared across multiple workers or instances. For production deployments, consider a shared, persistent store such as Redis.
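
For readers unfamiliar with per-IP limiting, the sketch below shows one common in-memory approach, a sliding window of timestamps per client. It is an illustration only: the class name `SlidingWindowLimiter` and its parameters are assumptions, and the algorithm used in middleware.py may differ.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(
        self,
        limit: int = 60,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window=60.0)
results = [limiter.allow("203.0.113.7") for _ in range(3)]
```

All state sits in the `_hits` dictionary of one process, which is exactly why a restart clears it and why a second worker would keep its own independent counts.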

CORS Configuration

⚠️ The default configuration allows all origins (CORS_ORIGINS=["*"]). For production, restrict it to the origins you trust:

CORS_ORIGINS=["https://yourdomain.com","https://app.yourdomain.com"]

Input Validation

API endpoints enforce maximum input lengths:

  • Query input: 2,000 characters
  • Document ingestion: 50,000 characters

Adjust these limits in app.py based on your requirements.
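
The limits above can be pictured as a simple length check. The project enforces them with Pydantic models in app.py, which are not shown here; the constants and the `validate_text` helper below are a hypothetical standard-library equivalent.

```python
MAX_QUERY_CHARS = 2_000      # documented limit for query input
MAX_DOCUMENT_CHARS = 50_000  # documented limit for document ingestion


def validate_text(text: str, max_chars: int, field_name: str) -> str:
    """Strip surrounding whitespace and reject empty or over-long input."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError(f"{field_name} must not be empty")
    if len(cleaned) > max_chars:
        raise ValueError(f"{field_name} exceeds {max_chars} characters")
    return cleaned


query = validate_text("  What is RAG?  ", MAX_QUERY_CHARS, "input")
```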


💰 Cost Estimates

GCP VM

| Instance Type | Monthly Cost | Recommended For |
|---------------|--------------|-----------------|
| e2-medium | ~$24 | Production workloads |
| e2-small | ~$12 | Development/testing |

Gemini API

  • Free tier available with rate limits
  • Pay-per-use after free tier
  • See Google AI Pricing

🐛 Troubleshooting

Common Issues

Server won't start

# Check if port is already in use
lsof -i :8000

# Check logs
tail -f logs/rag-server.log

ChromaDB errors

# Inspect database
python3 inspect_chroma.py stats

# Reset database (WARNING: deletes all data)
rm -rf ./chroma_db

# Optionally load sample data for testing
python3 load_sample_data.py
# Or reset and load sample data
python3 load_sample_data.py --reset

Gemini API errors

  • Check your API key is set correctly in .env
  • Verify you haven't exceeded free tier limits
  • Check Google AI Status

See GCP_DEPLOYMENT_GUIDE.md for comprehensive troubleshooting.


📄 License

This project is dual-licensed:

Copyright

Copyright (C) 2026 A P Nicholson intothemist@gmail.com


🤝 Contributing

Contributions are welcome! Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow Black code style
  • Add tests for new features
  • Update documentation as needed
  • Use conventional commits (feat:, fix:, docs:, etc.)

🙏 Acknowledgments

Built with:


📬 Contact

A P Nicholson - intothemist@gmail.com

Project Link: https://github.com/andynicholson/rag-and-bone



Made with ❤️ and ☕
