A production-ready Enterprise Knowledge Assistant built on an advanced RAG (Retrieval-Augmented Generation) architecture. The system ingests internal company documents (PDFs, emails, Confluence pages, Google Docs) and provides accurate, cited answers to employee questions.
- Query Construction: Natural language → optimized database queries
- Query Translation: HyDE, multi-query, decomposition techniques
- Routing: Determine optimal retrieval path (vector/relational/graph)
- Indexing: Semantic chunking, multi-representation indexing
- Retrieval: Vector search + re-ranking + active retrieval
- Generation: LLM synthesis with Self-RAG capabilities
- Feedback Loop: Quality assessment and iterative improvement
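Under stated assumptions, the flow of these stages can be sketched end to end. Every function below is an illustrative stub, not the project's actual API:

```python
# Minimal sketch of the RAG pipeline stages (illustrative stubs, not the real code).

def translate_query(question: str) -> list[str]:
    """Query translation: emit the original question plus a simple variation (multi-query)."""
    return [question, f"In detail: {question}"]

def route(question: str) -> str:
    """Routing: pick a retrieval backend based on crude keyword rules."""
    return "relational" if "how many" in question.lower() else "vector"

def retrieve(queries: list[str], backend: str) -> list[str]:
    """Retrieval: stubbed keyword lookup; the real system queries Qdrant/PostgreSQL."""
    corpus = ["Vacation policy: 25 days per year.", "Expense reports are due monthly."]
    return [doc for doc in corpus
            if any(word in doc.lower() for q in queries for word in q.lower().split())]

def generate(question: str, context: list[str]) -> str:
    """Generation: stub that would normally call the LLM with cited context."""
    return f"Answer to '{question}' based on {len(context)} source(s)."

def answer(question: str) -> str:
    queries = translate_query(question)
    backend = route(question)
    context = retrieve(queries, backend)
    return generate(question, context)

print(answer("How many vacation days do employees get?"))
```

In the real pipeline each stub is a service with its own models and feedback loop; the sketch only shows how the stages compose.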
- Framework: FastAPI with Uvicorn ASGI server
- Runtime: Python 3.12 with UV package manager
- Data Validation: Pydantic v2 models
- Database ORM: SQLAlchemy 2.0 (async)
- Task Queue: Celery with Redis broker
- Vector DB: Qdrant with HNSW indexing
- Relational DB: PostgreSQL 15+
- Cache: Redis
- Framework: Next.js 14 with App Router
- Styling: Tailwind CSS
- State Management: Zustand
- Data Fetching: React Query (TanStack Query)
- Primary LLM: OpenAI GPT-3.5-Turbo
- Fallback LLM: GPT-4 for complex queries
- Embedding Model: SentenceTransformers all-MiniLM-L6-v2 (384-dim)
- RAG Framework: LangChain for orchestration
- Observability: LangSmith for tracing/monitoring
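To show how the embedding model is used at query time: retrieval ranks chunks by vector similarity, typically cosine. A sketch with toy 3-dimensional stand-ins for the real 384-dimensional all-MiniLM-L6-v2 vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the metric commonly used with MiniLM embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim stand-ins for real 384-dim all-MiniLM-L6-v2 vectors.
query_vec = [0.9, 0.1, 0.0]
chunks = {
    "vacation policy": [0.8, 0.2, 0.1],
    "expense process": [0.1, 0.9, 0.3],
}

# Rank chunks by similarity to the query vector, best first.
ranked = sorted(chunks, key=lambda name: cosine(query_vec, chunks[name]), reverse=True)
print(ranked[0])  # -> vacation policy
```

In production Qdrant performs this ranking over the HNSW index rather than brute force, but the scoring idea is the same.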
- Docker and Docker Compose
- Python 3.12+
- Node.js 20+
- UV package manager (pip install uv)
- OpenAI API key
git clone <repository-url>
cd Enterprise-RAG-System

Copy .env.example to .env and configure:

cp .env.example .env

Edit .env with your settings:
OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=ls-your-key-here # Optional
POSTGRES_URL=postgresql+asyncpg://raguser:ragpass@localhost:5432/ragdb
QDRANT_URL=http://localhost:6333
REDIS_URL=redis://localhost:6379

docker-compose up -d

This will start:
- PostgreSQL (port 5432)
- Qdrant (ports 6333, 6334)
- Redis (port 6379)
- Backend API (port 8000)
- Celery worker
cd backend
uv pip install -e .

cd frontend
npm install
npm run dev

The frontend will be available at http://localhost:3000
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
Enterprise-RAG-System/
├── backend/
│   ├── src/
│   │   ├── api/            # FastAPI routes and models
│   │   ├── core/           # Configuration
│   │   ├── services/       # Business logic
│   │   │   ├── document/   # Document processing
│   │   │   ├── embeddings/ # Embedding generation
│   │   │   ├── vector/     # Qdrant operations
│   │   │   ├── retrieval/  # Retrieval logic
│   │   │   ├── generation/ # LLM integration
│   │   │   └── query/      # Query optimization
│   │   ├── database/       # SQLAlchemy models
│   │   └── utils/          # Utilities
│   ├── pyproject.toml
│   └── Dockerfile
├── frontend/
│   ├── app/                # Next.js app router
│   ├── components/         # React components
│   ├── lib/                # Utilities and API client
│   └── package.json
├── docker-compose.yml
├── .env.example
└── README.md
- HyDE (Hypothetical Document Embeddings): Generate hypothetical answers to improve retrieval
- Multi-Query Generation: Create 3-5 query variations for better recall
- Query Decomposition: Break complex questions into sub-queries
- RAG-Fusion: Combine results from multiple query variations
- Cross-Encoder Re-ranking: Use cross-encoder/ms-marco-MiniLM-L-6-v2 for result refinement
- Hierarchical Retrieval: Summary → detail retrieval pattern
- Citation Management: Automatic source attribution
- Confidence Scoring: Estimate answer reliability
- Streaming Responses: Real-time answer generation
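RAG-Fusion's merge step is commonly implemented with Reciprocal Rank Fusion (RRF). A minimal sketch, assuming string document IDs and the conventional k = 60 constant:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists from multiple query variations (RAG-Fusion's fusion step).

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    documents ranked highly by several query variations float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Ranked results from three hypothetical variations of the same question:
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a"],
    ["doc_b", "doc_c"],
])
print(fused)  # -> ['doc_b', 'doc_a', 'doc_c']
```

doc_b wins because it appears near the top of all three lists, even though it is never rank 1 everywhere; that consensus effect is why RRF is a robust default for fusing query variations.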
- POST /api/chat - Send a chat message
- POST /api/chat/stream - Stream chat response
- GET /api/documents - List all documents
- POST /api/documents - Upload a document
- GET /api/documents/{id} - Get document details
- DELETE /api/documents/{id} - Delete a document
- GET /api/health - Health check
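A client would POST a JSON body to /api/chat. The field names below are assumptions for illustration only, not the project's documented schema (check the generated docs at /docs for the real one):

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ChatRequest:
    """Hypothetical request body for POST /api/chat -- field names are assumptions."""
    message: str
    conversation_id: Optional[str] = None  # omit to start a new conversation

req = ChatRequest(message="What is our vacation policy?")
payload = json.dumps(asdict(req))
print(payload)  # JSON body to send with Content-Type: application/json
```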
cd backend
pytest

cd backend
black src/
ruff check src/

cd backend
alembic revision --autogenerate -m "description"
alembic upgrade head

- LangSmith: LLM tracing and monitoring (if configured)
- Prometheus: System metrics (to be configured)
- Grafana: Dashboards (to be configured)
- Environment variables for sensitive data
- JWT authentication (to be implemented)
- CORS configuration
- Rate limiting (to be implemented)
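Since rate limiting is still to be implemented, here is a minimal in-process token-bucket sketch (all names are illustrative); a production version would keep the buckets in Redis so limits hold across API workers:

```python
import time

class TokenBucket:
    """Minimal in-process token-bucket rate limiter (illustrative sketch only)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=5.0, capacity=2)
print([bucket.allow() for _ in range(4)])  # burst of 2 allowed, then requests denied
```

Wired into FastAPI, this check would run in a middleware or dependency keyed by client identity (API key or IP).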
- Latency: < 3 seconds for end-to-end response
- Accuracy: High answer correctness (RAGAS evaluation)
- Uptime: 99.9% availability target
- Basic document upload and chunking
- Simple embedding with SentenceTransformers
- Qdrant setup and basic vector search
- FastAPI endpoints for chat and documents
- Next.js basic chat interface
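The Phase 1 chunking step can be illustrated with a naive length-based sentence chunker; the Phase 2 semantic version would group sentences by embedding similarity instead. This is a sketch, not the project's code:

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 1) -> list[str]:
    """Naive sentence-based chunker with a small sentence overlap between chunks.

    Phase 1 style: split on periods, pack sentences up to max_chars, and carry
    the last `overlap` sentence(s) into the next chunk to preserve context.
    """
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sent in sentences:
        if current and sum(len(s) for s in current) + len(sent) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # keep trailing sentence(s) as overlap
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("Employees accrue 25 vacation days per year. Unused days roll over once. "
       "Expense reports are due monthly. Receipts must be attached.")
print(chunk_text(doc, max_chars=80))
```

Real documents need a proper sentence splitter (abbreviations, lists, tables); the point here is only the packing-with-overlap pattern that the ingestion service applies before embedding.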
- Advanced chunking strategies
- Query optimization (HyDE implementation)
- Re-ranking with cross-encoders
- Improved prompt engineering
- Basic citation management
- Multi-query and RAG-Fusion
- Self-RAG capabilities
- Semantic routing
- Active retrieval mechanisms
- Comprehensive monitoring
- Scalability improvements
- Security hardening
- Performance optimization
- Comprehensive testing
- Deployment automation
See LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request.
For issues and questions, please open a GitHub issue.