Enterprise-grade RAG (Retrieval-Augmented Generation) system with specialized AI agents, built on FastAPI, LangChain, Qdrant, and React.
A production-ready AI workspace featuring intelligent document processing, semantic search, and four specialized agents for research, analysis, writing, and code generation.
Real-time metrics and analytics overview
Multi-format document upload and processing
Chat interface with specialized AI agents
Intelligent Document Processing
- Multi-format support (PDF, DOCX, TXT, MD, CSV)
- Smart chunking strategies (recursive, semantic, markdown-aware, code-aware)
- Automatic metadata extraction and preservation
- Asynchronous processing with progress tracking
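The recursive strategy named in the list above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the separator order, the `max_chars` limit, and the function name `recursive_chunk` are all assumptions made for the example.

```python
from typing import List

# Separators tried in order, from coarsest to finest (an assumption for this sketch).
SEPARATORS = ["\n\n", "\n", ". ", " "]


def recursive_chunk(text: str, max_chars: int = 200,
                    separators: List[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring coarse boundaries."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every max_chars characters.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: recurse with the finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks
```

Note that a separator sitting exactly on a chunk boundary is dropped, so a chunk split on ". " loses its trailing period. A production chunker would typically also add overlap between neighboring chunks.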
Specialized AI Agents
- Research Agent - Expert at finding and synthesizing information
- Analysis Agent - Data analysis and insight generation
- Writer Agent - Professional content creation
- Code Agent - Code generation and technical documentation
Advanced Search
- Semantic search with Qdrant vector database
- Hybrid search capabilities (dense + sparse)
- Metadata filtering and reranking
- Multi-query search strategies
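This README does not say how the dense and sparse result lists are combined. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only; the function name, the document IDs, and the constant `k = 60` are assumptions, not details taken from this project.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-similarity results
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword-match results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists (here `doc_a` and `doc_b`) rise above those found by only one retriever, which is the usual motivation for hybrid search.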
Interactive Chat Interface
- Real-time streaming responses
- WebSocket support for live updates
- Agent selection for specialized tasks
- Markdown rendering with syntax highlighting
Analytics & Monitoring
- Real-time performance metrics
- Usage tracking and statistics
- Structured logging with context
- Health checks and observability
Model Context Protocol (MCP)
- Standard protocol for AI tool integration
- 5 specialized tools exposed via MCP
- Easy integration with external systems
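MCP clients invoke tools with JSON-RPC 2.0 messages using the `tools/call` method. The sketch below only builds such a message. The five tools this project exposes are not listed in this README, so the tool name `search_documents` and its arguments are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "search_documents" is a made-up tool name; a client would normally
# discover the real names first with a "tools/list" request.
payload = build_tool_call(1, "search_documents", {"query": "password reset", "limit": 3})
print(payload)
```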
┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                       │
│      Dashboard | Documents | Chat | Agents | Analytics      │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ REST API / WebSocket
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Backend (FastAPI)                      │
│  ┌────────────┐  ┌──────────────┐  ┌─────────────────────┐  │
│  │  Document  │  │    Agent     │  │     MCP Server      │  │
│  │ Processor  │  │ Orchestrator │  │       (Tools)       │  │
│  └────────────┘  └──────────────┘  └─────────────────────┘  │
└──────────┬──────────────────┬───────────────────────────────┘
           │                  │
           ▼                  ▼
┌────────────────┐  ┌───────────────────────────────────┐
│     Qdrant     │  │            OpenAI API             │
│   (Vectors)    │  │       (Embeddings & Agents)       │
└────────────────┘  └───────────────────────────────────┘
- FastAPI - Modern, high-performance web framework
- LangChain - LLM orchestration and agent framework
- Qdrant - High-performance vector database
- Pydantic - Data validation and settings management
- Structlog - Structured logging for observability
- PyPDF2, python-docx - Document parsing
- React 18 - Modern UI library
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- Zustand - Lightweight state management
- React Query - Server state management
- React Markdown - Markdown rendering
- Docker & Docker Compose - Containerization
- Nginx - Reverse proxy and static file serving
- Redis (optional) - Caching layer
- Python 3.11+
- Node.js 18+
- Docker & Docker Compose
- OpenAI API Key
git clone git@github.com:kaninstein/workspaceai.git
cd workspaceai
cp .env.example .env
Edit .env and add your credentials:
# Required
OPENAI_API_KEY=sk-your-openai-key-here
SECRET_KEY=your-secret-key-for-jwt
# Optional (defaults provided)
QDRANT_HOST=localhost
QDRANT_PORT=6333
DATABASE_URL=sqlite:///./ia_workspace.db
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
Backend:
cd backend
# Install dependencies
pip install -r requirements.txt
# Start Qdrant (in separate terminal)
docker run -p 6333:6333 qdrant/qdrant
# Run backend
uvicorn app.main:app --reload --port 8000
Frontend:
cd frontend
# Install dependencies
npm install
# Run development server
npm run dev
- Frontend: http://localhost:5173
- API Docs: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
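The split between required and optional variables in the .env step above can be illustrated with a small loader. The project itself lists Pydantic for settings management; this standard-library sketch only shows the fail-fast idea. The class and function names are hypothetical, and it assumes the optional values shown in the .env example are also the defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    secret_key: str
    qdrant_host: str = "localhost"
    qdrant_port: int = 6333
    database_url: str = "sqlite:///./ia_workspace.db"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing required keys."""
    missing = [name for name in ("OPENAI_API_KEY", "SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        secret_key=env["SECRET_KEY"],
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        database_url=env.get("DATABASE_URL", "sqlite:///./ia_workspace.db"),
    )
```

Failing at startup when `OPENAI_API_KEY` or `SECRET_KEY` is absent gives a clearer error than a failed API call later on.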
This system solves real business problems across multiple industries. Here's how companies can leverage it:
Problem: Support teams waste time searching through documentation, SOPs, and past tickets.
Solution:
Documents: Upload KB articles, product manuals, troubleshooting guides, FAQs
Agent: Research Agent
Query: "How do I reset a user's password in the enterprise dashboard?"
Result: Instant answers with citations from indexed documentation
Business Impact:
- 70% faster ticket resolution
- Consistent answers across support team
- New hires productive in days, not weeks
Problem: Law firms spend hours reviewing contracts, case files, and legal precedents.
Solution:
Documents: Upload contracts, case law, compliance documents
Agent: Analysis Agent
Query: "Identify all liability clauses in these vendor contracts and compare terms"
Result: Structured analysis with risk assessment and comparisons
Business Impact:
- 80% reduction in document review time
- Identify hidden risks automatically
- Scale legal review without hiring more associates
Problem: R&D teams struggle to synthesize insights from research papers, patents, and test results.
Solution:
Documents: Upload research papers, patents, lab reports, technical specs
Agent: Research Agent
Query: "What battery technologies show promise for EVs based on recent papers?"
Result: Synthesized insights from multiple sources with references
Business Impact:
- Accelerate literature review from weeks to hours
- Surface hidden connections across research
- Make data-driven R&D decisions
Problem: Engineers waste time searching wikis, Confluence pages, and outdated docs.
Solution:
Documents: Upload API docs, architecture diagrams, runbooks, README files
Agent: Code Agent
Query: "How do I implement OAuth2 authentication in our microservices?"
Result: Step-by-step guide with code examples from your own docs
Business Impact:
- 50% reduction in "How do I...?" questions
- Onboard developers 3x faster
- Keep tribal knowledge accessible
Problem: Finance teams manually review hundreds of invoices, reports, and regulatory documents.
Solution:
Documents: Upload financial reports, invoices, audit trails, regulations
Agent: Analysis Agent
Query: "Analyze Q4 expense reports and flag anomalies over $10K"
Result: Automated analysis with anomaly detection and compliance checks
Business Impact:
- Detect fraud and errors automatically
- Ensure regulatory compliance
- Close books 60% faster
Problem: HR teams manually screen resumes and answer repetitive policy questions.
Solution:
Documents: Upload resumes, job descriptions, company policies, benefits docs
Agent: Research + Analysis Agents
Query: "Find candidates with 5+ years Python and AWS experience"
Result: Ranked candidates with match scores and reasoning
Business Impact:
- Screen 100 resumes in minutes
- Reduce bias in initial screening
- Instant answers to employee policy questions
Problem: Marketing teams struggle to maintain brand consistency and optimize content.
Solution:
Documents: Upload brand guidelines, competitor content, keyword research, past campaigns
Agent: Writer Agent
Query: "Write a blog post about AI automation following our brand voice"
Result: On-brand content using insights from uploaded materials
Business Impact:
- 10x content production speed
- Consistent brand voice across channels
- SEO-optimized content using your own data
Problem: Doctors spend hours reviewing patient histories and medical literature.
Solution:
Documents: Upload patient records, medical journals, treatment protocols
Agent: Research Agent
Query: "Summarize this patient's cardiac history and recommend screening tests"
Result: Comprehensive summary with evidence-based recommendations
Business Impact:
- More time with patients, less with paperwork
- Evidence-based treatment decisions
- Reduce medical errors
Problem: QA teams manually inspect reports and cross-reference specifications.
Solution:
Documents: Upload quality reports, specifications, inspection logs, SOPs
Agent: Analysis Agent
Query: "Identify defect patterns in last month's production reports"
Result: Root cause analysis with trend identification
Business Impact:
- Predict quality issues before they escalate
- Reduce defect rates by 40%
- Optimize production processes
Problem: Trainers manually create materials and answer repetitive questions.
Solution:
Documents: Upload textbooks, course materials, assessments, student questions
Agent: Writer + Research Agents
Query: "Create a quiz on Chapter 5 with explanations for answers"
Result: Automated assessment generation with detailed feedback
Business Impact:
- Personalized learning at scale
- Instant answers to student questions 24/7
- Reduce trainer workload by 50%
Upload Documents
- Navigate to Documents page
- Drag and drop files (PDF, DOCX, TXT, MD, CSV - max 10MB)
- Documents are automatically processed and indexed
Chat with Agents
- Go to Chat page
- Select the right agent for your task:
  - Research - Finding and synthesizing information
  - Analysis - Data analysis and insights
  - Writer - Content creation and summarization
  - Code - Technical documentation and code help
- Ask questions and get contextual answers from your documents
Monitor Performance
- Visit Analytics page to track:
  - Document count and storage usage
  - Query statistics and response times
  - Agent performance metrics
workspaceai/
├── backend/
│   ├── app/
│   │   ├── agents/          # AI agent implementations
│   │   ├── api/routes/      # API endpoints
│   │   ├── core/            # Config, logging, security
│   │   ├── mcp/             # MCP server
│   │   ├── models/          # Pydantic schemas
│   │   ├── services/        # Business logic
│   │   └── utils/           # Utilities
│   ├── tests/               # Unit tests
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── components/      # React components
│   │   ├── pages/           # Page components
│   │   ├── lib/             # API client, utilities
│   │   ├── types/           # TypeScript types
│   │   └── App.tsx
│   ├── package.json
│   └── Dockerfile
├── docs/
│   └── screenshots/         # Application screenshots
├── docker-compose.yml
├── .env.example
├── Makefile
└── README.md
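The curl requests that follow can also be issued from Python. As a minimal sketch using only the standard library, the chat request could be built as shown here. The helper names are hypothetical, and the `send` function assumes the endpoint replies with JSON, which this README does not confirm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(message: str, agent_type: str = "research") -> urllib.request.Request:
    """Prepare a POST to /chat with the same JSON body as the curl example."""
    body = json.dumps({"message": message, "agent_type": agent_type}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode a JSON reply (needs the backend running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_chat_request("Summarize the key points from the uploaded documents")
# reply = send(request)  # uncomment once the backend is listening on port 8000
```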
curl -X POST "http://localhost:8000/api/v1/documents/upload" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.pdf"
curl -X POST "http://localhost:8000/api/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "Summarize the key points from the uploaded documents",
"agent_type": "research"
}'
curl -X POST "http://localhost:8000/api/v1/agents/analysis/invoke" \
-H "Content-Type: application/json" \
-d '{
"query": "Analyze the data patterns in the CSV files",
"context": {}
}'
curl -X GET "http://localhost:8000/api/v1/analytics/metrics"
cd backend
# Run all tests
pytest tests/ -v
# With coverage report
pytest tests/ -v --cov=app --cov-report=html
# Run specific test
pytest tests/test_agents.py -v
cd frontend
npm test
npm run test:coverage
make test # Run all tests (backend + frontend)
make lint # Run linters
make format # Format code
make clean # Clean caches
- JWT-based authentication (configurable)
- CORS protection
- Rate limiting with slowapi
- Input validation with Pydantic
- Environment-based configuration
- Secure file upload handling
- SQL injection prevention
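Secure file upload handling usually starts with checks on the filename and size. The sketch below applies the supported formats and the 10MB limit mentioned in this README. It is an illustration rather than the project's code: the function name is hypothetical, and 10MB is assumed to mean 10 * 1024 * 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed interpretation of the 10MB limit


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return a safe base filename, or raise ValueError if the upload should be rejected."""
    # Keep only the final path component so "../../x" cannot escape the upload directory.
    safe_name = Path(filename.replace("\\", "/")).name
    if not safe_name or safe_name in {".", ".."}:
        raise ValueError("Invalid filename")
    if Path(safe_name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {safe_name}")
    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File is empty or exceeds the 10MB limit")
    return safe_name
```

An extension check alone does not prove the content type. A stricter pipeline would also inspect the file's leading bytes before parsing it.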
# Build and deploy
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Scale services
docker-compose up -d --scale backend=3
# View logs
docker-compose logs -f backend
ENVIRONMENT=production
LOG_LEVEL=INFO
DATABASE_URL=postgresql://user:pass@host:5432/db
REDIS_URL=redis://redis:6379/0
ENABLE_REDIS_CACHE=true
See DEPLOYMENT_GUIDE.md for detailed deployment instructions.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain - For the amazing LLM orchestration framework
- Qdrant - For the high-performance vector database
- FastAPI - For the modern Python web framework
- OpenAI - For powerful language models and embeddings
Project Link: https://github.com/kaninstein/workspaceai
- Core RAG functionality with multiple agents
- Document processing pipeline
- Real-time chat interface
- Analytics dashboard
- MCP protocol integration
- User authentication and multi-tenancy
- Document versioning and history
- Advanced analytics dashboards
- Export conversations to PDF
- Custom agent creation UI
- Integration with more LLM providers
- Mobile app (React Native)
- Voice input/output support

