BotForge RAG is an intent-based AI system that combines Retrieval-Augmented Generation (RAG) with Model Context Protocol (MCP) tool execution. Based on the detected intent, the system routes each user query to either information retrieval or dynamic tool execution.
```mermaid
graph TD
    A[Client Request] --> B[Intent Detection Service]
    B --> C{Intent Type}
    C -->|information_retrieval| D[RAG Pipeline]
    C -->|execution| E[MCP Agent Pipeline]
    D --> F[Vector Search]
    F --> G[OpenAI LLM]
    G --> H[RAG Response]
    E --> I[LangChain Agent]
    I --> J[External MCP Servers]
    J --> K[Tool Execution]
    K --> L[Agent Response]
    M[MCP Server Registration] --> N[Per-Bot Tool Registry]
    N --> E
```
Location: src/botforge/services/vector_query.py

Classifies user queries into two categories:

- Information Retrieval: Questions seeking knowledge ("What is machine learning?")
- Execution: Action-oriented requests ("Calculate 25 * 17", "Convert text to uppercase")

Implementation:

```python
async def _detect_intent(self, query: str) -> str:
    execution_keywords = ["calculate", "compute", "convert", "transform", ...]
    information_keywords = ["what", "how", "why", "explain", ...]
    # Sketch of the routing rule: execution keywords take priority,
    # everything else falls back to information retrieval.
    lowered = query.lower()
    if any(k in lowered for k in execution_keywords[:-1]):  # skip the trailing "..."
        return "execution"
    return "information_retrieval"
```

Location: src/botforge/services/vector_query.py
Flow:
- Vector Embedding: User query → 384-dimensional vector
- Similarity Search: Find relevant document chunks
- Context Assembly: Combine chunks with query
- LLM Generation: OpenAI generates contextual response
Key Features:
- Redis caching for embeddings
- Configurable chunk retrieval limits
- Relevance score thresholding
- Source attribution
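The retrieval step above can be sketched as a cosine-similarity ranking over chunk embeddings, using the configured defaults of 5 chunks and a 0.7 relevance threshold. The function below is a minimal illustration; the names (`top_chunks`, the `(text, vector)` pair format) are ours, not the actual BotForge API.

```python
import math

def top_chunks(query_vec, chunks, max_chunks=5, threshold=0.7):
    """Rank chunk embeddings by cosine similarity to the query vector.

    `chunks` is a list of (text, vector) pairs -- an illustrative format,
    not BotForge's actual storage layout.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    # Score every chunk, drop anything below the relevance threshold,
    # and keep only the top-k most similar chunks.
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored = [(score, text) for score, text in scored if score >= threshold]
    scored.sort(reverse=True)
    return scored[:max_chunks]
```

The surviving chunks would then be concatenated with the query to form the LLM prompt.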
Location: src/botforge/services/mcp_agent_service.py
Architecture:
- LangChain Agent: Zero-shot ReAct agent with tool selection
- External Tool Registry: Per-bot MCP server registration
- HTTP-based Tool Execution: Direct calls to external MCP servers
- Dynamic Tool Loading: Runtime discovery of available tools
Tool Execution Flow:
User Query → LangChain Agent → Tool Selection → HTTP Call → External MCP Server → Result

Location: src/botforge/services/external_mcp_manager.py
Responsibilities:
- MCP server registration per bot
- Tool discovery and validation
- Server health monitoring
- Execution logging and metrics
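A minimal sketch of how the manager's HTTP-based tool execution with retries might look. The `/execute` path, the `execute_tool` name, and the injectable `transport` are assumptions drawn from the protocol section of this document, not the actual implementation; only the request/response shape and the `timeout_seconds`/`retry_attempts` settings come from the source.

```python
import json
import urllib.request

def execute_tool(endpoint_url, tool_name, parameters,
                 timeout_seconds=30, retry_attempts=3, transport=None):
    """Call an external MCP server's execute endpoint with retries (sketch)."""
    payload = json.dumps({"tool_name": tool_name, "parameters": parameters}).encode()

    def default_transport(body):
        # Hypothetical path: the real route depends on the MCP server.
        req = urllib.request.Request(
            f"{endpoint_url}/execute", data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
            return json.loads(resp.read())

    send = transport or default_transport
    last_error = None
    for _ in range(retry_attempts):
        try:
            return send(payload)
        except Exception as exc:  # retry on any transport failure
            last_error = exc
    return {"success": False, "result": None, "error": str(last_error)}
```

Injecting `transport` keeps the retry logic testable without a live MCP server.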
Intent-based unified query endpoint
```json
{
  "user_id": "uuid",
  "bot_id": "uuid",
  "client_id": "string",
  "query": "user question or command",
  "model": "gpt-3.5-turbo"
}
```

Response:
```json
{
  "user_id": "uuid",
  "bot_id": "uuid",
  "query": "original query",
  "response": "system response",
  "intent": "information_retrieval|execution",
  "processing_type": "rag_information_retrieval|mcp_agent_execution",
  "execution_time": 2.34,
  "sources": [...],          // Only for RAG
  "mcp_tools_used": [...],   // Only for MCP
  "agent_reasoning": "..."   // Only for MCP
}
```

Direct RAG query endpoint
- Forces RAG pipeline regardless of intent
- Returns document sources and relevance scores
Direct MCP agent endpoint
- Forces MCP agent execution regardless of intent
- Uses bot's registered MCP tools
Register MCP server for a bot
```json
{
  "bot_id": "uuid",
  "name": "Server Name",
  "endpoint_url": "http://localhost:3001",
  "description": "Server description",
  "timeout_seconds": 30,
  "retry_attempts": 3
}
```

List all MCP servers for a bot
List all available tools for a bot
List all tools across all bots
Upload documents for RAG indexing
List available document namespaces
System health check
Query statistics and metrics
```shell
# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/botforge

# Redis
REDIS_URL=redis://localhost:6379

# OpenAI
OPENAI_API_KEY=sk-...

# Vector Model
VECTOR_MODEL_PATH=all-MiniLM-L6-v2
```

Location: src/botforge/core/config.py
```python
class Settings:
    database_url: str
    redis_url: str
    openai_api_key: str
    vector_model_path: str = "all-MiniLM-L6-v2"
    max_chunks: int = 5
    similarity_threshold: float = 0.7
```

```sql
CREATE TABLE bots (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name VARCHAR NOT NULL,
    description TEXT,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE mcp_servers (
    id UUID PRIMARY KEY,
    bot_id UUID REFERENCES bots(id),
    name VARCHAR NOT NULL,
    endpoint_url VARCHAR NOT NULL,
    description TEXT,
    is_active BOOLEAN DEFAULT true,
    timeout_seconds INTEGER DEFAULT 30,
    retry_attempts INTEGER DEFAULT 3,
    config JSON,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE mcp_tools (
    id UUID PRIMARY KEY,
    mcp_server_id UUID REFERENCES mcp_servers(id),
    name VARCHAR NOT NULL,
    description TEXT,
    schema JSON,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE mcp_executions (
    id UUID PRIMARY KEY,
    bot_id UUID REFERENCES bots(id),
    mcp_server_id UUID REFERENCES mcp_servers(id),
    tool_name VARCHAR NOT NULL,
    input_parameters JSON,
    output_result JSON,
    execution_time_ms INTEGER,
    status VARCHAR,
    error_message TEXT,
    timestamp TIMESTAMP DEFAULT NOW()
);
```

Capabilities response:

```json
{
  "server": {
    "name": "Server Name",
    "version": "1.0.0",
    "description": "Server description"
  },
  "tools": [
    {
      "name": "tool_name",
      "description": "Tool description",
      "schema": {
        "type": "object",
        "properties": {
          "param1": {"type": "string", "description": "Parameter description"}
        },
        "required": ["param1"]
      }
    }
  ],
  "protocol_version": "1.0"
}
```

Tools listing response:

```json
{
  "tools": [...] // Same format as capabilities.tools
}
```

Tool execution request:

```json
{
  "tool_name": "calculator",
  "parameters": {
    "expression": "2 + 5"
  }
}
```

Tool execution response:

```json
{
  "success": true,
  "result": {
    "calculation": "2 + 5",
    "answer": 7
  },
  "error": null
}
```

RAG Example:

1. User: "What is machine learning?"
2. Intent Detection: "information_retrieval"
3. Vector Search: Find relevant ML documents
4. OpenAI: Generate response with context
5. Response: Educational content about ML
MCP Agent Example:

1. User: "Calculate 25 * 17 + 100"
2. Intent Detection: "execution"
3. LangChain Agent: Select calculator tool
4. HTTP Request: POST to external MCP server
5. Tool Execution: Calculate result
6. Agent Response: "525"
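For illustration, a minimal calculator tool handler on the external MCP server side might look like the following. Only the request/response shape comes from the protocol section above; `handle_execute` and the safe AST-based evaluator (used instead of `eval()`) are our sketch, not `simple_mcp_server.py`.

```python
import ast
import operator

# Operators the safe arithmetic evaluator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _evaluate(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")

def handle_execute(request):
    """Handle an MCP execute request in the format shown above (sketch)."""
    if request.get("tool_name") != "calculator":
        return {"success": False, "result": None, "error": "unknown tool"}
    expr = request["parameters"]["expression"]
    try:
        answer = _evaluate(ast.parse(expr, mode="eval"))
        return {"success": True,
                "result": {"calculation": expr, "answer": answer},
                "error": None}
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return {"success": False, "result": None, "error": str(exc)}
```

Walking the AST rather than calling `eval()` keeps arbitrary code out of the tool server.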
```shell
# 1. Start database and Redis
docker-compose up -d postgres redis

# 2. Start BotForge API
PYTHONPATH=/opt/botforge-rag/src python -m botforge.main

# 3. Start external MCP server
python simple_mcp_server.py

# 4. Register MCP server for bot
curl -X POST http://localhost:8000/api/mcp/register \
  -H "Content-Type: application/json" \
  -d '{"bot_id": "...", "endpoint_url": "http://localhost:3001", ...}'
```

- Horizontal Scaling: Multiple API instances behind a load balancer
- Database: PostgreSQL with connection pooling
- Caching: Redis cluster for embeddings and responses
- MCP Servers: Distributed across multiple hosts
- Authentication: JWT tokens for API access
- MCP Server Validation: TLS/SSL for external server communication
- Input Validation: Schema validation for all endpoints
- Rate Limiting: Per-user/bot query limits
- Metrics: Query latency, tool execution times, error rates
- Logging: Structured logging with correlation IDs
- Health Checks: Automated MCP server health monitoring
- Alerting: Failed tool executions, database connectivity
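The health-check sweep over registered MCP servers could be sketched as below; everything here (`check_servers`, the injectable `probe`) is illustrative rather than the actual BotForge monitoring code.

```python
import time

def check_servers(servers, probe, now=time.time):
    """Mark each registered MCP server healthy or unhealthy.

    `servers` maps server name -> endpoint URL; `probe(url)` returns True
    when the server answers its health check. Both are hypothetical names.
    """
    statuses = {}
    for name, url in servers.items():
        try:
            healthy = bool(probe(url))
        except Exception:
            # Any transport error counts as unhealthy; alerting would hook in here.
            healthy = False
        statuses[name] = {"healthy": healthy, "checked_at": now()}
    return statuses
```

A scheduler (cron, asyncio task, Celery beat) would run this periodically and feed failures into the alerting pipeline.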
- Intent detection accuracy
- Vector similarity calculations
- MCP tool registration/execution
- Database operations
- End-to-end query flows
- External MCP server communication
- Database transactions
- Redis caching behavior
- Query response times
- Concurrent user handling
- Vector search performance
- Tool execution latency
- Query Response Time: < 2s for RAG, < 5s for MCP
- Intent Detection Accuracy: > 95%
- Tool Execution Success Rate: > 99%
- Vector Search Relevance: > 0.7 similarity threshold
- Embedding Caching: Redis cache for vector embeddings
- Connection Pooling: Database and HTTP client pools
- Async Processing: Non-blocking I/O operations
- Tool Result Caching: Cache frequent tool executions
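Embedding caching typically hashes the input text into a stable key. A small sketch of such a cache-aside lookup; the key scheme and helper names are assumptions, not BotForge's actual cache layer, and a plain dict stands in for Redis here.

```python
import hashlib

def embedding_cache_key(model_name, text):
    """Build a stable cache key for one (model, text) pair (illustrative scheme)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{model_name}:{digest}"

def get_embedding(text, cache, embed, model_name="all-MiniLM-L6-v2"):
    """Cache-aside lookup: return a cached vector or compute and store it.

    `cache` is any mapping (a dict here, Redis in production) and `embed`
    is the expensive model call, injected so the flow is testable.
    """
    key = embedding_cache_key(model_name, text)
    if key in cache:
        return cache[key]          # cache hit: skip the model entirely
    vec = embed(text)
    cache[key] = vec               # cache miss: store for next time
    return vec
```

Hashing the text keeps keys bounded in size regardless of input length, and prefixing with the model name avoids collisions if the embedding model changes.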
- Multi-modal Support: Image and audio processing
- Advanced Agent Reasoning: Chain-of-thought execution
- Tool Composition: Multi-step tool workflows
- Real-time Streaming: WebSocket-based responses
- Custom Intent Models: Machine learning-based intent detection
- GraphQL API: More flexible query interface
- Event-driven Architecture: Async event processing
- Microservices: Separate RAG and MCP services
- Observability: OpenTelemetry integration
- Auto-scaling: Kubernetes-based deployment
Built with: FastAPI, LangChain, OpenAI, PostgreSQL, Redis, SentenceTransformers
License: [Your License]
Contributors: [Your Team]