BotForge RAG is an intent-based AI system that combines Retrieval-Augmented Generation (RAG) with Model Context Protocol (MCP) tool execution. Based on the detected intent, the system routes each user query to either information retrieval or dynamic tool execution.
```mermaid
graph TD
    A[Client Request] --> B[Intent Detection Service]
    B --> C{Intent Type}
    C -->|information_retrieval| D[RAG Pipeline]
    C -->|execution| E[MCP Agent Pipeline]
    D --> F[Vector Search]
    F --> G[OpenAI LLM]
    G --> H[RAG Response]
    E --> I[LangChain Agent]
    I --> J[External MCP Servers]
    J --> K[Tool Execution]
    K --> L[Agent Response]
    M[MCP Server Registration] --> N[Per-Bot Tool Registry]
    N --> E
```
Location: src/botforge/services/vector_query.py

Classifies user queries into two categories:

- Information Retrieval: Questions seeking knowledge ("What is machine learning?")
- Execution: Action-oriented requests ("Calculate 25 * 17", "Convert text to uppercase")

Implementation:

```python
async def _detect_intent(self, query: str) -> str:
    execution_keywords = ["calculate", "compute", "convert", "transform", ...]
    information_keywords = ["what", "how", "why", "explain", ...]
    # Sketch of the routing rule: execution keywords take priority,
    # everything else falls back to information retrieval.
    lowered = query.lower()
    if any(k in lowered for k in execution_keywords[:-1]):  # skip the trailing "..."
        return "execution"
    return "information_retrieval"
```

Location: src/botforge/services/vector_query.py
Flow:
- Vector Embedding: User query → 384-dimensional vector
- Similarity Search: Find relevant document chunks
- Context Assembly: Combine chunks with query
- LLM Generation: OpenAI generates contextual response
Key Features:
- Redis caching for embeddings
- Configurable chunk retrieval limits
- Relevance score thresholding
- Source attribution
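The retrieval step above can be sketched as a cosine-similarity ranking over chunk embeddings, using the configured defaults of 5 chunks and a 0.7 relevance threshold. The function below is a minimal illustration; the names (`top_chunks`, the `(text, vector)` pair format) are ours, not the actual BotForge API.

```python
import math

def top_chunks(query_vec, chunks, max_chunks=5, threshold=0.7):
    """Rank chunk embeddings by cosine similarity to the query vector.

    `chunks` is a list of (text, vector) pairs -- an illustrative format,
    not BotForge's actual storage layout.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    # Score every chunk, drop anything below the relevance threshold,
    # and keep only the top-k most similar chunks.
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored = [(score, text) for score, text in scored if score >= threshold]
    scored.sort(reverse=True)
    return scored[:max_chunks]
```

The surviving chunks would then be concatenated with the query to form the LLM prompt.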
Location: src/botforge/services/mcp_agent_service.py
Architecture:
- LangChain Agent: Zero-shot ReAct agent with tool selection
- External Tool Registry: Per-bot MCP server registration
- HTTP-based Tool Execution: Direct calls to external MCP servers
- Dynamic Tool Loading: Runtime discovery of available tools
Tool Execution Flow:
User Query → LangChain Agent → Tool Selection → HTTP Call → External MCP Server → Result

Location: src/botforge/services/external_mcp_manager.py
Responsibilities:
- MCP server registration per bot
- Tool discovery and validation
- Server health monitoring
- Execution logging and metrics
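A minimal sketch of how the manager's HTTP-based tool execution with retries might look. The `/execute` path, the `execute_tool` name, and the injectable `transport` are assumptions drawn from the protocol section of this document, not the actual implementation; only the request/response shape and the `timeout_seconds`/`retry_attempts` settings come from the source.

```python
import json
import urllib.request

def execute_tool(endpoint_url, tool_name, parameters,
                 timeout_seconds=30, retry_attempts=3, transport=None):
    """Call an external MCP server's execute endpoint with retries (sketch)."""
    payload = json.dumps({"tool_name": tool_name, "parameters": parameters}).encode()

    def default_transport(body):
        # Hypothetical path: the real route depends on the MCP server.
        req = urllib.request.Request(
            f"{endpoint_url}/execute", data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
            return json.loads(resp.read())

    send = transport or default_transport
    last_error = None
    for _ in range(retry_attempts):
        try:
            return send(payload)
        except Exception as exc:  # retry on any transport failure
            last_error = exc
    return {"success": False, "result": None, "error": str(last_error)}
```

Injecting `transport` keeps the retry logic testable without a live MCP server.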
Intent-based unified query endpoint
```json
{
  "user_id": "uuid",
  "bot_id": "uuid",
  "client_id": "string",
  "query": "user question or command",
  "model": "gpt-3.5-turbo"
}
```

Response:
```json
{
  "user_id": "uuid",
  "bot_id": "uuid",
  "query": "original query",
  "response": "system response",
  "intent": "information_retrieval|execution",
  "processing_type": "rag_information_retrieval|mcp_agent_execution",
  "execution_time": 2.34,
  "sources": [...],          // Only for RAG
  "mcp_tools_used": [...],   // Only for MCP
  "agent_reasoning": "..."   // Only for MCP
}
```

Direct RAG query endpoint
- Forces RAG pipeline regardless of intent
- Returns document sources and relevance scores
Direct MCP agent endpoint
- Forces MCP agent execution regardless of intent
- Uses bot's registered MCP tools
Register MCP server for a bot
```json
{
  "bot_id": "uuid",
  "name": "Server Name",
  "endpoint_url": "http://localhost:3001",
  "description": "Server description",
  "timeout_seconds": 30,
  "retry_attempts": 3
}
```

List all MCP servers for a bot
List all available tools for a bot
List all tools across all bots
Upload documents for RAG indexing
List available document namespaces
System health check
Query statistics and metrics
```shell
# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/botforge

# Redis
REDIS_URL=redis://localhost:6379

# OpenAI
OPENAI_API_KEY=sk-...

# Vector Model
VECTOR_MODEL_PATH=all-MiniLM-L6-v2
```

Location: src/botforge/core/config.py
```python
class Settings:
    database_url: str
    redis_url: str
    openai_api_key: str
    vector_model_path: str = "all-MiniLM-L6-v2"
    max_chunks: int = 5
    similarity_threshold: float = 0.7
```

```sql
CREATE TABLE bots (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name VARCHAR NOT NULL,
    description TEXT,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE mcp_servers (
    id UUID PRIMARY KEY,
    bot_id UUID REFERENCES bots(id),
    name VARCHAR NOT NULL,
    endpoint_url VARCHAR NOT NULL,
    description TEXT,
    is_active BOOLEAN DEFAULT true,
    timeout_seconds INTEGER DEFAULT 30,
    retry_attempts INTEGER DEFAULT 3,
    config JSON,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE mcp_tools (
    id UUID PRIMARY KEY,
    mcp_server_id UUID REFERENCES mcp_servers(id),
    name VARCHAR NOT NULL,
    description TEXT,
    schema JSON,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE mcp_executions (
    id UUID PRIMARY KEY,
    bot_id UUID REFERENCES bots(id),
    mcp_server_id UUID REFERENCES mcp_servers(id),
    tool_name VARCHAR NOT NULL,
    input_parameters JSON,
    output_result JSON,
    execution_time_ms INTEGER,
    status VARCHAR,
    error_message TEXT,
    timestamp TIMESTAMP DEFAULT NOW()
);
```

Capabilities response:

```json
{
  "server": {
    "name": "Server Name",
    "version": "1.0.0",
    "description": "Server description"
  },
  "tools": [
    {
      "name": "tool_name",
      "description": "Tool description",
      "schema": {
        "type": "object",
        "properties": {
          "param1": {"type": "string", "description": "Parameter description"}
        },
        "required": ["param1"]
      }
    }
  ],
  "protocol_version": "1.0"
}
```

Tools listing response:

```json
{
  "tools": [...] // Same format as capabilities.tools
}
```

Tool execution request:

```json
{
  "tool_name": "calculator",
  "parameters": {
    "expression": "2 + 5"
  }
}
```

Tool execution response:

```json
{
  "success": true,
  "result": {
    "calculation": "2 + 5",
    "answer": 7
  },
  "error": null
}
```

RAG Example:

1. User: "What is machine learning?"
2. Intent Detection: "information_retrieval"
3. Vector Search: Find relevant ML documents
4. OpenAI: Generate response with context
5. Response: Educational content about ML
MCP Agent Example:

1. User: "Calculate 25 * 17 + 100"
2. Intent Detection: "execution"
3. LangChain Agent: Select calculator tool
4. HTTP Request: POST to external MCP server
5. Tool Execution: Calculate result
6. Agent Response: "525"
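For illustration, a minimal calculator tool handler on the external MCP server side might look like the following. Only the request/response shape comes from the protocol section above; `handle_execute` and the safe AST-based evaluator (used instead of `eval()`) are our sketch, not `simple_mcp_server.py`.

```python
import ast
import operator

# Operators the safe arithmetic evaluator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _evaluate(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")

def handle_execute(request):
    """Handle an MCP execute request in the format shown above (sketch)."""
    if request.get("tool_name") != "calculator":
        return {"success": False, "result": None, "error": "unknown tool"}
    expr = request["parameters"]["expression"]
    try:
        answer = _evaluate(ast.parse(expr, mode="eval"))
        return {"success": True,
                "result": {"calculation": expr, "answer": answer},
                "error": None}
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return {"success": False, "result": None, "error": str(exc)}
```

Walking the AST rather than calling `eval()` keeps arbitrary code out of the tool server.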
```shell
# 1. Start database and Redis
docker-compose up -d postgres redis

# 2. Start BotForge API
PYTHONPATH=/opt/botforge-rag/src python -m botforge.main

# 3. Start external MCP server
python simple_mcp_server.py

# 4. Register MCP server for bot
curl -X POST http://localhost:8000/api/mcp/register \
  -H "Content-Type: application/json" \
  -d '{"bot_id": "...", "endpoint_url": "http://localhost:3001", ...}'
```

- Horizontal Scaling: Multiple API instances behind a load balancer
- Database: PostgreSQL with connection pooling
- Caching: Redis cluster for embeddings and responses
- MCP Servers: Distributed across multiple hosts
- Authentication: JWT tokens for API access
- MCP Server Validation: TLS/SSL for external server communication
- Input Validation: Schema validation for all endpoints
- Rate Limiting: Per-user/bot query limits
- Metrics: Query latency, tool execution times, error rates
- Logging: Structured logging with correlation IDs
- Health Checks: Automated MCP server health monitoring
- Alerting: Failed tool executions, database connectivity
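The health-check sweep over registered MCP servers could be sketched as below; everything here (`check_servers`, the injectable `probe`) is illustrative rather than the actual BotForge monitoring code.

```python
import time

def check_servers(servers, probe, now=time.time):
    """Mark each registered MCP server healthy or unhealthy.

    `servers` maps server name -> endpoint URL; `probe(url)` returns True
    when the server answers its health check. Both are hypothetical names.
    """
    statuses = {}
    for name, url in servers.items():
        try:
            healthy = bool(probe(url))
        except Exception:
            # Any transport error counts as unhealthy; alerting would hook in here.
            healthy = False
        statuses[name] = {"healthy": healthy, "checked_at": now()}
    return statuses
```

A scheduler (cron, asyncio task, Celery beat) would run this periodically and feed failures into the alerting pipeline.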
- Intent detection accuracy
- Vector similarity calculations
- MCP tool registration/execution
- Database operations
- End-to-end query flows
- External MCP server communication
- Database transactions
- Redis caching behavior
- Query response times
- Concurrent user handling
- Vector search performance
- Tool execution latency
- Query Response Time: < 2s for RAG, < 5s for MCP
- Intent Detection Accuracy: > 95%
- Tool Execution Success Rate: > 99%
- Vector Search Relevance: > 0.7 similarity threshold
- Embedding Caching: Redis cache for vector embeddings
- Connection Pooling: Database and HTTP client pools
- Async Processing: Non-blocking I/O operations
- Tool Result Caching: Cache frequent tool executions
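Embedding caching typically hashes the input text into a stable key. A small sketch of such a cache-aside lookup; the key scheme and helper names are assumptions, not BotForge's actual cache layer, and a plain dict stands in for Redis here.

```python
import hashlib

def embedding_cache_key(model_name, text):
    """Build a stable cache key for one (model, text) pair (illustrative scheme)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{model_name}:{digest}"

def get_embedding(text, cache, embed, model_name="all-MiniLM-L6-v2"):
    """Cache-aside lookup: return a cached vector or compute and store it.

    `cache` is any mapping (a dict here, Redis in production) and `embed`
    is the expensive model call, injected so the flow is testable.
    """
    key = embedding_cache_key(model_name, text)
    if key in cache:
        return cache[key]          # cache hit: skip the model entirely
    vec = embed(text)
    cache[key] = vec               # cache miss: store for next time
    return vec
```

Hashing the text keeps keys bounded in size regardless of input length, and prefixing with the model name avoids collisions if the embedding model changes.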
- Multi-modal Support: Image and audio processing
- Advanced Agent Reasoning: Chain-of-thought execution
- Tool Composition: Multi-step tool workflows
- Real-time Streaming: WebSocket-based responses
- Custom Intent Models: Machine learning-based intent detection
- GraphQL API: More flexible query interface
- Event-driven Architecture: Async event processing
- Microservices: Separate RAG and MCP services
- Observability: OpenTelemetry integration
- Auto-scaling: Kubernetes-based deployment
Built with: FastAPI, LangChain, OpenAI, PostgreSQL, Redis, SentenceTransformers
License: [Your License]
Contributors: [Your Team]