
BotForge RAG - Technical Architecture Documentation

Overview

BotForge RAG is an intent-based AI system that combines Retrieval-Augmented Generation (RAG) with Model Context Protocol (MCP) tool execution. The system routes each user query to either information retrieval or dynamic tool execution based on the detected intent.

🏗️ System Architecture

graph TD
    A[Client Request] --> B[Intent Detection Service]
    B --> C{Intent Type}
    C -->|information_retrieval| D[RAG Pipeline]
    C -->|execution| E[MCP Agent Pipeline]
    
    D --> F[Vector Search]
    F --> G[OpenAI LLM]
    G --> H[RAG Response]
    
    E --> I[LangChain Agent]
    I --> J[External MCP Servers]
    J --> K[Tool Execution]
    K --> L[Agent Response]
    
    M[MCP Server Registration] --> N[Per-Bot Tool Registry]
    N --> E

🧠 Core Components

1. Intent Detection System

Location: src/botforge/services/vector_query.py

Classifies user queries into two categories:

  • Information Retrieval: Questions seeking knowledge ("What is machine learning?")
  • Execution: Action-oriented requests ("Calculate 25 * 17", "Convert text to uppercase")

Implementation:

async def _detect_intent(self, query: str) -> str:
    execution_keywords = ["calculate", "compute", "convert", "transform", ...]
    information_keywords = ["what", "how", "why", "explain", ...]
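The snippet above is abbreviated. A minimal standalone sketch of the keyword-based routing (the exact keyword lists and tie-breaking rules in vector_query.py are longer; the default-to-RAG fallback here is an assumption) might look like:

```python
def detect_intent(query: str) -> str:
    """Classify a query as "execution" or "information_retrieval" by keyword match.

    Illustrative only: the real keyword lists are more extensive.
    """
    execution_keywords = ["calculate", "compute", "convert", "transform"]
    information_keywords = ["what", "how", "why", "explain"]

    lowered = query.lower()
    if any(kw in lowered for kw in execution_keywords):
        return "execution"
    if any(kw in lowered for kw in information_keywords):
        return "information_retrieval"
    # Fall back to RAG when no keyword matches (assumed default).
    return "information_retrieval"
```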

2. RAG (Retrieval-Augmented Generation) Pipeline

Location: src/botforge/services/vector_query.py

Flow:

  1. Vector Embedding: User query → 384-dimensional vector
  2. Similarity Search: Find relevant document chunks
  3. Context Assembly: Combine chunks with query
  4. LLM Generation: OpenAI generates contextual response

Key Features:

  • Redis caching for embeddings
  • Configurable chunk retrieval limits
  • Relevance score thresholding
  • Source attribution
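The retrieval side of this pipeline (steps 2–3) can be sketched as a similarity search with thresholding and a chunk limit, matching the max_chunks and similarity_threshold settings shown later. Toy 2-dimensional vectors stand in for the 384-dimensional SentenceTransformer embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_chunks(query_vec, chunks, max_chunks=5, threshold=0.7):
    """Rank (text, vector) chunks by similarity; drop those below threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:max_chunks]
```

The surviving chunk texts are then concatenated with the query (step 3) and sent to the LLM (step 4).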

3. MCP (Model Context Protocol) Agent System

Location: src/botforge/services/mcp_agent_service.py

Architecture:

  • LangChain Agent: Zero-shot ReAct agent with tool selection
  • External Tool Registry: Per-bot MCP server registration
  • HTTP-based Tool Execution: Direct calls to external MCP servers
  • Dynamic Tool Loading: Runtime discovery of available tools

Tool Execution Flow:

User Query → LangChain Agent → Tool Selection → HTTP Call → External MCP Server → Result
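The HTTP leg of that flow reduces to a POST against the external server's /execute endpoint, whose request/response shapes are defined in the protocol section of this document. A sketch of building and parsing that exchange (the commented-out httpx call is an assumption about the client library):

```python
def build_execute_request(tool_name: str, parameters: dict) -> dict:
    """Body for POST /execute on an external MCP server."""
    return {"tool_name": tool_name, "parameters": parameters}

def parse_execute_response(body: dict):
    """Return the result on success; raise on a reported tool error."""
    if not body.get("success"):
        raise RuntimeError(body.get("error") or "tool execution failed")
    return body["result"]

# Inside the agent this would be roughly:
#   resp = httpx.post(f"{endpoint_url}/execute",
#                     json=build_execute_request(tool_name, params),
#                     timeout=server.timeout_seconds)
#   result = parse_execute_response(resp.json())
```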

4. External MCP Manager

Location: src/botforge/services/external_mcp_manager.py

Responsibilities:

  • MCP server registration per bot
  • Tool discovery and validation
  • Server health monitoring
  • Execution logging and metrics
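The retry_attempts field registered per server implies retry behaviour around each call. A plausible shape for that wrapper (a sketch, not the actual external_mcp_manager.py implementation):

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(fn: Callable[[], T], retry_attempts: int = 3) -> T:
    """Invoke fn, retrying up to retry_attempts times before giving up."""
    last_error = None
    for _ in range(retry_attempts):
        try:
            return fn()
        except Exception as exc:  # broad catch is illustrative only
            last_error = exc
    raise RuntimeError(
        f"MCP server unreachable after {retry_attempts} attempts"
    ) from last_error
```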

📡 API Endpoints

Core Query Endpoints

POST /vector/query-dynamic

Intent-based unified query endpoint

{
  "user_id": "uuid",
  "bot_id": "uuid", 
  "client_id": "string",
  "query": "user question or command",
  "model": "gpt-3.5-turbo"
}

Response:

{
  "user_id": "uuid",
  "bot_id": "uuid",
  "query": "original query",
  "response": "system response",
  "intent": "information_retrieval|execution",
  "processing_type": "rag_information_retrieval|mcp_agent_execution",
  "execution_time": 2.34,
  "sources": [...],  // Only for RAG
  "mcp_tools_used": [...],  // Only for MCP
  "agent_reasoning": "..."  // Only for MCP
}

POST /vector/query

Direct RAG query endpoint

  • Forces RAG pipeline regardless of intent
  • Returns document sources and relevance scores

POST /vector/mcp-query

Direct MCP agent endpoint

  • Forces MCP agent execution regardless of intent
  • Uses bot's registered MCP tools

MCP Management Endpoints

POST /api/mcp/register

Register MCP server for a bot

{
  "bot_id": "uuid",
  "name": "Server Name",
  "endpoint_url": "http://localhost:3001",
  "description": "Server description",
  "timeout_seconds": 30,
  "retry_attempts": 3
}

GET /api/mcp/servers/{bot_id}

List all MCP servers for a bot

GET /api/mcp/tools/{bot_id}

List all available tools for a bot

GET /api/mcp/tools

List all tools across all bots

Document Management Endpoints

POST /upload/documents

Upload documents for RAG indexing

GET /upload/namespaces

List available document namespaces

System Endpoints

GET /health

System health check

GET /vector/stats/{user_id}/{bot_id}

Query statistics and metrics

🔧 Configuration

Environment Variables

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/botforge

# Redis
REDIS_URL=redis://localhost:6379

# OpenAI
OPENAI_API_KEY=sk-...

# Vector Model
VECTOR_MODEL_PATH=all-MiniLM-L6-v2

Core Settings

Location: src/botforge/core/config.py

class Settings:
    database_url: str
    redis_url: str
    openai_api_key: str
    vector_model_path: str = "all-MiniLM-L6-v2"
    max_chunks: int = 5
    similarity_threshold: float = 0.7
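Loading these settings from the environment variables listed above can be sketched as a plain mapping lookup with defaults (the MAX_CHUNKS and SIMILARITY_THRESHOLD variable names are assumptions, since only the four variables above are documented):

```python
def settings_from_env(env: dict) -> dict:
    """Assemble settings from an environment mapping, applying defaults."""
    return {
        "database_url": env["DATABASE_URL"],
        "redis_url": env["REDIS_URL"],
        "openai_api_key": env["OPENAI_API_KEY"],
        "vector_model_path": env.get("VECTOR_MODEL_PATH", "all-MiniLM-L6-v2"),
        "max_chunks": int(env.get("MAX_CHUNKS", "5")),            # assumed var name
        "similarity_threshold": float(env.get("SIMILARITY_THRESHOLD", "0.7")),  # assumed var name
    }
```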

🗄️ Database Schema

Core Tables

bots

CREATE TABLE bots (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name VARCHAR NOT NULL,
    description TEXT,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

mcp_servers

CREATE TABLE mcp_servers (
    id UUID PRIMARY KEY,
    bot_id UUID REFERENCES bots(id),
    name VARCHAR NOT NULL,
    endpoint_url VARCHAR NOT NULL,
    description TEXT,
    is_active BOOLEAN DEFAULT true,
    timeout_seconds INTEGER DEFAULT 30,
    retry_attempts INTEGER DEFAULT 3,
    config JSON,
    created_at TIMESTAMP DEFAULT NOW()
);

mcp_tools

CREATE TABLE mcp_tools (
    id UUID PRIMARY KEY,
    mcp_server_id UUID REFERENCES mcp_servers(id),
    name VARCHAR NOT NULL,
    description TEXT,
    schema JSON,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

mcp_executions

CREATE TABLE mcp_executions (
    id UUID PRIMARY KEY,
    bot_id UUID REFERENCES bots(id),
    mcp_server_id UUID REFERENCES mcp_servers(id),
    tool_name VARCHAR NOT NULL,
    input_parameters JSON,
    output_result JSON,
    execution_time_ms INTEGER,
    status VARCHAR,
    error_message TEXT,
    timestamp TIMESTAMP DEFAULT NOW()
);
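A row for this table is assembled as each tool call finishes. A sketch of the record-building step (field names mirror the columns; the status values "success"/"error" are assumptions):

```python
def build_execution_record(bot_id, server_id, tool_name, params, result,
                           started: float, finished: float, error=None) -> dict:
    """One mcp_executions row, with the duration converted to milliseconds."""
    return {
        "bot_id": bot_id,
        "mcp_server_id": server_id,
        "tool_name": tool_name,
        "input_parameters": params,
        "output_result": result,
        "execution_time_ms": int((finished - started) * 1000),
        "status": "error" if error else "success",
        "error_message": error,
    }
```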

🛠️ External MCP Server Protocol

Required Endpoints

GET /capabilities

{
  "server": {
    "name": "Server Name",
    "version": "1.0.0",
    "description": "Server description"
  },
  "tools": [
    {
      "name": "tool_name",
      "description": "Tool description",
      "schema": {
        "type": "object",
        "properties": {
          "param1": {"type": "string", "description": "Parameter description"}
        },
        "required": ["param1"]
      }
    }
  ],
  "protocol_version": "1.0"
}
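The schema objects returned by /capabilities let the agent validate parameters before the HTTP round trip. A minimal check against the required/properties structure shown above (a sketch; the service may well use a full JSON Schema library instead):

```python
def validate_tool_params(schema: dict, params: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the call is valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter: {name}")
    type_map = {"string": str, "number": (int, float), "integer": int,
                "boolean": bool, "object": dict, "array": list}
    for name, value in params.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            errors.append(f"unknown parameter: {name}")
        elif not isinstance(value, type_map.get(spec.get("type"), object)):
            errors.append(f"parameter {name}: expected {spec.get('type')}")
    return errors
```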

GET /tools

{
  "tools": [...]  // Same format as capabilities.tools
}

POST /execute

// Request
{
  "tool_name": "calculator",
  "parameters": {
    "expression": "2 + 5"
  }
}

// Response
{
  "success": true,
  "result": {
    "calculation": "2 + 5",
    "answer": 7
  },
  "error": null
}
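A calculator tool like the one in this example must not pass raw input to eval(). One safe approach is to walk Python's ast and allow only arithmetic nodes — an illustrative implementation, since the actual server's calculator is not shown in this document:

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}

def safe_eval(expression: str):
    """Evaluate a basic arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression node: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))
```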

🔄 Request Flow Examples

Information Retrieval Flow

1. User: "What is machine learning?"
2. Intent Detection: "information_retrieval"
3. Vector Search: Find relevant ML documents
4. OpenAI: Generate response with context
5. Response: Educational content about ML

Tool Execution Flow

1. User: "Calculate 25 * 17 + 100"
2. Intent Detection: "execution"
3. LangChain Agent: Select calculator tool
4. HTTP Request: POST to external MCP server
5. Tool Execution: Calculate result
6. Agent Response: "525"

🚀 Deployment Architecture

Development Setup

# 1. Start database and Redis
docker-compose up -d postgres redis

# 2. Start BotForge API
PYTHONPATH=/opt/botforge-rag/src python -m botforge.main

# 3. Start external MCP server
python simple_mcp_server.py

# 4. Register MCP server for bot
curl -X POST http://localhost:8000/api/mcp/register \
  -H "Content-Type: application/json" \
  -d '{"bot_id": "...", "endpoint_url": "http://localhost:3001", ...}'

Production Considerations

Scalability

  • Horizontal Scaling: Multiple API instances behind load balancer
  • Database: PostgreSQL with connection pooling
  • Caching: Redis cluster for embeddings and responses
  • MCP Servers: Distributed across multiple hosts

Security

  • Authentication: JWT tokens for API access
  • MCP Server Validation: TLS/SSL for external server communication
  • Input Validation: Schema validation for all endpoints
  • Rate Limiting: Per-user/bot query limits

Monitoring

  • Metrics: Query latency, tool execution times, error rates
  • Logging: Structured logging with correlation IDs
  • Health Checks: Automated MCP server health monitoring
  • Alerting: Failed tool executions, database connectivity

🧪 Testing

Unit Tests

  • Intent detection accuracy
  • Vector similarity calculations
  • MCP tool registration/execution
  • Database operations

Integration Tests

  • End-to-end query flows
  • External MCP server communication
  • Database transactions
  • Redis caching behavior

Performance Tests

  • Query response times
  • Concurrent user handling
  • Vector search performance
  • Tool execution latency

📊 Performance Metrics

Key Performance Indicators

  • Query Response Time: < 2s for RAG, < 5s for MCP
  • Intent Detection Accuracy: > 95%
  • Tool Execution Success Rate: > 99%
  • Vector Search Relevance: > 0.7 similarity threshold

Optimization Strategies

  • Embedding Caching: Redis cache for vector embeddings
  • Connection Pooling: Database and HTTP client pools
  • Async Processing: Non-blocking I/O operations
  • Tool Result Caching: Cache frequent tool executions
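The embedding cache behaves like a read-through cache: compute once per distinct text, reuse thereafter. The dict-backed sketch below stands in for Redis (the key scheme and the substitution of a dict for the Redis client are assumptions):

```python
import hashlib

class EmbeddingCache:
    """Read-through cache keyed by a hash of the input text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}          # stand-in for Redis GET/SET
        self.misses = 0

    def get(self, text: str):
        key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```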

🔮 Future Enhancements

Planned Features

  1. Multi-modal Support: Image and audio processing
  2. Advanced Agent Reasoning: Chain-of-thought execution
  3. Tool Composition: Multi-step tool workflows
  4. Real-time Streaming: WebSocket-based responses
  5. Custom Intent Models: Machine learning-based intent detection

Technical Improvements

  1. GraphQL API: More flexible query interface
  2. Event-driven Architecture: Async event processing
  3. Microservices: Separate RAG and MCP services
  4. Observability: OpenTelemetry integration
  5. Auto-scaling: Kubernetes-based deployment

Built with: FastAPI, LangChain, OpenAI, PostgreSQL, Redis, SentenceTransformers

License: [Your License]

Contributors: [Your Team]