Skip to content

MCP Server Registry: Add, Configure & Connect External MCP Servers as Agent Tools #118

@Empreiteiro

Description

@Empreiteiro

Summary

Create a management screen where users can add, configure, and connect MCP (Model Context Protocol) servers to the platform. These servers become tools available to a new "Tool Agent" that can orchestrate calls to external systems (dbt, Snowflake, Airflow, etc.) during Q&A and ETL assistance workflows.

The platform already has an MCP server (mcp_server.py via FastMCP) that exposes Data Talks tools to external clients. This feature adds the inverse: Data Talks acting as an MCP client that consumes tools from external MCP servers.

Problem

  • The platform can only interact with data sources via hardcoded scripts (ask_csv.py, ask_sql.py, etc.)
  • No mechanism to dynamically add new tool capabilities without writing new backend code
  • Users cannot leverage their existing MCP-enabled tools (dbt MCP, Snowflake MCP, Airflow MCP) from within the platform
  • The LLM client (client.py) captures tool_calls in traces but never executes them — function calling is logged but not acted on
  • Agents (Agent model) have no concept of "tools" — only static source_ids and relationships

Proposed Solution

Part 1: MCP Server Registry (Backend + UI)

New Model: McpServer

class McpServer(Base):
    __tablename__ = "mcp_servers"

    id: str              # UUID
    user_id: str         # Owner
    organization_id: str # Multi-tenancy
    name: str            # Display name (e.g., "dbt Cloud")
    description: str     # What this server provides
    
    # Connection
    transport: str       # "sse" | "stdio" | "streamable_http"
    endpoint: str        # URL for SSE/HTTP, or command for stdio
    auth_type: str       # "none" | "api_key" | "bearer" | "oauth"
    auth_config: dict    # JSON: {"api_key": "...", "header": "Authorization"}
    
    # Registry metadata
    server_type: str     # "preset" | "custom"
    preset_id: str       # For presets: "dbt", "snowflake", "airflow", etc.
    icon_url: str        # Display icon
    
    # Tool discovery
    discovered_tools: dict  # JSON: cached tool list from server
    last_discovery: datetime
    
    # Status
    is_active: bool
    health_status: str   # "connected" | "error" | "unconfigured"
    last_health_check: datetime
    health_error: str
    
    created_at: datetime
    updated_at: datetime

Preset Server Catalog

Pre-configured templates for popular MCP servers. Users select from a catalog and only fill in credentials:

Preset Transport Endpoint Template Auth Tools Provided
dbt Cloud SSE dbt-mcp remote Bearer token generate_model_yaml, generate_source, generate_staging_model, get_project_details, list_jobs, column_lineage
dbt Local stdio uvx dbt-mcp local None (local) compile, build, docs, get_model_details
Snowflake SSE Snowflake managed MCP URL OAuth execute_sql, list_databases, list_schemas, list_tables, cortex_search
PostgreSQL stdio npx @modelcontextprotocol/server-postgres Connection string query, list_tables, describe_table
DuckDB/MotherDuck stdio uvx mcp-server-motherduck Token (optional) query, list_tables, list_databases
Airflow SSE Custom URL API key list_dags, get_dag_runs, get_task_instances, trigger_dag
Airbyte SSE https://airbyte.mcp.kapa.ai None search_docs, list_connectors, check_status
Confluence SSE Atlassian Remote MCP OAuth search_pages, get_page, create_page, update_page
Notion SSE https://mcp.notion.so/mcp OAuth search, read_page, create_page, update_page
Great Expectations stdio uvx gx-mcp-server None validate_data, create_suite, list_expectations
Fivetran SSE Custom (community) API key + secret list_connectors, get_connector_status, list_users
GitHub stdio npx @modelcontextprotocol/server-github PAT search_repos, create_issue, list_prs

Custom Server Support

For servers not in the preset catalog:

  1. User provides: name, transport type, endpoint/command, auth config
  2. Platform connects and discovers available tools via MCP tools/list
  3. User reviews discovered tools and enables/disables individually
  4. Health check validates connectivity

Part 2: MCP Client Engine (Backend)

New Service: mcp_client_service.py

class McpClientManager:
    """Manages connections to external MCP servers and tool execution."""
    
    async def connect(server: McpServer) -> McpClientSession
    async def discover_tools(server: McpServer) -> list[Tool]
    async def call_tool(server: McpServer, tool_name: str, args: dict) -> ToolResult
    async def health_check(server: McpServer) -> HealthStatus
    async def disconnect(server: McpServer)
  • Uses mcp Python SDK (already a dependency — mcp[cli]>=1.8.0 in pyproject.toml)
  • Connection pooling: keep SSE connections alive, reuse stdio processes
  • Timeout and retry logic per server
  • Tool result sanitization (prevent injection from untrusted servers)

Part 3: Tool Agent (New Agent Type)

Concept

A new agent type that can use MCP tools during Q&A. Unlike current agents that dispatch to hardcoded scripts, the Tool Agent:

  1. Receives the user's question + context (source schemas, profiling)
  2. Has access to tools from configured MCP servers
  3. Uses LLM function calling to decide which tools to invoke
  4. Executes tool calls via McpClientManager
  5. Feeds results back to the LLM for the final answer

Integration with Existing Agent Model

Extend the Agent model:

class Agent(Base):
    # ... existing fields ...
    mcp_server_ids: list   # JSON: MCP servers available to this agent
    tool_policy: str       # "auto" | "confirm" | "readonly"
    max_tool_calls: int    # Safety limit per question (default: 10)

Tool Policy Modes

  • auto: Agent can call tools freely (for trusted servers like local dbt)
  • confirm: Agent proposes tool calls, user approves before execution
  • readonly: Agent can only call read-only tools (list, get, search — no create/update/delete)

LLM Function Calling Integration

Extend client.py to:

  1. Accept tools parameter in chat_completion()
  2. Convert MCP tool definitions → OpenAI function calling format
  3. Handle tool_call responses by executing via McpClientManager
  4. Support multi-turn tool use (call → result → call → result → final answer)

Part 4: Frontend — MCP Server Management UI

New Tab in Account/Settings: "MCP Servers"

Server List View:

  • Grid/list of configured MCP servers with status indicators (green/yellow/red)
  • Quick actions: enable/disable, test connection, refresh tools
  • "Add Server" button

Add Server Flow (Modal/Wizard):

Step 1 — Choose Type:

┌─────────────────────────────────────────────┐
│  Add MCP Server                             │
│                                             │
│  ○ From Catalog (recommended)               │
│    Select a pre-configured server           │
│                                             │
│  ○ Custom Server                            │
│    Configure a custom MCP endpoint          │
│                                             │
│                            [Next →]         │
└─────────────────────────────────────────────┘

Step 2a — Catalog Selection:

┌─────────────────────────────────────────────┐
│  Select MCP Server                          │
│                                             │
│  🔧 Data Transformation                     │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐      │
│  │  dbt    │ │Snowflake│ │ DuckDB  │      │
│  │ Cloud   │ │   MCP   │ │ Mother  │      │
│  └─────────┘ └─────────┘ └─────────┘      │
│                                             │
│  📊 Data Quality                             │
│  ┌─────────┐ ┌─────────┐                   │
│  │  Great  │ │  Soda   │                   │
│  │ Expect. │ │  Core   │                   │
│  └─────────┘ └─────────┘                   │
│                                             │
│  🔄 Orchestration                            │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐      │
│  │Airflow  │ │Fivetran │ │ Airbyte │      │
│  └─────────┘ └─────────┘ └─────────┘      │
│                                             │
│  📝 Documentation                            │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐      │
│  │Confluenc│ │ Notion  │ │ GitHub  │      │
│  └─────────┘ └─────────┘ └─────────┘      │
│                                             │
│  🗄️ Databases                                │
│  ┌─────────┐ ┌─────────┐                   │
│  │PostgreSQ│ │BigQuery │                   │
│  └─────────┘ └─────────┘                   │
│                            [Next →]         │
└─────────────────────────────────────────────┘

Step 2b — Custom Server:

┌─────────────────────────────────────────────┐
│  Custom MCP Server                          │
│                                             │
│  Name: [________________________]           │
│                                             │
│  Transport:  ○ SSE  ○ stdio  ○ HTTP        │
│                                             │
│  Endpoint/Command:                          │
│  [______________________________________]   │
│                                             │
│  Authentication:                            │
│  [None ▾]                                   │
│                                             │
│              [Test Connection] [Save]       │
└─────────────────────────────────────────────┘

Step 3 — Credentials (preset-specific):

┌─────────────────────────────────────────────┐
│  Configure dbt Cloud MCP                    │
│                                             │
│  dbt Cloud Token:                           │
│  [______________________________________]   │
│  (Personal or Service token from dbt Cloud) │
│                                             │
│  Transport:  ○ Remote (recommended)         │
│              ○ Local (requires dbt CLI)     │
│                                             │
│         [Test Connection] [Save]            │
└─────────────────────────────────────────────┘

Step 4 — Tool Discovery:

┌─────────────────────────────────────────────┐
│  dbt Cloud MCP — 12 tools discovered        │
│                                             │
│  ☑ generate_staging_model                   │
│  ☑ generate_model_yaml                      │
│  ☑ generate_source                          │
│  ☑ get_project_details                      │
│  ☑ list_jobs                                │
│  ☑ column_lineage                           │
│  ☐ build (write operation)                  │
│  ☐ compile                                  │
│  ☑ search_product_docs                      │
│  ...                                        │
│                                             │
│  Tool Policy: [Read-only ▾]                 │
│                                             │
│                    [Save & Activate]        │
└─────────────────────────────────────────────┘

Server Detail View:

  • Connection status and last health check
  • List of discovered tools with descriptions
  • Usage stats (calls per tool, last used)
  • Logs of recent tool executions
  • Edit credentials / reconfigure

Agent Configuration Extension

In the Agent edit screen, add a "Tools" section:

┌─────────────────────────────────────────────┐
│  Agent: Sales Analytics                     │
│                                             │
│  Sources: [PostgreSQL Prod] [Stripe API]    │
│                                             │
│  MCP Tools:                                 │
│  ☑ dbt Cloud (12 tools) [readonly]         │
│  ☑ Snowflake MCP (8 tools) [readonly]      │
│  ☐ Airflow (6 tools) [disabled]            │
│                                             │
│  Tool Policy: [Read-only ▾]                 │
│  Max tool calls per question: [10]          │
└─────────────────────────────────────────────┘

Technical Notes

Backend

  • MCP SDK: Already installed (mcp[cli]>=1.8.0 in pyproject.toml) — use mcp.client module
  • New files:
    • backend/app/models.py — Add McpServer model
    • backend/app/services/mcp_client_service.py — Connection manager + tool executor
    • backend/app/routers/mcp_router.py — CRUD + discovery + health check endpoints
    • backend/app/mcp_presets.py — Preset server catalog (JSON)
  • Alembic migration: New mcp_servers table
  • Extend Agent model: Add mcp_server_ids, tool_policy, max_tool_calls
  • Extend client.py: Add tools param to chat_completion(), handle tool_call loop
  • Security: Sanitize tool results before feeding back to LLM (prevent prompt injection from external servers)

Frontend

  • New files:
    • src/pages/McpServers.tsx — Server list + management
    • src/components/AddMcpServerModal.tsx — Wizard flow (catalog/custom)
    • src/components/McpServerCard.tsx — Server card with status
    • src/components/McpToolsList.tsx — Tool discovery and toggle
  • New tab in Account page or dedicated route /settings/mcp
  • Agent edit extension: Add MCP server selector to agent config form

Security Considerations

  • Tool result sanitization: External MCP servers return untrusted data — strip prompt injection attempts before feeding to LLM
  • Credential storage: MCP server auth tokens stored in auth_config JSON (same pattern as Source.metadata_)
  • Tool policy enforcement: Backend enforces readonly/confirm/auto policy regardless of LLM requests
  • Rate limiting: Max tool calls per question prevents runaway loops
  • Audit logging: All MCP tool calls logged to PlatformLog with server_id, tool_name, args, result

API Endpoints

# MCP Server CRUD
GET    /api/mcp-servers                      — List user's configured servers
POST   /api/mcp-servers                      — Add new server (preset or custom)
GET    /api/mcp-servers/{id}                 — Get server details + tools
PUT    /api/mcp-servers/{id}                 — Update server config
DELETE /api/mcp-servers/{id}                 — Remove server

# MCP Server Operations
POST   /api/mcp-servers/{id}/test            — Test connection
POST   /api/mcp-servers/{id}/discover        — Discover/refresh tools
GET    /api/mcp-servers/{id}/health          — Health check
POST   /api/mcp-servers/{id}/tools/{name}    — Execute a tool (for testing)

# Preset Catalog
GET    /api/mcp-servers/presets              — List available preset servers
GET    /api/mcp-servers/presets/{preset_id}  — Get preset config template

# Agent MCP Integration
PUT    /api/agents/{id}/mcp-servers          — Assign MCP servers to agent
GET    /api/agents/{id}/available-tools      — List all tools available to agent

Acceptance Criteria

MCP Server Registry

  • Users can add MCP servers from a preset catalog (at least 10 presets)
  • Users can add custom MCP servers with any transport (SSE, stdio, HTTP)
  • Platform discovers available tools via MCP tools/list after connection
  • Users can enable/disable individual tools per server
  • Health check runs on connection and periodically
  • Credentials stored securely (same pattern as existing Source.metadata_)

MCP Client Engine

  • Connect to SSE, stdio, and streamable HTTP MCP transports
  • Execute tool calls and return results
  • Connection pooling for persistent connections
  • Timeout and retry for failed tool calls
  • Tool result sanitization against prompt injection

Tool Agent

  • Agents can be assigned MCP servers as tool providers
  • LLM function calling enabled in chat_completion() when tools are configured
  • Multi-turn tool use loop (question → tool call → result → answer)
  • Tool policy enforcement (auto/confirm/readonly)
  • Max tool calls safety limit per question
  • All tool executions logged for audit

Frontend

  • MCP Servers management page with status indicators
  • Add server wizard with catalog selection and custom config
  • Tool discovery view with enable/disable toggles
  • Agent config extended with MCP server assignment
  • Preset-specific credential forms (dbt token, Snowflake OAuth, etc.)

Dependencies

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions