gszecsenyi/SchemaVault_MCP

SchemaVault

MCP server for storing and retrieving database schema information for LLMs.

Features

  • Auto-load Databricks Unity Catalog schemas on startup
  • Vector-based semantic search with configurable embedding service
  • File-based storage (no external database required)
  • MCP interface via HTTP/SSE for LLM integration
  • LM Studio compatible

Quick Start

  1. Copy .env.example to .env:
cp .env.example .env
  2. Edit .env to configure the embedding service and, optionally, Databricks:
# Embedding API (default: local embedding service)
EMBEDDING_API_URL=http://localhost:8000/v1
EMBEDDING_API_KEY=your-secret-token
EMBEDDING_MODEL=nomic-embed-text

# Databricks (optional)
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token
DATABRICKS_CATALOGS=main
  3. Build and run:
docker-compose up --build

Server runs on http://localhost:8001

MCP Tools

| Tool | Description |
| --- | --- |
| add_schema | Store a table schema |
| query_model | Semantic search for table info |
| list_models | List all stored tables |
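
As an illustration of how a client might invoke one of these tools, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the message shape MCP uses for tool invocation. The argument name `query` is an assumption: the input schema of `query_model` is not documented here and should be read from the server's `tools/list` response.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is a hypothetical argument name; check tools/list for the real schema.
message = build_tool_call(1, "query_model", {"query": "customer orders"})
print(json.dumps(message, indent=2))
```

In practice an MCP client library or the LLM host builds and sends this message, so the sketch is mainly useful for debugging the traffic by hand.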

Endpoints

  • GET /mcp/sse - SSE connection for MCP
  • POST /mcp/messages - MCP message handler
  • GET /health - Health check
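
A quick way to confirm the server is up is to poll the health endpoint. The following is a minimal sketch using only the standard library; it assumes that `/health` answers with a 2xx status when the server is ready and does not rely on any particular response body, since the body format is not described here.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8001", timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with a 2xx status, False otherwise."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `is_healthy()` after `docker-compose up --build` should return `True` once the container has finished loading schemas, under the assumption above.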

LM Studio Integration

Add to ~/.lmstudio/mcp.json:

{
  "mcpServers": {
    "schemavault": {
      "url": "http://localhost:8001/mcp/sse"
    }
  }
}

Claude Desktop Integration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "schemavault": {
      "command": "docker",
      "args": ["exec", "-i", "schemavault-schemavault-1", "python", "-m", "src.server"]
    }
  }
}

How It Works

  1. On startup, clears existing data in ./data/
  2. Loads all schemas from Databricks Unity Catalog (if configured)
  3. Embeds the schemas using the configured embedding service
  4. Stores the embeddings in an Hnswlib vector index
  5. LLMs query the server over MCP for semantic schema search
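
The search step can be illustrated with a small, self-contained sketch. It is not the server's implementation: it uses an exact brute-force scan instead of Hnswlib's approximate index, assumes cosine similarity as the distance measure, and uses hypothetical 3-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec: list, index: dict, top_k: int = 2) -> list:
    """Rank stored tables by similarity to the query vector (exact scan)."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Toy vectors standing in for real embeddings (hypothetical data).
index = {
    "sales.orders": [0.9, 0.1, 0.0],
    "sales.customers": [0.7, 0.6, 0.1],
    "hr.employees": [0.0, 0.2, 0.9],
}

print(search([1.0, 0.2, 0.0], index))  # ['sales.orders', 'sales.customers']
```

An approximate index such as Hnswlib trades a small amount of recall for much faster lookups once the number of stored tables grows beyond what a linear scan handles comfortably.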

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| EMBEDDING_API_URL | http://localhost:8000/v1 | Embedding service URL |
| EMBEDDING_API_KEY | your-secret-token | Embedding API key |
| EMBEDDING_MODEL | nomic-embed-text | Embedding model name |
| DATABRICKS_HOST | - | Databricks workspace URL |
| DATABRICKS_TOKEN | - | Databricks personal access token (PAT) |
| DATABRICKS_CATALOGS | main | Catalogs to load: one name (main), a comma-separated list (a,b), or * for all |
| DATABRICKS_SCHEMAS | (all) | Schemas to load (optional): a comma-separated list (schema1,schema2) or * for all |
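
The catalog and schema filters accept a single name, a comma-separated list, or `*`. The sketch below shows one plausible way to interpret such values; the helper `parse_name_list` is hypothetical and is not taken from the SchemaVault source, which may parse these variables differently.

```python
import os
from typing import List, Optional


def parse_name_list(raw: Optional[str], default: str = "*") -> Optional[List[str]]:
    """Parse 'a,b' into ['a', 'b']; return None when the value means 'all' (*)."""
    value = (raw if raw is not None and raw.strip() else default).strip()
    if value == "*":
        return None  # None signals "no filter, load everything"
    return [part.strip() for part in value.split(",") if part.strip()]


# Defaults mirror the documented defaults: "main" for catalogs, all schemas.
catalogs = parse_name_list(os.environ.get("DATABRICKS_CATALOGS"), default="main")
schemas = parse_name_list(os.environ.get("DATABRICKS_SCHEMAS"), default="*")
```

Returning `None` for "all" keeps the wildcard case distinct from an empty list, which would otherwise read as "load nothing".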

Storage

Data is stored in ./data/ and rebuilt on each startup:

  • vectors.index - Hnswlib vector index (768 dimensions)
  • schemas.json - Table metadata
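
For capacity planning, a rough lower bound on the index size follows from the vector dimension. The sketch below assumes each component is stored as a 32-bit float and ignores the graph links and metadata that an HNSW index also stores, so the real vectors.index file will be larger.

```python
DIMENSIONS = 768        # vector dimension stated for vectors.index
BYTES_PER_FLOAT32 = 4   # assumption: components stored as 32-bit floats


def raw_vector_bytes(num_tables: int) -> int:
    """Bytes needed for the raw embedding vectors alone (one vector per table)."""
    return num_tables * DIMENSIONS * BYTES_PER_FLOAT32


print(raw_vector_bytes(1))       # 3072
print(raw_vector_bytes(10_000))  # 30720000
```

Under these assumptions, even ten thousand tables need only about 31 MB for the raw vectors, which is consistent with the file-based storage approach.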

Requirements

  • Docker
  • Embedding service (OpenAI-compatible API)
  • (Optional) Databricks workspace with Unity Catalog access
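
To check whether an embedding service fits the "OpenAI-compatible" requirement, it helps to see the request shape involved. The sketch below builds, but does not send, a POST to the `/embeddings` route of such an API using the default values from the configuration. The sample schema text is hypothetical, and whether SchemaVault sends exactly this payload is an assumption.

```python
import json
import urllib.request


def build_embedding_request(api_url: str, api_key: str, model: str, texts: list) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_embedding_request(
    "http://localhost:8000/v1",
    "your-secret-token",
    "nomic-embed-text",
    ["table orders: order_id BIGINT, customer_id BIGINT"],  # hypothetical schema text
)
# To send it: urllib.request.urlopen(request). An OpenAI-style response is JSON
# whose data[i].embedding field holds the vector for texts[i].
```

If the service returns vectors of a different length than the 768 dimensions used by the index, the model choice and the index dimension would need to be reconciled.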
