wildchat-mcp

Value Prop

MCP acts as a bridge between LLMs and large structured datasets. Unlike static notebooks, it supports dynamic, natural-language querying: the LLM invokes MCP tools to query a dataset roughly 3,685× larger than Claude's context window.

WildChat Dataset: https://huggingface.co/datasets/allenai/WildChat-1M https://arxiv.org/abs/2405.01470

1M conversations = 737M tokens
Claude's context limit = 200K tokens (ChatGPT = ~128K)
Dataset is 3,685× larger than Claude's context window
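
The ratio is plain division. A quick check in Python, using only the token counts quoted above (the ChatGPT figure is the approximate value given here, not a measured one):

```python
# Token counts as quoted in this README; treat them as approximate.
DATASET_TOKENS = 737_000_000
CLAUDE_CONTEXT = 200_000
CHATGPT_CONTEXT = 128_000  # approximate


def context_ratio(dataset_tokens: int, context_tokens: int) -> float:
    """How many full context windows the dataset would fill."""
    return dataset_tokens / context_tokens


claude_ratio = context_ratio(DATASET_TOKENS, CLAUDE_CONTEXT)
chatgpt_ratio = context_ratio(DATASET_TOKENS, CHATGPT_CONTEXT)

print(f"Claude:  {claude_ratio:,.0f}x")
print(f"ChatGPT: {chatgpt_ratio:,.0f}x")
```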

The notebooks cover traditional, static data science: creating visualizations, computing distributions, and answering pre-defined questions. The MCP server adds dynamic interaction, where an LLM explores the dataset conversationally through on-demand queries.

DuckDB:

One-time setup: load the parquet files into a DuckDB database file
The notebooks and the MCP server both query the DuckDB file directly
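
A minimal sketch of what the one-time load could look like with the `duckdb` Python package. This is not necessarily what `dataset_setup.py` does: the table name `conversations` and the paths in the usage note are assumptions made for illustration.

```python
def build_load_sql(parquet_glob: str, table: str = "conversations") -> str:
    """Build the statement that copies parquet rows into a DuckDB table.

    `conversations` is a hypothetical table name, not taken from the repo.
    """
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    escaped = parquet_glob.replace("'", "''")  # escape quotes in the path literal
    return (
        f"CREATE OR REPLACE TABLE {table} AS "
        f"SELECT * FROM read_parquet('{escaped}')"
    )


def load_parquet(parquet_glob: str, db_path: str, table: str = "conversations") -> int:
    """Run the one-time load and return the number of rows in the table."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect(db_path)  # creates the database file if it is missing
    try:
        con.execute(build_load_sql(parquet_glob, table))
        return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        con.close()
```

For example, `load_parquet("data/*.parquet", "wildchat.db")` would build the database file once (both paths are placeholders); later queries open the same file without reloading the parquet data.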

Benefits:
- Zero-config embedded database
- Handles 1GB-100GB datasets efficiently
- SQL interface without server overhead

Spark is designed for distributed clusters, for data that exceeds single-machine RAM. WildChat does not meet that threshold.

Without MCP	With MCP
LLMs hallucinate about unseen data	Tools execute actual database queries
Static notebooks = fixed questions only	Dynamic, ad-hoc exploration
Context window limits what's queryable	Access datasets 1000× larger than context
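
To make "tools execute actual database queries" concrete, here is a hedged sketch of one tool built with the MCP Python SDK's `FastMCP` helper. It is not the repo's `mcp_server.py`: the tool name `top_values`, the table name `conversations`, and the default database path are assumptions; only `WILDCHAT_DB_PATH` and the server name `wildchat-analytics` come from this README.

```python
import os
from typing import Any


def count_by_column(con: Any, column: str, table: str = "conversations",
                    limit: int = 10) -> list[dict]:
    """Return the most frequent values of `column` with exact row counts.

    `con` is any connection whose execute(sql).fetchall() returns rows,
    for example a DuckDB connection. Table and column names are assumed.
    """
    if not (column.isidentifier() and table.isidentifier()):
        raise ValueError("column and table must be plain identifiers")
    sql = (
        f"SELECT {column} AS value, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} ORDER BY n DESC, value LIMIT {int(limit)}"
    )
    return [{"value": value, "n": n} for value, n in con.execute(sql).fetchall()]


def build_server():
    """Wire the query into an MCP tool (imports are third-party packages)."""
    import duckdb
    from mcp.server.fastmcp import FastMCP

    db_path = os.environ.get("WILDCHAT_DB_PATH", "wildchat.db")  # default is a placeholder
    con = duckdb.connect(db_path, read_only=True)
    server = FastMCP("wildchat-analytics")

    @server.tool()
    def top_values(column: str, limit: int = 10) -> list[dict]:
        """Count conversations grouped by a column, most frequent first."""
        return count_by_column(con, column, limit=limit)

    return server
```

Calling `build_server().run()` would start the server. The LLM only ever sees the small result list, not the underlying rows, which is why the dataset size is not bounded by the context window.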

MCP vs. RAG

Feature	RAG	MCP
Returns	Text passages	Structured data (numbers, counts)
Accuracy	LLM estimates from text	Precise SQL results
Use Case	"What does the doc say?"	"What's the average/count/trend?"

Example:

RAG: Retrieves text, estimates "~50-60 Fellows"
MCP: SELECT COUNT(*) WHERE year=2020 → Exactly 57

Setup

load dataset

source setup.sh

Running/connecting MCP

Run the MCP server: python mcp_server.py
Update the Claude Desktop config. If using a venv, point "command" at the venv's Python:

{
  "mcpServers": {
    "wildchat-analytics": {
      "command": "/Users/yourname/path/to/project/.venv/bin/python",
      "args": ["/Users/yourname/path/to/project/mcp_server.py"],
      "env": {
        "WILDCHAT_DB_PATH": "/path/to/dot.db"
      }
    }
  }
}

Restart Claude Desktop and reopen it so the new server is picked up.
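
If the config file already lists other servers, the entry can be merged in rather than pasted by hand. The helper below is a hypothetical convenience, not part of this repo; the config file's location depends on the OS and is not given here, so it is left as a parameter.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, python_path: str,
                   script_path: str, db_path: str) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep existing servers
    servers[name] = {
        "command": python_path,
        "args": [script_path],
        "env": {"WILDCHAT_DB_PATH": db_path},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_file: Path, **entry: str) -> None:
    """Read the JSON config (if present), add the entry, and write it back."""
    config = json.loads(config_file.read_text()) if config_file.exists() else {}
    config_file.write_text(json.dumps(add_mcp_server(config, **entry), indent=2))
```

The function returns a new dict instead of mutating its input, so the original config can be kept for comparison or backup before anything is written.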

Kyle-Zhou/wildchat-mcp

Folders and files

Latest commit

History

Repository files navigation

wildchat-mcp

Value Prop

Setup

load dataset

Running/connecting MCP

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages