This guide explains how to set up and use the RAG system with MCP (Model Context Protocol) support.
Model Context Protocol (MCP) is a standardized way for AI applications to connect to external data sources and tools. Our RAG system exposes its functionality through MCP, allowing AI assistants like Claude Desktop to directly query the "Attention Is All You Need" paper.
To start the MCP server:

```bash
# Method 1: Using the startup script
python start_mcp_server.py

# Method 2: Direct execution
cd src && python mcp_server.py
```

For Claude Desktop, add this to your MCP configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "rag-attention-paper": {
      "command": "python",
      "args": [
        "/Users/sam/Desktop/rag/src/mcp_server.py"
      ],
      "env": {
        "PYTHONPATH": "/Users/sam/Desktop/rag/src",
        "OPENAI_API_KEY": "your_openai_api_key_here"
      }
    }
  }
}
```

**Important:** Replace `/Users/sam/Desktop/rag` with your actual project path and add your OpenAI API key.
After updating the configuration, restart Claude Desktop to load the MCP server.
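If you prefer to generate the configuration entry programmatically instead of editing the JSON by hand, a small sketch like the following produces the same structure. The helper name `mcp_entry` is illustrative, and the project path is an assumption to replace with your own:

```python
import json
import os

# Hypothetical helper: builds the Claude Desktop MCP entry shown above.
def mcp_entry(project_root, api_key):
    src = os.path.join(project_root, "src")
    return {
        "mcpServers": {
            "rag-attention-paper": {
                "command": "python",
                "args": [os.path.join(src, "mcp_server.py")],
                "env": {
                    "PYTHONPATH": src,
                    "OPENAI_API_KEY": api_key,
                },
            }
        }
    }

# Replace the path and key placeholder with your actual values.
config = mcp_entry("/Users/sam/Desktop/rag", "your_openai_api_key_here")
print(json.dumps(config, indent=2))
```

You can then merge this dictionary into your existing `claude_desktop_config.json` rather than overwriting other servers configured there.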
The MCP server exposes 4 tools:
**`query_attention_paper`** — Ask questions about the Attention paper using RAG.

Parameters:
- `question` (required): Your question about the paper
- `num_chunks` (optional): Number of chunks to retrieve (1-20, default: 5)
- `min_score` (optional): Minimum similarity score (0.0-1.0, default: 0.1)
Example:
What is the Transformer architecture and how does it work?
**`search_paper_chunks`** — Search for specific content without AI generation.

Parameters:
- `query` (required): Search terms
- `limit` (optional): Max results (1-20, default: 5)
- `min_score` (optional): Minimum similarity score (0.0-1.0, default: 0.1)
Example:
Search for "multi-head attention"
**`get_paper_chunk`** — Retrieve a specific chunk by ID.

Parameters:
- `chunk_id` (required): Chunk identifier (e.g., "chunk_0001")
Example:
Get chunk_0005
**`get_rag_stats`** — Get system statistics and health information.

Parameters: None
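Under the hood, Claude Desktop invokes these tools with JSON-RPC 2.0 messages over stdio. A `tools/call` request for `query_attention_paper` looks roughly like this (the argument values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query_attention_paper",
    "arguments": {
      "question": "What is multi-head attention?",
      "num_chunks": 5,
      "min_score": 0.1
    }
  }
}
```

You normally never write these messages yourself; Claude Desktop constructs them from your natural-language requests.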
The MCP server provides 2 resources:
1. Information about the paper, including:
   - Title, authors, year
   - Abstract and key contributions
   - Number of available chunks
2. Current RAG system statistics:
   - Initialization status
   - OpenAI availability
   - Vector store information
   - Chunk counts and dimensions
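Resources are fetched with the `resources/read` JSON-RPC method; the actual resource URIs can be discovered first with `resources/list`. A read request might look like this (the URI shown is hypothetical, not one the server necessarily exposes):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "resources/read",
  "params": {
    "uri": "paper://attention/info"
  }
}
```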
Once configured, you can use these tools in Claude Desktop:
"Can you query the attention paper about how the Transformer architecture works?"
"Search the attention paper for information about positional encoding"
"What does the paper say about multi-head attention? Please use the RAG system to find specific details."
"Can you check the status of the RAG system and tell me how many chunks are available?"
Set these in your MCP configuration:
```json
{
  "env": {
    "OPENAI_API_KEY": "your_key_here",
    "PYTHONPATH": "/path/to/rag/src",
    "LOG_LEVEL": "INFO"
  }
}
```

On startup, the MCP server automatically:
- Initializes the RAG pipeline
- Loads the Attention paper
- Uses the mock vector store (for reliability)
- Connects to OpenAI (if API key provided)
**Server won't start**

```bash
# Check Python path and dependencies
cd /Users/sam/Desktop/rag/src
../rag_env/bin/python mcp_server.py
```
**Tools not appearing in Claude**

- Verify the configuration file path
- Check that the server path is correct
- Restart Claude Desktop completely
**OpenAI errors**

- Ensure `OPENAI_API_KEY` is set correctly
- Check API quota and billing
- The server will work without OpenAI (with limited functionality)
**Import errors**

- Verify `PYTHONPATH` in the configuration
- Ensure all dependencies are installed
- Check that the virtual environment is activated
Run the server with debug output:

```bash
cd src
PYTHONPATH=. python mcp_server.py 2>&1 | tee mcp_debug.log
```

Use the test script to verify functionality:

```bash
python test_mcp.py
```

The MCP server provides:
- Structured logging to stderr
- Tool usage statistics
- Error handling and reporting
- Resource access tracking
- The server runs locally and doesn't expose network ports
- Communication is through stdio (standard input/output)
- OpenAI API key is handled securely through environment variables
- No data is stored permanently by the MCP server
You can ask complex questions that combine multiple aspects:
"Compare the attention mechanism in Transformers with traditional RNN approaches, using specific details from the paper"
1. **Explore sections:** Use `search_paper_chunks` to find relevant sections
2. **Deep dive:** Use `get_paper_chunk` to examine specific content
3. **Synthesize:** Use `query_attention_paper` for comprehensive answers
4. **Verify:** Use `get_rag_stats` to check system status
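The workflow above can be sketched as a sequence of tool calls. Here `call_tool` is a hypothetical stand-in for whatever MCP client performs the invocation (it is not part of this project), and the chunk ID and query strings are illustrative:

```python
# Sketch of the explore -> deep dive -> synthesize -> verify workflow.
# call_tool is a hypothetical stand-in for an MCP client's tool invocation;
# a real client would send a JSON-RPC "tools/call" request over stdio.
def call_tool(name, arguments):
    return {"tool": name, "arguments": arguments}

steps = [
    # 1. Explore: find sections mentioning multi-head attention.
    call_tool("search_paper_chunks", {"query": "multi-head attention", "limit": 5}),
    # 2. Deep dive: inspect one of the returned chunks.
    call_tool("get_paper_chunk", {"chunk_id": "chunk_0005"}),
    # 3. Synthesize: ask for a comprehensive answer.
    call_tool("query_attention_paper", {"question": "How does multi-head attention work?"}),
    # 4. Verify: check system status.
    call_tool("get_rag_stats", {}),
]

for step in steps:
    print(step["tool"])
```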
The MCP server can be used alongside other MCP servers in Claude Desktop, allowing you to:
- Query the Attention paper
- Access other research papers (if you have other MCP servers)
- Combine information from multiple sources
- Startup time: ~5-10 seconds (includes document loading)
- Query time: ~1-3 seconds (depending on OpenAI API)
- Memory usage: ~200MB (includes document and vectors)
- Concurrent requests: Handled sequentially for stability
To extend the MCP server:
1. Add new tools in `mcp_server.py`
2. Update the tool list in `handle_list_tools()`
3. Implement handlers in `handle_call_tool()`
4. Test with `test_mcp.py`
5. Update this documentation
- MCP Version: 2024-11-05
- Transport: stdio
- Capabilities: tools, resources
- Server Name: rag-attention-paper
- Server Version: 1.0.0