This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is vowser-mcp-server - a FastAPI-based WebSocket server that serves as an MCP (Model Context Protocol) server for web crawling and path analysis. The server integrates with Neo4j for graph-based storage of web navigation paths and uses LangChain for AI-powered content analysis.
# Install dependencies (use uv - this project uses uv.lock)
uv sync
# Alternative: pip install
pip install -r requirements.txt
# Start the FastAPI server
uvicorn app.main:app --port 8000 --reload
# Alternative: using Python module
python -m uvicorn app.main:app --port 8000 --reload
# Run comprehensive WebSocket tests
cd test/
python test_single.py
# Expected output: "전체 결과: 5/5 성공" (Korean for "Overall result: 5/5 succeeded")
# Run basic WebSocket connection test
python test_websocket.py
# Interactive testing via Jupyter
jupyter notebook test.ipynb
# Browser-based WebSocket testing
# Open test/websocket_test.html in browser
# The server automatically connects to Neo4j on startup
# Check connection status in server logs: "Neo4j Service: Database connection successful."
FastAPI WebSocket Server (app/main.py):
- Single WebSocket endpoint at /ws for real-time communication
- Handles multiple message types: save_path, check_graph, visualize_paths, find_popular_paths, search_path, create_indexes, cleanup_paths
- JSON-based message protocol with structured request/response format
Neo4j Graph Database (app/services/neo4j_service.py):
- Stores web navigation paths as graph structures
- Node types: ROOT (domains), PAGE (clickable elements), PATH (complete navigation paths), PAGE_ANALYSIS, SECTION, ELEMENT
- Relationship types: HAS_PAGE, NAVIGATES_TO, NAVIGATES_TO_CROSS_DOMAIN, CONTAINS (PATH→PAGE)
- Weighted relationships track usage frequency and popularity
- Vector embeddings enable semantic search for natural language queries
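To make the semantic-search idea concrete, here is a minimal sketch of ranking stored embeddings against a query embedding by cosine similarity. It is illustrative only: in this project the ranking presumably happens inside Neo4j through its vector index, and the function names `cosine_similarity` and `rank_by_similarity` are assumptions, not names from the codebase.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates, top_k=3):
    """Return up to top_k (label, score) pairs, most similar first.

    `candidates` maps a label (hypothetically, a PATH or PAGE identifier)
    to its embedding vector.
    """
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A natural language query would first be embedded by the embedding service; the resulting vector plays the role of `query_vec` here.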
AI Services (app/services/):
- embedding_service.py: Generates semantic embeddings for UI elements
- ai_module.py: LangChain integration for content analysis
- Uses Google Gemini and OpenAI models for semantic understanding
Data Models (app/models/path.py):
- Pydantic models for path data validation
- Structured representation of user navigation sequences with semantic metadata
- Client sends WebSocket message to /ws
- Server parses message type and routes to appropriate service
- Neo4j Service processes data (save paths, query graph structure)
- AI Services enhance data with embeddings and semantic analysis
- Server returns structured JSON response to client
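The flow above can be sketched as a small dispatcher that parses a message, looks up a handler by its type, and wraps the result in a JSON reply. This is a simplified stand-in that leaves out FastAPI and Neo4j: the handler functions, the `steps` key, and the `status` field in the reply are assumptions made for illustration, so check docs/WEBSOCKET_API.md for the actual response shape.

```python
import json


def handle_check_graph(data):
    # Hypothetical handler; the real one would query Neo4j for statistics.
    return {"nodes": 0, "relationships": 0}


def handle_save_path(data):
    # Hypothetical handler; the real one would persist the path to Neo4j.
    return {"saved": True, "steps": len(data.get("steps", []))}


# Routing table: message type -> handler function.
HANDLERS = {
    "check_graph": handle_check_graph,
    "save_path": handle_save_path,
}


def _error(message):
    """Build an error reply (the reply shape is assumed, not documented here)."""
    return json.dumps({"type": "error", "status": "error", "data": {"message": message}})


def dispatch(raw_message):
    """Parse one JSON message, route it by "type", and return a JSON reply."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return _error("invalid JSON")
    if not isinstance(message, dict):
        return _error("message must be a JSON object")
    msg_type = message.get("type")
    handler = HANDLERS.get(msg_type)
    if handler is None:
        return _error(f"unknown message type: {msg_type!r}")
    data = message.get("data")
    if not isinstance(data, dict):
        data = {}  # tolerate a missing or malformed payload
    return json.dumps({"type": msg_type, "status": "success", "data": handler(data)})
```

Keeping the routing in a dictionary means a new message type needs only one new handler and one new entry.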
- ROOT: Domain-level nodes (e.g., youtube.com)
- PAGE: Interactive elements with selectors, text labels, and embeddings
- PATH: Complete navigation sequences with embeddings for semantic search
- Relationships:
  - HAS_PAGE: ROOT→PAGE connections
  - NAVIGATES_TO: PAGE→PAGE navigation flow
  - NAVIGATES_TO_CROSS_DOMAIN: Cross-domain transitions
  - CONTAINS: PATH→PAGE membership
- Time tracking: All relationships include createdAt and lastUpdated timestamps
- Weight system: Usage frequency tracking for popularity analysis
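One way to combine the weight system with the timestamps is a Cypher `MERGE` that creates the relationship on first use and increments its weight afterwards. The sketch below only builds the query and its parameters. The `createdAt` and `lastUpdated` names come from the schema above, while the `weight` property name, the `pageId` key, and the helper `navigation_upsert` are assumptions that may not match neo4j_service.py.

```python
# Upsert one PAGE -> PAGE navigation step.
# Assumed property names: `pageId` on PAGE nodes, `weight` on the relationship.
NAVIGATES_TO_QUERY = """
MATCH (a:PAGE {pageId: $from_id}), (b:PAGE {pageId: $to_id})
MERGE (a)-[r:NAVIGATES_TO]->(b)
ON CREATE SET r.weight = 1, r.createdAt = datetime(), r.lastUpdated = datetime()
ON MATCH SET r.weight = r.weight + 1, r.lastUpdated = datetime()
RETURN r.weight AS weight
"""


def navigation_upsert(from_id, to_id):
    """Return the query text and parameters for one navigation step."""
    return NAVIGATES_TO_QUERY, {"from_id": from_id, "to_id": to_id}
```

With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`. Passing values as parameters instead of formatting them into the query string avoids Cypher injection.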
Required .env file:
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
GOOGLE_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_key
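A startup check along these lines can catch a missing variable before the server tries to reach Neo4j or a model provider. It is a sketch that assumes the .env file has already been loaded into the process environment; how the project loads it is not stated here. The helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_ENV_VARS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "GOOGLE_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no argument inspects the real environment. Passing a dictionary makes the check easy to test.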
The server expects JSON messages with this structure:
{
  "type": "message_type",
  "data": { /* message-specific data */ }
}
Supported message types:
- save_path: Store user navigation sequences (automatically creates PATH entities)
- check_graph: Get graph statistics
- visualize_paths: Query domain-specific paths
- find_popular_paths: Get weighted popular navigation routes
- search_path: Natural language path search with vector embeddings
- create_indexes: Create vector and text search indexes in Neo4j
- cleanup_paths: Clean up old unused paths (30+ days)
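On the client side, the envelope can be built with a small helper that rejects unknown types before anything is sent. This is a sketch: the `type` and `data` keys and the seven type names come from the list above, while the helper `build_message` and the `query` payload key in the test are hypothetical. The per-type payload fields are documented in docs/WEBSOCKET_API.md.

```python
import json

# Message types listed in this document.
SUPPORTED_TYPES = frozenset({
    "save_path",
    "check_graph",
    "visualize_paths",
    "find_popular_paths",
    "search_path",
    "create_indexes",
    "cleanup_paths",
})


def build_message(message_type, data=None):
    """Serialize one request envelope, rejecting unknown message types."""
    if message_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported message type: {message_type!r}")
    # ensure_ascii=False keeps non-ASCII text (e.g. Korean labels) readable.
    return json.dumps({"type": message_type, "data": data or {}}, ensure_ascii=False)
```

The returned string is what a WebSocket client would send to the /ws endpoint.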
Detailed API documentation: docs/WEBSOCKET_API.md
Always use Context7 to reference up-to-date library documentation when working with external dependencies. This ensures:
- Version-compatible code that works with the specific library versions in this project
- Best practices aligned with current documentation
- Proper API usage patterns
Example workflow:
- Before modifying FastAPI, LangChain, Neo4j, or Pydantic code, query Context7 for the latest documentation
- Verify compatibility with versions specified in pyproject.toml and requirements.txt
- Use recommended patterns from official documentation rather than outdated examples
Key libraries to always check:
- FastAPI (WebSocket implementation)
- LangChain (AI service integration)
- Neo4j Python driver (graph operations)
- Pydantic (data validation models)
The learn/ folder contains comprehensive Knowledge Graph and RAG implementation examples that demonstrate advanced patterns relevant to this project:
- KG_P2_01_news_analysis.ipynb: News data analysis using Neo4j + LangChain
  - Shows how to extract entities from text using an LLM
  - Demonstrates graph construction with semantic relationships
  - Implements hybrid RAG (vector search + graph traversal)
  - Text2Cypher for natural language querying
- KG_P2_02_etf_recommendation.ipynb: ETF recommendation system
  - Entity extraction from structured financial data
  - Complex graph ontology with multiple node types
  - Full-text search indexing with CJK analyzer for Korean
  - Few-shot prompting for Text2Cypher
  - Semantic similarity-based example selection
- KG_P2_03_10K_report.ipynb: Corporate document analysis
- KG_P2_04_law_qa.ipynb: Legal Q&A system
When implementing new features or debugging existing functionality, refer to these examples for:
- Entity Extraction: See news analysis notebook for LLM-based entity extraction patterns
- Graph Schema Design: ETF notebook shows comprehensive ontology design
- Vector Indexing: Both examples demonstrate proper vector embedding setup
- Text2Cypher: Advanced prompting techniques with few-shot examples
- Hybrid RAG: Combining vector similarity with graph traversal
- Error Handling: Robust error handling in graph operations
- Performance Optimization: Batch processing and indexing strategies
- Study these patterns before implementing similar functionality
- Adapt the ontology and relationship patterns to web navigation domain
- Use the Text2Cypher prompting strategies for natural language queries
- Reference the vector indexing setup for embedding-based search
- test_single.py: Comprehensive individual message type testing
- test/fixtures/test_data.py: Centralized test data with multiple YouTube navigation scenarios
- websocket_test.html: Browser-based visual testing interface
- Always test with python test_single.py after changes; expect "5/5 성공" ("5/5 succeeded")