Skip to content

Latest commit

 

History

History
110 lines (101 loc) · 7.36 KB

File metadata and controls

110 lines (101 loc) · 7.36 KB

jcodemunch-mcp — Project Brief

Current State

  • Version: 1.20.0 (published to PyPI)
  • INDEX_VERSION: 7
  • Tests: 1624 passed, 7 skipped
  • Python: >=3.10

Key Files

src/jcodemunch_mcp/
  server.py            # MCP dispatcher (async); CLI subcommand dispatch, auth/rate-limit middleware
  security.py          # Path validation, skip patterns, file caps
  config.py            # JSONC config: global + per-project layering, env var fallback, language/tool gating
  parser/
    languages.py       # LANGUAGE_REGISTRY, extension → language map, LanguageSpec
    extractor.py       # parse_file() dispatch; custom parsers for Erlang, Fortran, SQL, Razor
    imports.py         # Regex import extraction (19 languages); extract_imports(), resolve_specifier()
  storage/
    sqlite_store.py    # CodeIndex, save/load/incremental_save, WAL-aware LRU cache (_db_mtime_ns)
  summarizer/
    batch_summarize.py # 3-tier: Anthropic > Gemini > OpenAI-compat > signature fallback
  tools/
    index_folder.py    # Local indexer (sync → asyncio.to_thread in server.py)
    index_repo.py      # GitHub indexer (async, httpx)
    get_symbol.py      # get_symbol_source: shape-follows-input (id→flat, ids[]→{symbols,errors})
    search_columns.py  # Column search across dbt/SQLMesh models
    get_context_bundle.py   # Symbol + imports bundle; token_budget/budget_strategy
    get_ranked_context.py   # Query-driven budgeted context (BM25 + PageRank)
    resolve_repo.py    # O(1) path→repo-ID lookup
    find_importers.py  # Files that import a given file (import graph); cross_repo param
    find_references.py # Files that reference a given identifier
    test_summarizer.py # Diagnostic tool: probe AI summarizer, report status (disabled by default)
    package_registry.py # Cross-repo package registry: manifest parsing, registry building, specifier resolution
    get_cross_repo_map.py # Cross-repo dependency map at the package level
    _call_graph.py       # Shared AST-derived call-graph helpers (callers/callees, BFS)
    get_call_hierarchy.py # get_call_hierarchy: callers+callees for a symbol, N levels deep
    get_impact_preview.py # get_impact_preview: transitive "what breaks?" analysis
    get_symbol_complexity.py  # get_symbol_complexity: cyclomatic/nesting/param_count for a symbol
    get_churn_rate.py         # get_churn_rate: git commit count for file or symbol over N days
    get_hotspots.py           # get_hotspots: top-N high-risk symbols by complexity x churn
    get_repo_health.py        # get_repo_health: one-call triage snapshot (delegate aggregator)

CLI Subcommands

Subcommand Purpose
serve (default) Run the MCP server (stdio, sse, or streamable-http)
watch <paths> File watcher — auto-reindex on change
watch-claude Auto-discover and watch Claude Code worktrees
hook-event create|remove Record a worktree lifecycle event (called by Claude Code hooks)
index-file <path> Re-index a single file within an existing indexed folder (used by PostToolUse hooks)
config Print effective configuration grouped by concern
config --check Also validate prerequisites (storage writable, AI pkg installed, HTTP pkgs present)
config --upgrade Add missing keys from current template to existing config.jsonc, preserving user values

Architecture Notes

  • index_folder is synchronous — dispatched via asyncio.to_thread() in server.py to avoid blocking the event loop
  • index_repo is async (uses httpx for GitHub API)
  • has_index() distinguishes "no file on disk" from "file exists but version rejected"
  • Symbol lookup is O(1) via __post_init__ id dict in CodeIndex

Custom Parsers

Tree-sitter grammar lacks clean named fields for these — custom regex extractors:

  • Erlang: multi-clause function merging by (name, arity); arity-qualified names (e.g. add/2)
  • Fortran: module-as-container, qualified names (math_utils::multiply), parameter constants
  • SQL: _parse_sql_symbols + sql_preprocessor.py strips Jinja (dbt); macro/test/snapshot/materialization as symbols
  • Razor/Blazor (.cshtml/.razor): @functions/@code → C#, @page/@inject → constants, HTML ids

Env Vars

Var Default Purpose
CODE_INDEX_PATH ~/.code-index/ Index storage location
JCODEMUNCH_MAX_INDEX_FILES 10,000 File cap for repo indexing
JCODEMUNCH_MAX_FOLDER_FILES 2,000 File cap for folder indexing
JCODEMUNCH_FILE_TREE_MAX_FILES 500 Cap for get_file_tree results
JCODEMUNCH_GITIGNORE_WARN_THRESHOLD 500 Missing-.gitignore warning threshold (0 = disable)
JCODEMUNCH_USE_AI_SUMMARIES auto AI summarization mode: auto (detect provider), true (use explicit config), false/0/no/off (disable)
JCODEMUNCH_SUMMARIZER_PROVIDER Explicit summarizer provider: anthropic, gemini, openai, minimax, glm, openrouter, none
JCODEMUNCH_SUMMARIZER_MODEL Model name override for the selected summarizer provider
JCODEMUNCH_TRUSTED_FOLDERS Roots trusted for index_folder; whitelist mode by default
JCODEMUNCH_EXTRA_IGNORE_PATTERNS Always-on gitignore patterns (comma-sep or JSON array)
JCODEMUNCH_PATH_MAP Cross-platform path remapping; format: orig1=new1,orig2=new2
JCODEMUNCH_STALENESS_DAYS 7 Days before get_repo_outline emits a staleness_warning
JCODEMUNCH_MAX_RESULTS 500 Hard cap on search_columns result count
JCODEMUNCH_HTTP_TOKEN Bearer token for HTTP transport auth (opt-in)
JCODEMUNCH_RATE_LIMIT 0 Max requests/minute per client IP in HTTP transport (0 = disabled)
JCODEMUNCH_REDACT_SOURCE_ROOT 0 Set 1 to replace source_root with display_name in responses
JCODEMUNCH_SHARE_SAVINGS 1 Set 0 to disable anonymous token savings telemetry
JCODEMUNCH_STATS_FILE_INTERVAL 3 Calls between session_stats.json writes; 0 = disable
ANTHROPIC_API_KEY Enables Claude Haiku summaries (pip install jcodemunch-mcp[anthropic])
GOOGLE_API_KEY Enables Gemini Flash summaries (pip install jcodemunch-mcp[gemini])
OPENAI_API_BASE Local LLM endpoint (Ollama, LM Studio)
OPENAI_WIRE_API Set responses to use OpenAI Responses API instead of chat/completions
OPENROUTER_API_KEY Enables OpenRouter summaries (default model: meta-llama/llama-3.3-70b-instruct:free)
GEMINI_EMBED_TASK_AWARE 1 Set 0/false/no/off to disable task-type hints (RETRIEVAL_DOCUMENT / CODE_RETRIEVAL_QUERY) when using Gemini embeddings
JCODEMUNCH_CROSS_REPO_DEFAULT 0 Set 1 to enable cross-repo traversal by default in find_importers, get_blast_radius, get_dependency_graph

PR / Issue History

See git log and CHANGELOG.md. Active contributors: MariusAdrian88, DrHayt, tmeckel, drax1222.

Maintenance Practices

  1. Document every tool before shipping. Any PR adding a new tool to server.py must simultaneously update: README.md (tool reference), CLAUDE.md (Key Files), CHANGELOG.md, and at least one test.
  2. Log every silent exception. Every except Exception: block must emit at minimum logger.debug("...", exc_info=True). For user-facing fallbacks (AI summarizer, index load), use logger.warning(...).
  3. CHANGELOG.md is the authoritative version history — update it with every release.