@bhauman bhauman commented Jan 22, 2026

Summary

  • New deps_grep tool - Search for patterns inside dependency jar files on the classpath
  • New deps_read tool - Read files from inside dependency jars (using jar-path:entry-path format)
  • Lazy Java source downloading - Automatically fetches source jars from Maven Central when searching Java files (--type java)
  • Shared binary-available? utility - Consolidated binary checking across grep, deps_grep, deps_sources

Features

  • Uses ripgrep for searching with context/multiline support, falls back to Clojure regex
  • Downloads sources to ~/.clojure-mcp/deps_cache/ with negative cache for 404s
  • Platform-safe path separator for Windows compatibility
  • Memoized jar lists for fast subsequent searches

Requirements

  • Required: clojure CLI, unzip
  • Optional: rg (ripgrep) for context/multiline, curl for Java source downloads

Test plan

  • All 285 tests pass
  • Tested deps_grep with Clojure files
  • Tested deps_grep with Java files (source download)
  • Tested ripgrep fallback path
  • Verified negative cache only stores 404s
  • Verified binary availability checks

Summary by CodeRabbit

  • New Features
    • Added deps-grep tool to search for patterns within your project's dependencies, featuring glob pattern filtering, multiple output format options, and configurable result limits.
    • Added deps-read tool to browse and read file contents from dependency archives with pagination support and automatic line numbering.


Bruce Hauman added 5 commits January 22, 2026 09:23
- deps_grep: Search patterns in dependency jars on the classpath
  - Uses `clojure -Spath` to resolve exact dependency jars (cached)
  - Searches inside jars with unzip + regex matching
  - Supports glob/type filters, output modes (content/files/count)
  - Returns jar:entry paths for use with deps_read

- deps_read: Read files from inside dependency jars
  - Takes file_path in jar:entry format (from deps_grep results)
  - Supports offset/limit for large files (mirrors read_file API)
  - Returns content with line numbers
- Format line numbers with arrow (→) to match Read tool style
- Remove header from deps_read output (just content with line numbers)
- deps_grep now also searches *-sources.jar files when available
- Enables searching Java source code in dependencies
- New deps-sources namespace for Maven coordinate parsing and source jar downloading
- Downloads sources from Maven Central to ~/.clojure-mcp/deps_cache/
- Negative cache tracks jars without sources to avoid repeated download attempts
- Only fetches Java sources when --type java or --glob "*.java" is specified
- Memoizes jar lists by [project-dir java-sources?] for fast subsequent lookups
- Parallel downloads using pmap for performance
- Use ripgrep for searching with context/multiline support, fallback to Clojure regex
- Fix path separator for Windows compatibility (use File/pathSeparator)
- Add shared binary-available? utility in clojure-mcp.utils.shell
- Only cache 404s in negative cache, not transient network errors
- Add binary availability checks with helpful error messages
- Document external dependencies (clojure, unzip, rg, curl)
- Consolidate binary checking across grep, deps_grep, deps_sources
- binary-available? now accepts optional probe args (defaults to --help)
- Use -Sdescribe for clojure, -v for unzip
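
The last commits describe a shared, memoized availability check that accepts optional probe arguments. A minimal sketch of that idea follows. The name `binary-on-path?`, the exit-code test, and the exception handling are assumptions made for illustration; this is not the code in `clojure-mcp.utils.shell`.

```clojure
(require '[clojure.java.shell :as shell])

(def binary-on-path?
  "Hypothetical helper: true when `binary` can be started with `probe-args`
  and exits with status 0. Results are memoized per argument list."
  (memoize
   (fn
     ;; Default probe mirrors the commit message: --help
     ([binary] (binary-on-path? binary ["--help"]))
     ([binary probe-args]
      (try
        (zero? (:exit (apply shell/sh binary probe-args)))
        ;; shell/sh throws an IOException when the program cannot be started
        (catch java.io.IOException _ false))))))

;; Probes named in the commit message:
;; (binary-on-path? "clojure" ["-Sdescribe"])
;; (binary-on-path? "unzip" ["-v"])
```

Because the result is memoized, a negative answer is also cached, so a binary installed after the first check would only be picked up after a restart in this sketch.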
coderabbitai bot commented Jan 22, 2026

Walkthrough

Introduces two new MCP tools—deps-grep for searching across project dependency jars and deps-read for reading specific jar entries. Adds supporting modules for source jar management and shell utilities. Refactors existing grep tool to use shared binary-availability checks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tool Registration: src/clojure_mcp/tools.clj | Registers two new read-only tools: deps-grep-tool and deps-read-tool in the tool symbol collection. |
| deps-grep Tool: src/clojure_mcp/tools/deps_grep/core.clj, src/clojure_mcp/tools/deps_grep/tool.clj | Implements dependency jar grep functionality with classpath resolution, jar entry listing, ripgrep/fallback search, sources jar detection, and result aggregation. Provides multimethod-based tool registration (name, schema, validation, execution, formatting). |
| deps-read Tool: src/clojure_mcp/tools/deps_read/core.clj, src/clojure_mcp/tools/deps_read/tool.clj | Implements jar entry content reading with pagination (offset/limit), line numbering, and error handling. Parses combined jar:entry paths and formats output via tool system multimethods. |
| Dependency Sources Management: src/clojure_mcp/tools/deps_sources/core.clj | Manages downloading and caching Maven source jars with URL construction, negative caching for 404 responses, and parallel resolution of multiple sources jars. |
| Shell Utilities: src/clojure_mcp/utils/shell.clj | Adds memoized binary-available? function for checking binary availability via probe execution. |
| Grep Tool Refactoring: src/clojure_mcp/tools/grep/core.clj | Replaces in-file tool-availability cache with delegated call to shell-utils/binary-available?. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Tool as deps-grep Tool
    participant Core as deps-grep Core
    participant ClassPath as Classpath Resolution
    participant Jars as Jar Inspection
    participant Search as Search Engine

    User->>Tool: execute (pattern, options)
    Tool->>Tool: validate-inputs (extract pattern, project-dir)
    Tool->>Core: deps-grep (project-dir, pattern, opts)
    
    Core->>ClassPath: cached-classpath-jars (project-dir)
    ClassPath->>ClassPath: check cache / resolve via clojure -Spath
    ClassPath-->>Core: return jar-list
    
    alt needs Java sources
        Core->>ClassPath: find-sources-jar (jar)
        ClassPath-->>Core: return sources-jar / nil
    end
    
    Core->>Jars: list-jar-entries (jar)
    Jars-->>Core: return entries
    
    Core->>Jars: filter-entries (entries, glob/type)
    Jars-->>Core: return filtered-entries
    
    loop for each entry
        Core->>Search: search-jar-entry (jar, entry, pattern, opts)
        alt rg available
            Search->>Search: search-jar-entry-rg (ripgrep)
        else
            Search->>Search: search-jar-entry-fallback (regex)
        end
        Search-->>Core: return match-results
    end
    
    Core-->>Tool: return aggregated-results
    Tool->>Tool: format-results (mode: count/files/content)
    Tool-->>User: return formatted-output

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A rabbit's ode to dependency delight

Through classpath jars we hop and grep,
Each source dancing, line by line,
New tools to read what's tucked inside,
Dependencies mapped with shell's own might,
Caches cached, and sources signed! 📚✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding two new tools (deps_grep and deps_read) for searching and reading dependency JAR files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/clojure_mcp/tools/deps_grep/core.clj`:
- Around line 173-183: Replace the bash pipeline in the search-jar-entry block:
instead of building cmd and calling (shell/sh "bash" "-c" cmd), run unzip
separately to capture unzip-result (using jar-path and entry-path), then invoke
ripgrep by calling shell/sh with "rg" and rg-opts and pattern, passing the unzip
output via the :in keyword argument (use apply/concat to construct args) so
shell/sh receives :in as a proper keyword; reference rg-opts, pattern, jar-path,
entry-path and the surrounding let where cmd/result are defined to locate and
modify the code accordingly.

In `@src/clojure_mcp/tools/deps_read/core.clj`:
- Around line 44-59: Validate the :offset and :limit options at the start of the
function that accepts [jar-path entry-path & {:keys [offset limit
max-line-length] :or {offset 0 max-line-length 2000}}]: ensure offset is a
non-negative integer (>= 0) and if limit is provided it is a positive integer (>
0); if either check fails throw an ex-info with a clear message and include the
invalid value(s) in the ex-info map (e.g., {:offset offset :limit limit}) so
callers get a clear error instead of confusing negative indexing or silent empty
output.

In `@src/clojure_mcp/tools/deps_sources/core.clj`:
- Around line 32-35: The regex in the re-find call fails on Windows because
jar-path may contain backslashes; normalize jar-path by converting backslashes
to forward slashes before matching (e.g., create a normalized-path from jar-path
and use it in the re-find), then proceed with the existing destructuring (match
-> [_ group-path artifact version jar-name]) and group-id computation
(str/replace group-path "/" ".") so Windows .m2/repository paths are parsed
correctly.
🧹 Nitpick comments (2)
src/clojure_mcp/tools/deps_read/core.clj (1)

54-68: Avoid reading entire entry when only a slice is requested.

Line 54 reads the full entry into memory and only then applies :offset/:limit, which can be costly for large sources. Consider streaming the entry (e.g., via java.util.zip.ZipFile + line-seq) and stopping once the limit is reached.
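
The streaming approach suggested above could look roughly like the following. This is a sketch under stated assumptions: `read-entry-slice` is a hypothetical name, it uses `java.util.zip.ZipFile` directly instead of the `unzip` binary the PR shells out to, and it does not compute the total line count that the current implementation derives from the full content.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip ZipFile))

(defn read-entry-slice
  "Hypothetical helper: returns a vector of lines from `entry-path` inside
  `jar-path`, skipping the first `offset` lines and keeping at most `limit`
  lines (all remaining lines when `limit` is nil). Returns nil when the
  entry does not exist."
  [jar-path entry-path offset limit]
  (with-open [zip (ZipFile. (io/file jar-path))]
    (when-let [entry (.getEntry zip entry-path)]
      (with-open [rdr (io/reader (.getInputStream zip entry))]
        (let [xf (if limit
                   (comp (drop offset) (take limit))
                   (drop offset))]
          ;; `into` is eager, so the lines are realized before the reader
          ;; closes, and `take` stops reading once the limit is reached.
          (into [] xf (line-seq rdr)))))))
```

The trade-off is that stopping early means the total number of lines in the entry is unknown, so any output that reports it would need a separate pass or a different message.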

src/clojure_mcp/tools/deps_grep/tool.clj (1)

69-105: Consider renaming count binding to avoid shadowing clojure.core/count.

The destructured count binding on line 73 shadows clojure.core/count, requiring explicit qualification on line 82. A clearer approach would be to rename the binding.

♻️ Suggested rename to avoid shadowing
 (defmethod tool-system/format-results :deps-grep [_ result]
   (if (:error result)
     {:result [(:error result)]
      :error true}
-    (let [{:keys [results count truncated]} result]
+    (let [{:keys [results truncated] match-count :count} result]
       {:result [(cond
                   ;; Count mode
                   (contains? result :count)
-                  (str "Found " count " matches"
+                  (str "Found " match-count " matches"
                        (when truncated " (truncated)"))

                   ;; Files with matches mode
                   (and (seq results) (not (contains? (first results) :matches)))
-                  (str "Found " (clojure.core/count results) " files with matches"
+                  (str "Found " (count results) " files with matches"

Comment on lines +173 to +183
(let [rg-opts (cond-> ["-n"]
                case-insensitive (conj "-i")
                multiline (conj "-U")
                context-before (conj "-B" (str context-before))
                context-after (conj "-A" (str context-after))
                context (conj "-C" (str context)))
      cmd (str "unzip -p " (pr-str jar-path) " " (pr-str entry-path)
               " | rg " (str/join " " rg-opts) " " (pr-str pattern))
      result (shell/sh "bash" "-c" cmd)]
  (when (zero? (:exit result))
    (let [matches (->> (str/split-lines (:out result))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's read the file to see the actual code
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '165,195p'

Repository: bhauman/clojure-mcp

Length of output: 1826


🏁 Script executed:

# Check if shell/sh is imported and understand its usage
grep -n "shell/sh" src/clojure_mcp/tools/deps_grep/core.clj | head -20

Repository: bhauman/clojure-mcp

Length of output: 293


🏁 Script executed:

# Look at the ns declaration to understand imports
head -30 src/clojure_mcp/tools/deps_grep/core.clj

Repository: bhauman/clojure-mcp

Length of output: 1119


🏁 Script executed:

# Search for other shell/sh usages in the codebase to see patterns
rg "shell/sh" --type clj -A 2 | head -50

Repository: bhauman/clojure-mcp

Length of output: 90


🏁 Script executed:

# Search for examples of :in usage with shell/sh
rg "shell/sh.*:in" --max-count=20

Repository: bhauman/clojure-mcp

Length of output: 45


🏁 Script executed:

# Look at the actual implementation in the other function to see patterns
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '198,215p'

Repository: bhauman/clojure-mcp

Length of output: 1032


🏁 Script executed:

# Check what clojure.java.shell/sh actually supports
cat -n src/clojure_mcp/utils/shell.clj 2>/dev/null || echo "File not found, searching for shell utilities"

Repository: bhauman/clojure-mcp

Length of output: 740


🏁 Script executed:

# Search for shell utility implementations
fd "shell" --type f | grep -E "\.(clj|md)" | head -10

Repository: bhauman/clojure-mcp

Length of output: 94


🏁 Script executed:

# Check Clojure documentation or examples of stdin usage with shell/sh
rg ":in" src/ --type clj -B 2 -A 2

Repository: bhauman/clojure-mcp

Length of output: 90


🏁 Script executed:

# Check if there are any examples of piping or stdin in the codebase
rg "ByteArrayInputStream|PipedInputStream" src/ --type clj -B 2 -A 2

Repository: bhauman/clojure-mcp

Length of output: 90


🏁 Script executed:

# Let's verify the actual behavior - check if bash is checked before use
grep -n "bash-available\|check.*bash\|binary-available.*bash" src/clojure_mcp/tools/deps_grep/core.clj

Repository: bhauman/clojure-mcp

Length of output: 45


🏁 Script executed:

# Check the error handling and what platforms this is intended to support
fd README -o CONTRIBUTING -o "*.md" | xargs grep -i "platform\|windows\|windows-compatible" 2>/dev/null | head -10

Repository: bhauman/clojure-mcp

Length of output: 206


🏁 Script executed:

# Look at the search-jar-entry-fallback more closely to understand the complete pattern
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '198,225p'

Repository: bhauman/clojure-mcp

Length of output: 1448


🏁 Script executed:

# Let's check the Clojure documentation for shell/sh and :in parameter
python3 << 'EOF'
# Check what the actual shell/sh signature supports
import subprocess
result = subprocess.run(
    ["python3", "-c", """
import sys
# The clojure.java.shell/sh function signature
# Let's look for documentation in code or check usage patterns
"""],
    capture_output=True,
    text=True
)
print("Checking clojure.java.shell documentation...")
EOF

Repository: bhauman/clojure-mcp

Length of output: 149


🏁 Script executed:

# Check how the proposed fix with :in would actually work - search for similar patterns
grep -n "shell/sh.*:out\|shell/sh.*:in\|:in (" src/clojure_mcp/tools/deps_grep/core.clj || echo "No :in patterns found"

Repository: bhauman/clojure-mcp

Length of output: 84


🏁 Script executed:

# Let's verify what the actual working fallback pattern is and if it could be adapted
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '224,250p'

Repository: bhauman/clojure-mcp

Length of output: 1411


🏁 Script executed:

# Check if there's a call to search-jar-entry-fallback to understand the complete flow
grep -n "search-jar-entry-fallback\|search-jar-entry-rg" src/clojure_mcp/tools/deps_grep/core.clj -B 2 -A 2

Repository: bhauman/clojure-mcp

Length of output: 727


🏁 Script executed:

# Let's check the actual clojure.java.shell documentation or look for how :in is used
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '1,50p'

Repository: bhauman/clojure-mcp

Length of output: 2397


🏁 Script executed:

# Search for any documentation or examples of how to handle stdin with shell/sh
grep -rn "clojure.java.shell" . --include="*.clj" --include="*.md" | head -5

Repository: bhauman/clojure-mcp

Length of output: 482


🏁 Script executed:

# Let's see if there are any tests or examples that use :in with shell/sh
find . -name "*test*.clj" -type f 2>/dev/null | head -3

Repository: bhauman/clojure-mcp

Length of output: 211


🌐 Web query:

clojure.java.shell sh :in parameter stdin

💡 Result:

Yes — sh accepts :in as any input source supported by clojure.java.io/copy (String, byte[], File, Reader, InputStream, etc.), and you can also supply :in-enc to control string->bytes encoding. See the API. [1][2]

Examples:

  • Pass a String (default UTF-8):
    (require '[clojure.java.shell :refer [sh]])
    (sh "grep" "foo" :in "line1\nfoo\nline2")

  • Pass a byte[] (sent raw, :in-enc ignored):
    (sh "wc" "-c" :in (.getBytes "hello"))

  • Pass an InputStream:
    (sh "wc" "-c" :in (java.io.ByteArrayInputStream. (.getBytes "hello")))

  • Specify encoding for a String input:
    (sh "some-cmd" :in "text with ü" :in-enc "UTF-8")

Sources: clojure.java.shell API docs and examples. [1][2]

References
[1] clojure.java.shell API (Clojure docs).
[2] clojuredocs / clojure.java.shell (examples).


🏁 Script executed:

# Verify the complete context of the search-jar-entry-rg function to check all edge cases
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '168,196p'

Repository: bhauman/clojure-mcp

Length of output: 1750


🏁 Script executed:

# Let's verify if there's a check for bash availability
grep -n "bash" src/clojure_mcp/tools/deps_grep/core.clj

Repository: bhauman/clojure-mcp

Length of output: 111


🏁 Script executed:

# Check if the pattern in the proposed fix matches the actual rg invocation patterns in the codebase
grep -n "rg" src/clojure_mcp/tools/grep/core.clj | head -10

Repository: bhauman/clojure-mcp

Length of output: 579


Remove bash -c pipeline to preserve Windows compatibility.

Line 181 uses bash -c which will fail on Windows systems without bash installed. The fallback function (search-jar-entry-fallback at line 198) already demonstrates the correct pattern: call unzip separately, then pass the output to rg via the :in parameter.

The proposed approach is sound, but the syntax shown needs correction. The :in parameter must be passed as a keyword argument to shell/sh, not mixed into the arguments list. Use (apply shell/sh (concat ["rg"] rg-opts [pattern] [:in (:out unzip-result)])) or similar, ensuring :in is properly recognized as a keyword argument.
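
As a hedged illustration of the two-step approach described here, the sketch below runs `unzip` first and then feeds its output to `rg` through `:in`. The function names `rg-args` and `search-entry-with-rg` are hypothetical, and the use of `-e` before the pattern is an addition of this sketch (it keeps a pattern that starts with `-` from being parsed as a flag), not something the PR or the review specifies.

```clojure
(require '[clojure.java.shell :as shell])

(defn rg-args
  "Hypothetical helper: builds the argument list for shell/sh. All command
  words are strings; the trailing :in keyword and its value are read by
  shell/sh as an option, not as a command-line argument."
  [rg-opts pattern input]
  (vec (concat ["rg"] rg-opts ["-e" pattern] [:in input])))

(defn search-entry-with-rg
  "Hypothetical sketch: extracts one jar entry with unzip, then searches the
  extracted text with ripgrep via stdin. Returns rg's raw output, or nil when
  either step exits non-zero (rg exits 1 when nothing matches)."
  [jar-path entry-path pattern rg-opts]
  (let [unzip-result (shell/sh "unzip" "-p" jar-path entry-path)]
    (when (zero? (:exit unzip-result))
      (let [rg-result (apply shell/sh (rg-args rg-opts pattern (:out unzip-result)))]
        (when (zero? (:exit rg-result))
          (:out rg-result))))))
```

No shell is involved in either call, so the quoting done with `pr-str` in the original command string is no longer needed.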


Comment on lines +44 to +59
[jar-path entry-path & {:keys [offset limit max-line-length]
                        :or {offset 0 max-line-length 2000}}]
(try
  ;; Verify jar exists
  (when-not (.exists (io/file jar-path))
    (throw (ex-info "Jar file not found" {:jar-path jar-path})))

  ;; Read entry from jar
  (let [result (shell/sh "unzip" "-p" jar-path entry-path)]
    (if (zero? (:exit result))
      (let [content (:out result)
            all-lines (str/split-lines content)
            total-line-count (count all-lines)
            ;; Apply offset and limit
            offset-lines (drop offset all-lines)
            limited-lines (if limit

⚠️ Potential issue | 🟡 Minor

Validate :offset and :limit to prevent negative indexing.

Negative offsets yield confusing line numbers, and non-positive limits silently return empty output. Add a guard so callers get a clear error. As per coding guidelines, validate inputs and provide helpful error messages in MCP tool implementations.

🛠️ Proposed fix
-[jar-path entry-path & {:keys [offset limit max-line-length]
-                        :or {offset 0 max-line-length 2000}}]
+(let [offset (long offset)
+      limit (when limit (long limit))]
+  (when (neg? offset)
+    (throw (ex-info "Offset must be >= 0" {:offset offset})))
+  (when (and limit (<= limit 0))
+    (throw (ex-info "Limit must be > 0" {:limit limit}))))
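
One way to express the same guard as a standalone function, so that the argument vector stays in place and the checks run before any jar access, is sketched below. `validate-paging` is a hypothetical name and the error messages are illustrative; how the function would be wired into the deps_read core is left open.

```clojure
(defn validate-paging
  "Hypothetical helper: checks the :offset and :limit options and returns
  them unchanged when valid. Throws ex-info carrying the offending value
  otherwise. :offset defaults to 0; :limit may be absent."
  [{:keys [offset limit] :or {offset 0}}]
  (when-not (and (integer? offset) (not (neg? offset)))
    (throw (ex-info "Offset must be an integer >= 0" {:offset offset})))
  (when (and (some? limit)
             (not (and (integer? limit) (pos? limit))))
    (throw (ex-info "Limit must be an integer > 0" {:limit limit})))
  {:offset offset :limit limit})
```

Checking `integer?` before `neg?` or `pos?` means a non-numeric value produces the descriptive `ex-info` rather than a `ClassCastException`.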

Comment on lines +32 to +35
(when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" jar-path)]
  (let [[_ group-path artifact version jar-name] match
        group-id (str/replace group-path "/" ".")]
    {:group group-id

⚠️ Potential issue | 🟠 Major

Handle Windows .m2 paths in Maven coordinate parsing.

Line 32 only matches forward slashes, so Windows paths like C:\Users\...\.m2\repository\... won’t parse and sources won’t download. Normalize separators before regex matching.

🛠️ Proposed fix
-(when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" jar-path)]
-  (let [[_ group-path artifact version jar-name] match
-        group-id (str/replace group-path "/" ".")]
+(let [normalized (str/replace jar-path "\\" "/")]
+  (when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" normalized)]
+    (let [[_ group-path artifact version jar-name] match
+          group-id (str/replace group-path "/" ".")]
       {:group group-id
        :artifact artifact
        :version version
        :jar-name jar-name
        :group-path group-path})))
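
For reference, a self-contained and balanced version of the normalized parse is sketched below, with one POSIX-style and one Windows-style input. `parse-maven-coords` is a hypothetical function name and the sample paths (including the user directory `me`) are made up for the example.

```clojure
(require '[clojure.string :as str])

(defn parse-maven-coords
  "Hypothetical helper: extracts Maven coordinates from a jar path under
  .m2/repository, accepting either / or \\ as the separator. Returns nil
  when the path does not match."
  [jar-path]
  (let [normalized (str/replace jar-path "\\" "/")]
    (when-let [[_ group-path artifact version jar-name]
               (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$"
                        normalized)]
      {:group (str/replace group-path "/" ".")
       :artifact artifact
       :version version
       :jar-name jar-name
       :group-path group-path})))

;; Both calls yield :group "org.clojure", :artifact "clojure", :version "1.12.0"
(parse-maven-coords "/home/me/.m2/repository/org/clojure/clojure/1.12.0/clojure-1.12.0.jar")
(parse-maven-coords "C:\\Users\\me\\.m2\\repository\\org\\clojure\\clojure\\1.12.0\\clojure-1.12.0.jar")
```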
