Add deps_grep and deps_read tools for searching dependency jars #147
- deps_grep: Search patterns in dependency jars on the classpath
  - Uses `clojure -Spath` to resolve exact dependency jars (cached)
  - Searches inside jars with unzip + regex matching
  - Supports glob/type filters, output modes (content/files/count)
  - Returns jar:entry paths for use with deps_read
- deps_read: Read files from inside dependency jars
  - Takes file_path in jar:entry format (from deps_grep results)
  - Supports offset/limit for large files (mirrors read_file API)
  - Returns content with line numbers

- Format line numbers with arrow (→) to match Read tool style
- Remove header from deps_read output (just content with line numbers)
- deps_grep now also searches *-sources.jar files when available
  - Enables searching Java source code in dependencies
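
The offset/limit slicing and arrow-style line numbering described in these notes can be sketched as follows. This is only an illustration: `format-slice` is a hypothetical name, and the exact padding or spacing around the arrow in the real tool is not shown in this PR text, so the unpadded `N→line` format here is an assumption.

```clojure
(require '[clojure.string :as str])

(defn format-slice
  "Hypothetical helper: returns the lines of `content` starting at the
  zero-based `offset`, at most `limit` lines when `limit` is non-nil,
  each prefixed with its one-based line number and an arrow."
  [content offset limit]
  (let [lines (cond->> (drop offset (str/split-lines content))
                limit (take limit))]
    (->> lines
         ;; Line numbers refer to positions in the whole file, not the slice.
         (map-indexed (fn [i line] (str (+ offset i 1) "→" line)))
         (str/join "\n"))))
```

Numbering from the original file position rather than from the start of the slice keeps results consistent when a caller pages through a large entry in several requests.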

- New deps-sources namespace for Maven coordinate parsing and source jar downloading
- Downloads sources from Maven Central to ~/.clojure-mcp/deps_cache/
- Negative cache tracks jars without sources to avoid repeated download attempts
- Only fetches Java sources when --type java or --glob "*.java" is specified
- Memoizes jar lists by [project-dir java-sources?] for fast subsequent lookups
- Parallel downloads using pmap for performance

- Use ripgrep for searching with context/multiline support, fallback to Clojure regex
- Fix path separator for Windows compatibility (use File/pathSeparator)
- Add shared binary-available? utility in clojure-mcp.utils.shell
- Only cache 404s in negative cache, not transient network errors
- Add binary availability checks with helpful error messages
- Document external dependencies (clojure, unzip, rg, curl)
- Consolidate binary checking across grep, deps_grep, deps_sources

- binary-available? now accepts optional probe args (defaults to --help)
- Use -Sdescribe for clojure, -v for unzip
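
As a rough illustration of the `File/pathSeparator` fix mentioned in the commit notes, the sketch below splits a classpath string (such as the output of `clojure -Spath`) in a platform-independent way and keeps only jar entries. The function name `classpath->jars` is hypothetical and is not taken from the PR.

```clojure
(require '[clojure.string :as str])
(import '(java.io File) '(java.util.regex Pattern))

(defn classpath->jars
  "Hypothetical helper: splits a classpath string on the platform's
  path separator (':' on Unix-like systems, ';' on Windows) and
  returns only the entries that end in .jar."
  [classpath]
  (->> (str/split (str/trim classpath)
                  ;; Quote the separator so it is matched literally.
                  (re-pattern (Pattern/quote File/pathSeparator)))
       (filter #(str/ends-with? % ".jar"))
       vec))
```

Hard-coding `:` as the separator would split Windows drive letters such as `C:` apart, which is presumably the problem the commit addresses.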
📝 Walkthrough
Introduces two new MCP tools…
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Tool as deps-grep Tool
participant Core as deps-grep Core
participant ClassPath as Classpath Resolution
participant Jars as Jar Inspection
participant Search as Search Engine
User->>Tool: execute (pattern, options)
Tool->>Tool: validate-inputs (extract pattern, project-dir)
Tool->>Core: deps-grep (project-dir, pattern, opts)
Core->>ClassPath: cached-classpath-jars (project-dir)
ClassPath->>ClassPath: check cache / resolve via clojure -Spath
ClassPath-->>Core: return jar-list
alt needs Java sources
Core->>ClassPath: find-sources-jar (jar)
ClassPath-->>Core: return sources-jar / nil
end
Core->>Jars: list-jar-entries (jar)
Jars-->>Core: return entries
Core->>Jars: filter-entries (entries, glob/type)
Jars-->>Core: return filtered-entries
loop for each entry
Core->>Search: search-jar-entry (jar, entry, pattern, opts)
alt rg available
Search->>Search: search-jar-entry-rg (ripgrep)
else
Search->>Search: search-jar-entry-fallback (regex)
end
Search-->>Core: return match-results
end
Core-->>Tool: return aggregated-results
Tool->>Tool: format-results (mode: count/files/content)
Tool-->>User: return formatted-output
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/clojure_mcp/tools/deps_read/core.clj (1)
54-68: Avoid reading entire entry when only a slice is requested.
Line 54 reads the full entry into memory and only then applies :offset/:limit, which can be costly for large sources. Consider streaming the entry (e.g., via java.util.zip.ZipFile + line-seq) and stopping once the limit is reached.
src/clojure_mcp/tools/deps_grep/tool.clj (1)
69-105: Consider renaming `count` binding to avoid shadowing `clojure.core/count`.
The destructured `count` binding on line 73 shadows `clojure.core/count`, requiring explicit qualification on line 82. A clearer approach would be to rename the binding.
♻️ Suggested rename to avoid shadowing
 (defmethod tool-system/format-results :deps-grep [_ result]
   (if (:error result)
     {:result [(:error result)] :error true}
-    (let [{:keys [results count truncated]} result]
+    (let [{:keys [results truncated] match-count :count} result]
       {:result [(cond
                   ;; Count mode
                   (contains? result :count)
-                  (str "Found " count " matches"
+                  (str "Found " match-count " matches"
                        (when truncated " (truncated)"))
                   ;; Files with matches mode
                   (and (seq results) (not (contains? (first results) :matches)))
-                  (str "Found " (clojure.core/count results) " files with matches"
+                  (str "Found " (count results) " files with matches"
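
The streaming approach suggested in the first nitpick (java.util.zip.ZipFile plus line-seq) could look roughly like the sketch below. It is not the PR's implementation; `read-entry-slice` is a hypothetical name, and the default character encoding of `io/reader` (UTF-8) is assumed to be appropriate for the entry.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip ZipFile))

(defn read-entry-slice
  "Hypothetical helper: reads at most `limit` lines (all remaining
  lines when `limit` is nil) starting at zero-based `offset` from an
  entry inside a jar, without loading the whole entry into memory."
  [jar-path entry-path offset limit]
  (with-open [zf (ZipFile. (io/file jar-path))]
    (if-let [entry (.getEntry zf entry-path)]
      (with-open [rdr (io/reader (.getInputStream zf entry))]
        (let [lines (drop offset (line-seq rdr))]
          ;; vec realizes the lazy seq before the reader is closed.
          (vec (if limit (take limit lines) lines))))
      (throw (ex-info "Entry not found in jar"
                      {:jar-path jar-path :entry-path entry-path})))))
```

Because `line-seq` is lazy, reading stops as soon as `offset + limit` lines have been consumed, and no `unzip` subprocess is needed for this path.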
(let [rg-opts (cond-> ["-n"]
                case-insensitive (conj "-i")
                multiline (conj "-U")
                context-before (conj "-B" (str context-before))
                context-after (conj "-A" (str context-after))
                context (conj "-C" (str context)))
      cmd (str "unzip -p " (pr-str jar-path) " " (pr-str entry-path)
               " | rg " (str/join " " rg-opts) " " (pr-str pattern))
      result (shell/sh "bash" "-c" cmd)]
  (when (zero? (:exit result))
    (let [matches (->> (str/split-lines (:out result))
🧩 Analysis chain
🏁 Script executed:
# First, let's read the file to see the actual code
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '165,195p'
Repository: bhauman/clojure-mcp
Length of output: 1826
🏁 Script executed:
# Check if shell/sh is imported and understand its usage
grep -n "shell/sh" src/clojure_mcp/tools/deps_grep/core.clj | head -20
Repository: bhauman/clojure-mcp
Length of output: 293
🏁 Script executed:
# Look at the ns declaration to understand imports
head -30 src/clojure_mcp/tools/deps_grep/core.clj
Repository: bhauman/clojure-mcp
Length of output: 1119
🏁 Script executed:
# Search for other shell/sh usages in the codebase to see patterns
rg "shell/sh" --type clj -A 2 | head -50
Repository: bhauman/clojure-mcp
Length of output: 90
🏁 Script executed:
# Search for examples of :in usage with shell/sh
rg "shell/sh.*:in" --max-count=20
Repository: bhauman/clojure-mcp
Length of output: 45
🏁 Script executed:
# Look at the actual implementation in the other function to see patterns
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '198,215p'
Repository: bhauman/clojure-mcp
Length of output: 1032
🏁 Script executed:
# Check what clojure.java.shell/sh actually supports
cat -n src/clojure_mcp/utils/shell.clj 2>/dev/null || echo "File not found, searching for shell utilities"
Repository: bhauman/clojure-mcp
Length of output: 740
🏁 Script executed:
# Search for shell utility implementations
fd "shell" --type f | grep -E "\.(clj|md)" | head -10
Repository: bhauman/clojure-mcp
Length of output: 94
🏁 Script executed:
# Check Clojure documentation or examples of stdin usage with shell/sh
rg ":in" src/ --type clj -B 2 -A 2
Repository: bhauman/clojure-mcp
Length of output: 90
🏁 Script executed:
# Check if there are any examples of piping or stdin in the codebase
rg "ByteArrayInputStream|PipedInputStream" src/ --type clj -B 2 -A 2
Repository: bhauman/clojure-mcp
Length of output: 90
🏁 Script executed:
# Let's verify the actual behavior - check if bash is checked before use
grep -n "bash-available\|check.*bash\|binary-available.*bash" src/clojure_mcp/tools/deps_grep/core.clj
Repository: bhauman/clojure-mcp
Length of output: 45
🏁 Script executed:
# Check the error handling and what platforms this is intended to support
fd README -o CONTRIBUTING -o "*.md" | xargs grep -i "platform\|windows\|windows-compatible" 2>/dev/null | head -10
Repository: bhauman/clojure-mcp
Length of output: 206
🏁 Script executed:
# Look at the search-jar-entry-fallback more closely to understand the complete pattern
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '198,225p'
Repository: bhauman/clojure-mcp
Length of output: 1448
🏁 Script executed:
# Let's check the Clojure documentation for shell/sh and :in parameter
python3 << 'EOF'
# Check what the actual shell/sh signature supports
import subprocess
result = subprocess.run(
["python3", "-c", """
import sys
# The clojure.java.shell/sh function signature
# Let's look for documentation in code or check usage patterns
"""],
capture_output=True,
text=True
)
print("Checking clojure.java.shell documentation...")
EOF
Repository: bhauman/clojure-mcp
Length of output: 149
🏁 Script executed:
# Check how the proposed fix with :in would actually work - search for similar patterns
grep -n "shell/sh.*:out\|shell/sh.*:in\|:in (" src/clojure_mcp/tools/deps_grep/core.clj || echo "No :in patterns found"
Repository: bhauman/clojure-mcp
Length of output: 84
🏁 Script executed:
# Let's verify what the actual working fallback pattern is and if it could be adapted
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '224,250p'
Repository: bhauman/clojure-mcp
Length of output: 1411
🏁 Script executed:
# Check if there's a call to search-jar-entry-fallback to understand the complete flow
grep -n "search-jar-entry-fallback\|search-jar-entry-rg" src/clojure_mcp/tools/deps_grep/core.clj -B 2 -A 2
Repository: bhauman/clojure-mcp
Length of output: 727
🏁 Script executed:
# Let's check the actual clojure.java.shell documentation or look for how :in is used
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '1,50p'
Repository: bhauman/clojure-mcp
Length of output: 2397
🏁 Script executed:
# Search for any documentation or examples of how to handle stdin with shell/sh
grep -rn "clojure.java.shell" . --include="*.clj" --include="*.md" | head -5
Repository: bhauman/clojure-mcp
Length of output: 482
🏁 Script executed:
# Let's see if there are any tests or examples that use :in with shell/sh
find . -name "*test*.clj" -type f 2>/dev/null | head -3
Repository: bhauman/clojure-mcp
Length of output: 211
🌐 Web query:
clojure.java.shell sh :in parameter stdin
💡 Result:
Yes — sh accepts :in as any input source supported by clojure.java.io/copy (String, byte[], File, Reader, InputStream, etc.), and you can also supply :in-enc to control string->bytes encoding. See the API. [1][2]
Examples:
- Pass a String (default UTF-8):
  (require '[clojure.java.shell :refer [sh]])
  (sh "grep" "foo" :in "line1\nfoo\nline2")
- Pass a byte[] (sent raw, :in-enc ignored):
  (sh "wc" "-c" :in (.getBytes "hello"))
- Pass an InputStream:
  (sh "wc" "-c" :in (java.io.ByteArrayInputStream. (.getBytes "hello")))
- Specify encoding for a String input:
  (sh "some-cmd" :in "text with ü" :in-enc "UTF-8")
Sources: clojure.java.shell API docs and examples. [1][2]
References
[1] clojure.java.shell API (Clojure docs).
[2] clojuredocs / clojure.java.shell (examples).
🏁 Script executed:
# Verify the complete context of the search-jar-entry-rg function to check all edge cases
cat -n src/clojure_mcp/tools/deps_grep/core.clj | sed -n '168,196p'
Repository: bhauman/clojure-mcp
Length of output: 1750
🏁 Script executed:
# Let's verify if there's a check for bash availability
grep -n "bash" src/clojure_mcp/tools/deps_grep/core.clj
Repository: bhauman/clojure-mcp
Length of output: 111
🏁 Script executed:
# Check if the pattern in the proposed fix matches the actual rg invocation patterns in the codebase
grep -n "rg" src/clojure_mcp/tools/grep/core.clj | head -10
Repository: bhauman/clojure-mcp
Length of output: 579
Remove bash -c pipeline to preserve Windows compatibility.
Line 181 uses bash -c which will fail on Windows systems without bash installed. The fallback function (search-jar-entry-fallback at line 198) already demonstrates the correct pattern: call unzip separately, then pass the output to rg via the :in parameter.
The proposed approach is sound, but the syntax shown needs correction. The :in parameter must be passed as a keyword argument to shell/sh, not mixed into the arguments list. Use (apply shell/sh (concat ["rg"] rg-opts [pattern] [:in (:out unzip-result)])) or similar, ensuring :in is properly recognized as a keyword argument.
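
A minimal sketch of the two-step approach described above, assuming `unzip` and `rg` are on the PATH. The names `rg-args` and `search-jar-entry-sketch` are hypothetical and only illustrate how `:in` ends up as a keyword option at the end of the argument list; they are not the PR's actual functions.

```clojure
(require '[clojure.java.shell :as shell])

(defn rg-args
  "Hypothetical helper: builds the full argument list for shell/sh:
  program name, rg flags, the pattern, then the :in option carrying
  the text to search."
  [rg-opts pattern input]
  (concat ["rg"] rg-opts [pattern] [:in input]))

(defn search-jar-entry-sketch
  "Extracts one jar entry with unzip, then feeds its contents to rg on
  stdin. Returns the shell/sh result map of the rg call, or nil when
  unzip fails. No bash is involved, so no shell quoting is needed."
  [jar-path entry-path pattern rg-opts]
  (let [unzip-result (shell/sh "unzip" "-p" jar-path entry-path)]
    (when (zero? (:exit unzip-result))
      (apply shell/sh (rg-args rg-opts pattern (:out unzip-result))))))
```

Since each argument is passed to the process directly, the `pr-str` quoting used for the bash command string is no longer required, and patterns containing shell metacharacters are passed through unchanged.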
🤖 Prompt for AI Agents
In `@src/clojure_mcp/tools/deps_grep/core.clj` around lines 173 - 183, Replace the
bash pipeline in the search-jar-entry block: instead of building cmd and calling
(shell/sh "bash" "-c" cmd), run unzip separately to capture unzip-result (using
jar-path and entry-path), then invoke ripgrep by calling shell/sh with "rg" and
rg-opts and pattern, passing the unzip output via the :in keyword argument (use
apply/concat to construct args) so shell/sh receives :in as a proper keyword;
reference rg-opts, pattern, jar-path, entry-path and the surrounding let where
cmd/result are defined to locate and modify the code accordingly.
[jar-path entry-path & {:keys [offset limit max-line-length]
                        :or {offset 0 max-line-length 2000}}]
(try
  ;; Verify jar exists
  (when-not (.exists (io/file jar-path))
    (throw (ex-info "Jar file not found" {:jar-path jar-path})))

  ;; Read entry from jar
  (let [result (shell/sh "unzip" "-p" jar-path entry-path)]
    (if (zero? (:exit result))
      (let [content (:out result)
            all-lines (str/split-lines content)
            total-line-count (count all-lines)
            ;; Apply offset and limit
            offset-lines (drop offset all-lines)
            limited-lines (if limit
Validate :offset and :limit to prevent negative indexing.
Negative offsets yield confusing line numbers, and non-positive limits silently return empty output. Add a guard so callers get a clear error. As per coding guidelines, validate inputs and provide helpful error messages in MCP tool implementations.
🛠️ Proposed fix
-[jar-path entry-path & {:keys [offset limit max-line-length]
- :or {offset 0 max-line-length 2000}}]
+(let [offset (long offset)
+ limit (when limit (long limit))]
+ (when (neg? offset)
+ (throw (ex-info "Offset must be >= 0" {:offset offset})))
+ (when (and limit (<= limit 0))
+ (throw (ex-info "Limit must be > 0" {:limit limit}))))
🤖 Prompt for AI Agents
In `@src/clojure_mcp/tools/deps_read/core.clj` around lines 44 - 59, Validate the
:offset and :limit options at the start of the function that accepts [jar-path
entry-path & {:keys [offset limit max-line-length] :or {offset 0 max-line-length
2000}}]: ensure offset is a non-negative integer (>= 0) and if limit is provided
it is a positive integer (> 0); if either check fails throw an ex-info with a
clear message and include the invalid value(s) in the ex-info map (e.g.,
{:offset offset :limit limit}) so callers get a clear error instead of confusing
negative indexing or silent empty output.
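
One way to express the validation described in this comment is a small standalone check that runs before any jar access. This is a sketch under assumptions: `validate-range-opts` is a hypothetical name, and the error messages are illustrative rather than the ones the PR ended up using.

```clojure
(defn validate-range-opts
  "Hypothetical helper: checks that :offset is a non-negative integer
  and that :limit, when provided, is a positive integer. Returns the
  normalized options or throws ex-info carrying the invalid values."
  [{:keys [offset limit] :or {offset 0}}]
  (when-not (and (integer? offset) (>= offset 0))
    (throw (ex-info "Offset must be a non-negative integer"
                    {:offset offset :limit limit})))
  (when (and (some? limit)
             (not (and (integer? limit) (pos? limit))))
    (throw (ex-info "Limit must be a positive integer"
                    {:offset offset :limit limit})))
  {:offset offset :limit limit})
```

Keeping the check separate from the reading logic means the same guard could be reused by any tool that accepts offset/limit pagination.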
(when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" jar-path)]
  (let [[_ group-path artifact version jar-name] match
        group-id (str/replace group-path "/" ".")]
    {:group group-id
Handle Windows .m2 paths in Maven coordinate parsing.
Line 32 only matches forward slashes, so Windows paths like C:\Users\...\.m2\repository\... won’t parse and sources won’t download. Normalize separators before regex matching.
🛠️ Proposed fix
-(when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" jar-path)]
- (let [[_ group-path artifact version jar-name] match
- group-id (str/replace group-path "/" ".")]
+(let [normalized (str/replace jar-path "\\" "/")]
+ (when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" normalized)]
+ (let [[_ group-path artifact version jar-name] match
+ group-id (str/replace group-path "/" ".")]
{:group group-id
:artifact artifact
:version version
:jar-name jar-name
:group-path group-path})))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-(when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" jar-path)]
-  (let [[_ group-path artifact version jar-name] match
-        group-id (str/replace group-path "/" ".")]
-    {:group group-id
+(let [normalized (str/replace jar-path "\\" "/")]
+  (when-let [match (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$" normalized)]
+    (let [[_ group-path artifact version jar-name] match
+          group-id (str/replace group-path "/" ".")]
+      {:group group-id
+       :artifact artifact
+       :version version
+       :jar-name jar-name
+       :group-path group-path})))
🤖 Prompt for AI Agents
In `@src/clojure_mcp/tools/deps_sources/core.clj` around lines 32 - 35, The regex
in the re-find call fails on Windows because jar-path may contain backslashes;
normalize jar-path by converting backslashes to forward slashes before matching
(e.g., create a normalized-path from jar-path and use it in the re-find), then
proceed with the existing destructuring (match -> [_ group-path artifact version
jar-name]) and group-id computation (str/replace group-path "/" ".") so Windows
.m2/repository paths are parsed correctly.
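
For illustration, here is a self-contained version of the normalization idea that can be tried at a REPL. The regex and the result keys are taken from the snippet under review. The function name `parse-maven-coords-sketch` and the sample paths in the usage note are assumptions, not names from the PR.

```clojure
(require '[clojure.string :as str])

(defn parse-maven-coords-sketch
  "Hypothetical helper: extracts Maven coordinates from a jar path
  under .m2/repository. Backslashes are converted to forward slashes
  first so Windows paths match the same regex. Returns nil for paths
  outside a Maven repository layout."
  [jar-path]
  (let [normalized (str/replace jar-path "\\" "/")]
    (when-let [[_ group-path artifact version jar-name]
               (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$"
                        normalized)]
      {:group (str/replace group-path "/" ".")
       :artifact artifact
       :version version
       :jar-name jar-name
       :group-path group-path})))
```

With this sketch, a Unix-style path such as `/home/u/.m2/repository/org/clojure/clojure/1.11.1/clojure-1.11.1.jar` and its backslash-separated Windows equivalent both yield the group `org.clojure`, artifact `clojure`, and version `1.11.1`.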
Summary
- `deps_grep` tool - Search for patterns inside dependency jar files on the classpath
- `deps_read` tool - Read files from inside dependency jars (using `jar-path:entry-path` format)
- … `--type java`)
- `binary-available?` utility - Consolidated binary checking across grep, deps_grep, deps_sources

Features
- … `~/.clojure-mcp/deps_cache/` with negative cache for 404s

Requirements
- `clojure` CLI, `unzip`
- `rg` (ripgrep) for context/multiline, `curl` for Java source downloads

Test plan

Summary by CodeRabbit
- `deps-grep` tool to search for patterns within your project's dependencies, featuring glob pattern filtering, multiple output format options, and configurable result limits.
- `deps-read` tool to browse and read file contents from dependency archives with pagination support and automatic line numbering.