Feat: add http request tool#98

Open
wjc1207 wants to merge 1 commit into memovai:main from wjc1207:feat/arxiv

Conversation

@wjc1207
Contributor

@wjc1207 wjc1207 commented Feb 26, 2026

Summary by CodeRabbit

  • New Features

    • HTTP request tool: execute web requests with multiple methods, custom headers, optional bodies, and formatted responses.
    • ArXiv search skill: built-in skill to perform academic paper searches using the HTTP tool.
  • Chores

    • Added an additional ignore rule to project ignore configuration.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new HTTP request tool (implementation + header), registers it in the tool registry and build, introduces an arxiv-search built-in skill that uses the HTTP tool, and updates .gitignore.

Changes

Cohort / File(s) Summary
HTTP Request Tool
main/tools/tool_http_request.c, main/tools/tool_http_request.h
New C module and header implementing tool_http_request_execute, supporting multiple HTTP methods, optional headers/body, optional proxy routing, response formatting, JSON input parsing, and error handling.
Tool Registration & Build
main/tools/tool_registry.c, main/CMakeLists.txt
Registers the http_request tool with description and execute callback; adds tools/tool_http_request.c to component sources.
Skills Integration
main/skills/skill_loader.c
Adds BUILTIN_ARXIV_SEARCH skill content and registers arxiv-search in the built-in skills array.
Misc
.gitignore
Adds _codeql_detected_source_root to ignored paths.

Sequence Diagram(s)

sequenceDiagram
    participant Skill as ArXiv Skill
    participant HTTPTool as HTTP Request Tool
    participant Proxy as Proxy Service
    participant HTTPClient as HTTP Client
    participant Server as Remote Server

    Skill->>HTTPTool: call tool_http_request_execute(input JSON)
    HTTPTool->>HTTPTool: parse JSON (url, method, headers, body)
    HTTPTool->>HTTPTool: validate method, build request
    alt proxy enabled
        HTTPTool->>Proxy: send proxied request
        Proxy->>HTTPClient: forward request
    else direct
        HTTPTool->>HTTPClient: send direct request
    end
    HTTPClient->>Server: HTTP request
    Server-->>HTTPClient: response (status, headers, body)
    HTTPClient-->>HTTPTool: deliver response
    HTTPTool->>HTTPTool: strip headers, format "Status: <code>\n\n<body>"
    HTTPTool-->>Skill: return formatted response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • IRONICBo

Poem

🐰 I hopped through code to fetch a page,
Headers trimmed and body staged.
ArXiv queries take their flight,
Proxy paths or direct delight—
Papers fetched in moonlit byte. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 78.95%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The PR title 'Feat: add http request tool' accurately describes the primary change: adding a new HTTP request tool implementation across multiple files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (1)
main/tools/tool_arxiv.c (1)

55-73: Make tool_arxiv_init() idempotent to avoid leaks on re-init.

If initialization is called more than once, mutex/cache allocations can be replaced without cleanup.

♻️ Suggested guard
 esp_err_t tool_arxiv_init(void)
 {
+    if (s_mutex && s_cache_val) {
+        return ESP_OK;
+    }
+
     s_mutex = xSemaphoreCreateMutex();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_arxiv.c` around lines 55 - 73, tool_arxiv_init currently
assumes a fresh start and will leak resources if called multiple times; make it
idempotent by checking and cleaning existing resources before re-allocating: if
s_mutex or s_cache_val are non-NULL at entry, free them
(vSemaphoreDelete(s_mutex) and heap_caps_free(s_cache_val)) and set to NULL, or
return early if already initialized depending on desired semantics, then proceed
to create the mutex (xSemaphoreCreateMutex) and allocate s_cache_val
(heap_caps_calloc) as before; ensure error paths still clean up
partially-created resources (delete mutex if calloc fails) and leave global
handles in a consistent state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/tools/tool_arxiv.c`:
- Around line 42-49: The handler for HTTP_EVENT_ON_DATA currently drops the
whole chunk when there isn't room (in the if using needed < ab->cap - 1); change
it to copy as much as fits: compute size_t available = ab->cap - 1 - ab->len,
then size_t to_copy = MIN(evt->data_len, available), memcpy(ab->data + ab->len,
evt->data, to_copy), increment ab->len by to_copy and set ab->data[ab->len] =
'\0' (and only skip copying when available == 0); use the existing variables
evt->data_len, evt->data, ab->len, ab->cap and memcpy to locate the change.
- Around line 322-325: The loop that reads from proxy_conn_read(conn, tmp,
sizeof(tmp), 15000) wrongly treats n == 0 as EOF; change the logic so only
negative returns indicate error/EOF while n == 0 is treated as a
transient/read-would-block and handled by continuing (or retrying) rather than
breaking. Concretely, in the while(1) loop around proxy_conn_read, preserve
handling for n < 0 as failure, handle n == 0 by continuing (optionally after a
short backoff) and track a small consecutive-zero-read counter or elapsed time
to avoid infinite loops, and only stop reading when either data is received (n >
0), a real error occurs (n < 0), or a retry timeout/threshold is exceeded; refer
to proxy_conn_read, conn, tmp, total and ab->cap to locate and update the code.
- Around line 455-465: The cache is being populated from the caller-provided
output buffer (bounded by output_size), so a small first caller can store a
truncated result; instead, produce the canonical full-formatted result into a
cache-sized temporary buffer and use that to update s_cache_val while still
copying a size-limited portion into the caller's output. Concretely: call
format_results(ab.data, temp_cache_buf, ARXIV_CACHE_VAL_SIZE) (using ab.data and
ARXIV_CACHE_VAL_SIZE), then under s_mutex copy temp_cache_buf into s_cache_val
and strncpy/snprintf the truncated version into output respecting output_size;
keep using s_cache_key and ARXIV_CACHE_KEY_LEN as before and free ab.data as
currently done.
- Around line 404-406: The cache key construction using snprintf into cache_key
(size ARXIV_CACHE_KEY_LEN) can truncate long inputs (keywords_j->valuestring)
and cause incorrect cache hits; update the code around the snprintf call to
check snprintf's return value and handle truncation: if snprintf indicates
truncation, generate a non-truncated unique key (e.g., compute a fixed-size hash
such as SHA256/MD5 of the full keywords_j->valuestring combined with start and
max_results) or allocate a sufficiently large buffer dynamically, then use that
full/hashed key for cache lookups; ensure the logic around cache_key,
ARXIV_CACHE_KEY_LEN, keywords_j, start, and max_results consistently uses the
non-truncated identifier so distinct long queries do not collide.
- Around line 303-313: The header buffer is too small: replace the fixed char
header[512] used with snprintf in the proxy request block by allocating a buffer
sized from strlen(path) + room for the constant header text (or use a safe upper
bound like 1024/2048), then call snprintf into that dynamic buffer, check the
snprintf return (as currently done) against the allocated size, and on error
free the buffer before calling proxy_conn_close(conn) and returning ESP_FAIL;
update the code around the header declaration and the snprintf(...) call that
references path to use the new buffer and ensure proper free on all error paths.

In `@main/tools/tool_registry.c`:
- Around line 180-196: Check the return value of tool_arxiv_init() and fail
fast: if tool_arxiv_init() indicates failure (e.g., returns NULL/false/error
code), log or report the error and do not proceed to construct/register the
mimi_tool_t arxiv or call register_tool(&arxiv); ensure the code path returns an
error status (or exits) instead of continuing, and keep references to
tool_arxiv_execute and the mimi_tool_t arxiv structure so the happy path remains
unchanged when initialization succeeds.
- Around line 179-194: Add the missing header that declares tool_arxiv_init and
tool_arxiv_execute to the top of this file so the compiler sees their
prototypes; specifically include the appropriate arXiv tool header (e.g., the
header that defines tool_arxiv_init/tool_arxiv_execute, such as "tool_arxiv.h")
and then rebuild to verify the unresolved symbol errors are gone.
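The bounded-append fix described for the HTTP_EVENT_ON_DATA handler can be sketched in isolation like this. The accum_buf struct is a stand-in for the module's ab accumulator; its field names follow the review comment but the real layout is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char *data;   /* accumulation buffer */
    size_t len;   /* bytes used so far */
    size_t cap;   /* total capacity, including room for the NUL */
} accum_buf;

/* Copy as much of the incoming chunk as fits instead of dropping it
 * wholesale; returns the number of bytes actually appended. */
static size_t accum_append(accum_buf *ab, const char *chunk, size_t chunk_len)
{
    size_t available = ab->cap - 1 - ab->len; /* keep room for the NUL */
    size_t to_copy = chunk_len < available ? chunk_len : available;
    if (to_copy > 0) {
        memcpy(ab->data + ab->len, chunk, to_copy);
        ab->len += to_copy;
    }
    ab->data[ab->len] = '\0';
    return to_copy;
}
```

In the real handler, chunk and chunk_len map to evt->data and evt->data_len; only the bytes beyond `available` are dropped, and the buffer stays NUL-terminated.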

---

Nitpick comments:
In `@main/tools/tool_arxiv.c`:
- Around line 55-73: tool_arxiv_init currently assumes a fresh start and will
leak resources if called multiple times; make it idempotent by checking and
cleaning existing resources before re-allocating: if s_mutex or s_cache_val are
non-NULL at entry, free them (vSemaphoreDelete(s_mutex) and
heap_caps_free(s_cache_val)) and set to NULL, or return early if already
initialized depending on desired semantics, then proceed to create the mutex
(xSemaphoreCreateMutex) and allocate s_cache_val (heap_caps_calloc) as before;
ensure error paths still clean up partially-created resources (delete mutex if
calloc fails) and leave global handles in a consistent state.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9a9211 and f187086.

📒 Files selected for processing (4)
  • main/CMakeLists.txt
  • main/tools/tool_arxiv.c
  • main/tools/tool_arxiv.h
  • main/tools/tool_registry.c


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
main/tools/tool_registry.c (1)

181-196: ⚠️ Potential issue | 🟠 Major

Propagate arXiv init failures before registering the tool.

tool_registry_init() currently proceeds to register arxiv_search and still returns success even if arXiv init cannot complete. Please gate registration on init success (or update tool_arxiv_init to return esp_err_t so failures can be propagated).

🔧 Proposed fix (registry side)
-    tool_arxiv_init();
+    esp_err_t arxiv_err = tool_arxiv_init();
+    if (arxiv_err != ESP_OK) {
+        ESP_LOGE(TAG, "Failed to init arxiv tool: %s", esp_err_to_name(arxiv_err));
+        return arxiv_err;
+    }

Verification script:
#!/bin/bash
# Verify init signature and whether init has failure paths that should be propagated.
rg -nP 'tool_arxiv_init\s*\(' main/tools/tool_arxiv.h main/tools/tool_arxiv.c main/tools/tool_registry.c -C2
rg -nP '(ESP_FAIL|ESP_ERR_|return\s+ESP_|malloc|calloc|xSemaphoreCreate|esp_http_client_init)' main/tools/tool_arxiv.c -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_registry.c` around lines 181 - 196, tool_registry_init
currently calls tool_arxiv_init() but still registers the arxiv_search tool even
if init fails; change the flow so registration is gated on successful
initialization: modify tool_arxiv_init to return esp_err_t (or otherwise expose
an init success boolean) and in tool_registry_init call tool_arxiv_init(), check
for ESP_OK (or the boolean) and only call register_tool(&arxiv) and proceed when
init succeeded—if init fails, return the error from tool_registry_init instead
of registering; reference tool_arxiv_init, tool_registry_init, register_tool,
and the mimi_tool_t arxiv/execute = tool_arxiv_execute symbols to locate
relevant code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@main/tools/tool_registry.c`:
- Around line 181-196: tool_registry_init currently calls tool_arxiv_init() but
still registers the arxiv_search tool even if init fails; change the flow so
registration is gated on successful initialization: modify tool_arxiv_init to
return esp_err_t (or otherwise expose an init success boolean) and in
tool_registry_init call tool_arxiv_init(), check for ESP_OK (or the boolean) and
only call register_tool(&arxiv) and proceed when init succeeded—if init fails,
return the error from tool_registry_init instead of registering; reference
tool_arxiv_init, tool_registry_init, register_tool, and the mimi_tool_t
arxiv/execute = tool_arxiv_execute symbols to locate relevant code.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f187086 and ebe0e08.

📒 Files selected for processing (2)
  • main/tools/tool_arxiv.c
  • main/tools/tool_registry.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • main/tools/tool_arxiv.c


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (4)
main/tools/tool_arxiv.c (3)

411-413: ⚠️ Potential issue | 🟠 Major

Cache key truncation can return wrong cached results.

The cache key is built without checking for truncation. If two distinct long keyword queries truncate to the same 256-byte prefix, they'll incorrectly share cached results. Consider checking snprintf's return value and skipping cache operations when truncation occurs.
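A host-testable sketch of that truncation check follows; the "%s|%d|%d" key format comes from the review comment, while the value of ARXIV_CACHE_KEY_LEN here is an assumed placeholder for the project's constant:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARXIV_CACHE_KEY_LEN 256 /* assumed value; use the project's constant */

/* Returns 0 and fills key on success, -1 when the formatted key would
 * truncate -- in which case the caller should bypass the cache entirely
 * rather than risk two long queries colliding on one key. */
static int build_cache_key(char *key, size_t cap,
                           const char *keywords, int start, int max_results)
{
    int n = snprintf(key, cap, "%s|%d|%d", keywords, start, max_results);
    if (n < 0 || (size_t)n >= cap) {
        return -1; /* format error or truncated */
    }
    return 0;
}
```

snprintf returns the length the full string would have needed, so a return value >= the buffer capacity is a reliable truncation signal.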

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_arxiv.c` around lines 411 - 413, The cache key construction
using snprintf into cache_key (size ARXIV_CACHE_KEY_LEN) can truncate long
keyword strings and cause collisions; update the logic in the code that builds
cache_key to check snprintf's return value (or use vsnprintf) and detect if the
generated length >= ARXIV_CACHE_KEY_LEN, and in that case avoid using the cache
for this query (i.e., skip cache lookup/store) or allocate a sufficiently large
buffer instead; ensure the change touches the snprintf call that formats
"%s|%d|%d" with keywords_j->valuestring, start, max_results so truncation is
detected and handled.

304-314: ⚠️ Potential issue | 🟠 Major

Proxy request header buffer is too small for valid paths.

The path parameter can be up to 640 bytes (see Line 431), but the header buffer is only 512 bytes. Combined with the HTTP header template (~80 bytes), this will cause truncation for longer keyword queries, resulting in malformed proxy requests.

Consider dynamically allocating the header buffer based on strlen(path) + 256 or increasing the fixed size to at least 1024 bytes.
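A minimal sketch of the dynamic-allocation option; the header template and host name below are illustrative, not the module's actual request text:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size the proxy request header from the path length instead of using a
 * fixed 512-byte stack buffer. Returns NULL on allocation or truncation
 * failure; the caller frees the result on every path. */
static char *build_proxy_header(const char *path)
{
    size_t cap = strlen(path) + 256; /* room for the constant header text */
    char *header = malloc(cap);
    if (!header) {
        return NULL;
    }
    int hlen = snprintf(header, cap,
                        "GET %s HTTP/1.1\r\n"
                        "Host: export.arxiv.org\r\n"
                        "Connection: close\r\n\r\n",
                        path);
    if (hlen < 0 || (size_t)hlen >= cap) {
        free(header);
        return NULL;
    }
    return header;
}
```

In the real code the error path would also call proxy_conn_close(conn) before returning ESP_FAIL.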

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_arxiv.c` around lines 304 - 314, The fixed-size header buffer
"header" is too small for allowed path lengths and can truncate the formatted
request; change the code that builds the request (the snprintf into header and
the hlen check) to allocate the buffer dynamically based on strlen(path) + 256
(or at least 1024) instead of the 512-byte stack buffer: compute size =
strlen(path) + 256, malloc a buffer for "header", use snprintf into that buffer,
retain the existing hlen < 0 / hlen >= size check and on error call
proxy_conn_close(conn), free the buffer on all return paths, and replace any
references to the stack variable with the dynamically allocated buffer.

462-472: ⚠️ Potential issue | 🟠 Major

Cache stores caller-truncated output.

The cache is populated from the caller's output buffer, which is bounded by output_size. If the first caller has a small buffer, the truncated result is cached and returned to subsequent callers who may have larger buffers.

Format results into a cache-sized temporary buffer first, then copy to both cache and caller's output.
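A sketch of that two-buffer approach, with the mutex and the format_results call omitted for brevity; the value of ARXIV_CACHE_VAL_SIZE and the s_cache_val global are stand-ins for the module's own symbols:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARXIV_CACHE_VAL_SIZE 4096 /* assumed size of the cache slot */

static char s_cache_val[ARXIV_CACHE_VAL_SIZE];

/* Store the canonical full-length result in the cache, then hand the
 * caller a possibly truncated, NUL-terminated copy. A small first
 * caller can no longer pollute the cache for later callers. */
static void store_and_copy(const char *formatted, char *output, size_t output_size)
{
    strncpy(s_cache_val, formatted, sizeof s_cache_val - 1);
    s_cache_val[sizeof s_cache_val - 1] = '\0';

    if (output && output_size > 0) {
        strncpy(output, s_cache_val, output_size - 1);
        output[output_size - 1] = '\0';
    }
}
```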

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_arxiv.c` around lines 462 - 472, The current code fills the
caller's output buffer (output, output_size) then stores that possibly truncated
buffer into s_cache_val, causing small-buffer callers to pollute the cache;
instead, call format_results into a temporary buffer sized ARXIV_CACHE_VAL_SIZE
(e.g. char tmp[ARXIV_CACHE_VAL_SIZE]), then free ab.data, log length from tmp,
take s_mutex and strncpy tmp into s_cache_val (ensuring NUL termination) and
copy/truncate tmp into the caller's output using output_size (ensuring NUL
termination) before giving s_mutex; reference functions/variables:
format_results, ab.data, s_cache_key, s_cache_val, ARXIV_CACHE_VAL_SIZE,
output/output_size, xSemaphoreTake/xSemaphoreGive, strncpy.
main/tools/tool_registry.c (1)

180-181: ⚠️ Potential issue | 🟠 Major

Check tool_arxiv_init() return value before registering the tool.

The initialization can fail (e.g., PSRAM allocation failure), but the return value is ignored. If init fails, the tool is still registered and will return ESP_ERR_INVALID_STATE at runtime, which is a poor user experience.

🔧 Proposed fix
     /* Register arxiv_search */
-    tool_arxiv_init();
+    esp_err_t arxiv_err = tool_arxiv_init();
+    if (arxiv_err != ESP_OK) {
+        ESP_LOGE(TAG, "Failed to init arxiv tool: %s", esp_err_to_name(arxiv_err));
+        /* Continue without arxiv tool rather than failing entire registry */
+    }

-    mimi_tool_t arxiv = {
+    if (arxiv_err == ESP_OK) {
+        mimi_tool_t arxiv = {
         .name = "arxiv_search",
         .description = "Search arXiv for academic papers by keywords. Returns title, authors, summary, and link for each result.",
         .input_schema_json =
             "{\"type\":\"object\","
             "\"properties\":{"
             "\"keywords\":{\"type\":\"string\",\"description\":\"Search keywords or phrase\"},"
             "\"start\":{\"type\":\"integer\",\"description\":\"Pagination offset (default 0)\"},"
             "\"max_results\":{\"type\":\"integer\",\"description\":\"Number of results to return (1-20, default 5)\"}"
             "},"
             "\"required\":[\"keywords\"]}",
         .execute = tool_arxiv_execute,
-    };
-    register_tool(&arxiv);
+        };
+        register_tool(&arxiv);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_registry.c` around lines 180 - 181, Call and check the return
value of tool_arxiv_init() (assign to an esp_err_t like ret) and only proceed
with registering the arXiv tool when ret == ESP_OK; on failure log an error
including ret and avoid registering the tool (or return the error from the
enclosing init function) so the tool isn't left in an invalid state. Use the
symbol tool_arxiv_init to locate the call site and handle non-ESP_OK values
(e.g., ESP_ERR_NO_MEM / ESP_ERR_INVALID_STATE) by skipping registration and
performing any necessary cleanup/reporting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@main/tools/tool_arxiv.c`:
- Around line 411-413: The cache key construction using snprintf into cache_key
(size ARXIV_CACHE_KEY_LEN) can truncate long keyword strings and cause
collisions; update the logic in the code that builds cache_key to check
snprintf's return value (or use vsnprintf) and detect if the generated length >=
ARXIV_CACHE_KEY_LEN, and in that case avoid using the cache for this query
(i.e., skip cache lookup/store) or allocate a sufficiently large buffer instead;
ensure the change touches the snprintf call that formats "%s|%d|%d" with
keywords_j->valuestring, start, max_results so truncation is detected and
handled.
- Around line 304-314: The fixed-size header buffer "header" is too small for
allowed path lengths and can truncate the formatted request; change the code
that builds the request (the snprintf into header and the hlen check) to
allocate the buffer dynamically based on strlen(path) + 256 (or at least 1024)
instead of the 512-byte stack buffer: compute size = strlen(path) + 256, malloc
a buffer for "header", use snprintf into that buffer, retain the existing hlen <
0 / hlen >= size check and on error call proxy_conn_close(conn), free the buffer
on all return paths, and replace any references to the stack variable with the
dynamically allocated buffer.
- Around line 462-472: The current code fills the caller's output buffer
(output, output_size) then stores that possibly truncated buffer into
s_cache_val, causing small-buffer callers to pollute the cache; instead, call
format_results into a temporary buffer sized ARXIV_CACHE_VAL_SIZE (e.g. char
tmp[ARXIV_CACHE_VAL_SIZE]), then free ab.data, log length from tmp, take s_mutex
and strncpy tmp into s_cache_val (ensuring NUL termination) and copy/truncate
tmp into the caller's output using output_size (ensuring NUL termination) before
giving s_mutex; reference functions/variables: format_results, ab.data,
s_cache_key, s_cache_val, ARXIV_CACHE_VAL_SIZE, output/output_size,
xSemaphoreTake/xSemaphoreGive, strncpy.

In `@main/tools/tool_registry.c`:
- Around line 180-181: Call and check the return value of tool_arxiv_init()
(assign to an esp_err_t like ret) and only proceed with registering the arXiv
tool when ret == ESP_OK; on failure log an error including ret and avoid
registering the tool (or return the error from the enclosing init function) so
the tool isn't left in an invalid state. Use the symbol tool_arxiv_init to
locate the call site and handle non-ESP_OK values (e.g., ESP_ERR_NO_MEM /
ESP_ERR_INVALID_STATE) by skipping registration and performing any necessary
cleanup/reporting.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebe0e08 and f6e4f99.

📒 Files selected for processing (4)
  • main/CMakeLists.txt
  • main/tools/tool_arxiv.c
  • main/tools/tool_arxiv.h
  • main/tools/tool_registry.c

@IRONICBo
Member

Thanks for your contribution! I will check it :)

@IRONICBo IRONICBo self-requested a review February 26, 2026 15:24
@IRONICBo IRONICBo self-assigned this Feb 26, 2026
@IRONICBo IRONICBo added the enhancement (New feature or request), ready to merge, and P2 (Priority: Tier 3 - Core functionality extensions) labels on Feb 26, 2026
@crispyberry
Contributor

Hi @wjc1207 and @IRONICBo,

Open for discussion — but does it make more sense to implement this as part of the skill system rather than the tool system?

Our goal is to minimize context bloat by not injecting too many tools, following the Pi-agent design philosophy.

@IRONICBo
Member

IRONICBo commented Feb 27, 2026

Hi @wjc1207 and @IRONICBo,

Open for discussion — but does it make more sense to implement this as part of the skill system rather than the tool system?

Our goal is to minimize context bloat by not injecting too many tools, following the Pi-agent design philosophy.

Yes, I prefer to add some markdown files to achieve this target. @crispyberry

@wjc1207
Contributor Author

wjc1207 commented Feb 27, 2026

Hi @wjc1207 and @IRONICBo,

Open for discussion — but does it make more sense to implement this as part of the skill system rather than the tool system?

Our goal is to minimize context bloat by not injecting too many tools, following the Pi-agent design philosophy.

@crispyberry I agree with you, the arxiv search should be a skill that works on specific task. I'll modify it soon.

@wjc1207
Contributor Author

wjc1207 commented Feb 27, 2026

initial plan:

  • add HTTP request tool
  • add arxiv search skill (plain text)

@crispyberry
Contributor

Hi Wang, would you like to connect with me via my Gmail, ssslvky@gmail.com? Just send a hi to me.

@wjc1207 wjc1207 reopened this Feb 27, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/skills/skill_loader.c`:
- Around line 67-73: Update the multi-line guidance in skill_loader.c that
explains how to build the ArXiv API query URL (the block starting with "1.
Identify the search keywords..." and "2. Build the ArXiv API query URL:") to add
an explicit step stating that all search terms must be percent-encoded per RFC
3986 (e.g., `(` -> `%28`, `"` -> `%22`, `&` -> `%26`, literal `+` -> `%2B`)
before joining with `+AND+`, and note that reserved characters and special
punctuation must be encoded in query parameter values in addition to encoding
spaces as `+`.

In `@main/tools/tool_http_request.c`:
- Around line 262-268: The code in tool_http_request_execute (and the similar
block at 339-347) uses snprintf/remaining arithmetic without validating output
and output_size, which can underflow and cause OOB writes; first validate that
output is non-NULL and output_size > 0 at the top of the function and if invalid
return an error (and if output_size>0 set output[0] = '\0' to keep it sane),
then replace the current snprintf/remaining logic with robust length checks:
call snprintf with output_size and capture its return (int ret), clamp off = ret
< (int)output_size ? ret : (int)output_size - 1, compute remaining only when
output_size > (size_t)off + 1, and use strncat/strncpy or memcpy guarded by
remaining to avoid copying more bytes than remaining; apply the same validation
and fixes to the other block referenced (lines ~339-347) so no underflow or OOB
copy can occur.
- Around line 116-119: The port parsing branch that currently computes hlen and
sets *port = atoi(colon + 1) must validate the port string and reject malformed
or out-of-range values rather than treating them as 0; replace the naive atoi
usage with proper parsing (e.g., use strtol or equivalent) to check that the
characters after colon are all digits, detect conversion errors, and ensure the
numeric value is within the valid TCP/UDP port range (1–65535); on any parse or
range failure return an error/fail fast from the surrounding function instead of
proceeding with *port == 0 and keep the existing logic that sets hlen and
handles the slash case when parsing succeeds.
- Around line 29-35: The HTTP_EVENT_ON_DATA handler currently discards incoming
bytes when they exactly fill the remaining buffer or when a chunk overflows,
because the condition uses "needed + 1 < hb->cap" and it never copies partial
overflow; change the check to allow exact-fit by using "<= hb->cap - 1" (or
equivalently "needed + 1 <= hb->cap") and, if the chunk still overflows, copy
only the available bytes: compute size_t avail = hb->cap - hb->len - 1,
memcpy(avail) from evt->data into hb->data + hb->len, update hb->len by avail
and null-terminate hb->data; keep dropping any remaining bytes beyond avail (or
log/handle as needed). Ensure you reference the HTTP_EVENT_ON_DATA branch and
fields hb->len, hb->cap, hb->data and evt->data_len/evt->data when making these
changes.
- Around line 171-206: The formatted request may be truncated so check
snprintf's return (rlen) against the actual allocated buffer size (the capacity
passed to snprintf, i.e., 2048 + body_len or 2048) before calling
proxy_conn_write; if rlen < 0 or rlen >= capacity treat it as an error (free
req_buf and return an appropriate error) to avoid passing a length that exceeds
req_buf capacity. Locate the block that allocates req_buf (heap_caps_calloc),
the two snprintf calls that set rlen, and the proxy_conn_write call and add a
validation branch after each snprintf to handle truncation/format errors safely.
- Around line 330-351: The log uses method after cJSON_Delete(input) which frees
the JSON backing memory so method is dangling; fix by copying the method string
to an owned buffer (e.g., strdup or strncpy into a local buffer) before calling
cJSON_Delete(input), use that copied string in ESP_LOGI(TAG, "HTTP %s complete,
...", method), and free the copied buffer after the log (ensure hb.data is freed
at the correct time as well).
- Around line 159-167: The code building hdr_extra with hdr_off can overflow the
1024 buffer when many/long headers are appended; modify the loop that iterates
headers (cJSON_ArrayForEach over headers) to compute remaining =
sizeof(hdr_extra) - hdr_off, skip or truncate the header when remaining <= 1,
call snprintf with remaining as the max size, capture snprintf's return and if
it returns >= remaining clamp hdr_off = sizeof(hdr_extra)-1 and break (or stop
appending), ensuring hdr_off never exceeds the buffer and hdr_extra remains
NUL-terminated; alternatively replace the fixed buffer with a dynamically
growing buffer when more space is needed.
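The strict port-parsing fix suggested above can be sketched with strtol; the trailing-'/' allowance mirrors the slash case mentioned in the comment:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a port string with strtol instead of atoi: reject conversion
 * errors, trailing garbage (other than the path's '/'), and values
 * outside 1-65535, rather than silently proceeding with port 0. */
static int parse_port(const char *s, int *port)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || (*end != '\0' && *end != '/') ||
        v < 1 || v > 65535) {
        return -1;
    }
    *port = (int)v;
    return 0;
}
```

The caller should fail fast on a -1 return instead of continuing with an invalid port.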

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6e4f99 and 007f5d1.

📒 Files selected for processing (6)
  • .gitignore
  • main/CMakeLists.txt
  • main/skills/skill_loader.c
  • main/tools/tool_http_request.c
  • main/tools/tool_http_request.h
  • main/tools/tool_registry.c
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
  • main/tools/tool_registry.c
  • main/CMakeLists.txt

Member

@IRONICBo IRONICBo left a comment

The tool_http_request implementation is solid — good proxy support, proper input validation, PSRAM allocation. However, this is a general-purpose HTTP request tool exposed to the LLM with no URL allowlist or rate limiting. On a public Telegram bot this is an SSRF vector — the LLM (or a prompt injection) can hit internal network addresses, cloud metadata endpoints (169.254.169.254), etc.

At minimum, block RFC-1918 and link-local IPs, or scope the tool to specific domains (e.g. export.arxiv.org only). Also consider whether the arxiv skill alone (using the existing web_search tool) would be sufficient without adding a raw HTTP tool.
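A minimal host-side sketch of such an IPv4 literal blocklist (POSIX inet_pton; on-device code would use the lwIP equivalent, and hostnames must be resolved and each resulting address re-checked, or the guard is trivially bypassed via DNS):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reject RFC-1918, loopback, and link-local IPv4 targets before
 * issuing a request. Non-literal inputs (hostnames) fall through and
 * must be checked after resolution. */
static bool ipv4_is_blocked(const char *ip_str)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ip_str, &a) != 1) {
        return false; /* not a literal IPv4 address */
    }
    uint32_t ip = ntohl(a.s_addr);
    return (ip >> 24) == 10                  /* 10.0.0.0/8 */
        || (ip >> 24) == 127                 /* 127.0.0.0/8 loopback */
        || (ip >> 20) == (172u << 4 | 1)     /* 172.16.0.0/12 */
        || (ip >> 16) == (192u << 8 | 168)   /* 192.168.0.0/16 */
        || (ip >> 16) == (169u << 8 | 254);  /* 169.254.0.0/16 link-local */
}
```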

@wjc1207
Contributor Author

wjc1207 commented Feb 28, 2026

@IRONICBo Open for discussion. I think we should keep the HTTP request tool, since we can use it to access a web cam or web RGB LED on the local area network. Of course, there should be some constraints on this tool.

We could add a require_first_visit_approval gate.
ref: https://github.com/zeroclaw-labs/zeroclaw/blob/main/src/tools/url_validation.rs

@wjc1207 wjc1207 requested a review from IRONICBo March 1, 2026 09:30
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (7)
main/tools/tool_http_request.c (7)

416-437: ⚠️ Potential issue | 🔴 Critical

method is used after freeing its backing JSON object.

method points into the input cJSON structure. After cJSON_Delete(input) on line 416, method becomes a dangling pointer, but it's used in the log on line 437.

Suggested fix: copy method before deletion
     /* Method (default: GET) */
-    const char *method = "GET";
+    char method_buf[8] = "GET";
     cJSON *method_item = cJSON_GetObjectItem(input, "method");
     if (method_item && cJSON_IsString(method_item) && method_item->valuestring[0]) {
-        method = method_item->valuestring;
+        strncpy(method_buf, method_item->valuestring, sizeof(method_buf) - 1);
+        method_buf[sizeof(method_buf) - 1] = '\0';
     }
+    const char *method = method_buf;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 416 - 437, The variable method
points into the cJSON structure input and is freed by cJSON_Delete(input), so
copy method into a local buffer before calling cJSON_Delete to avoid a dangling
pointer; e.g., allocate or stack-allocate a small buffer (e.g., method_copy) and
copy the string (using strncpy/strdup/snprintf semantics), then call
cJSON_Delete(input), use method_copy in the ESP_LOGI call, and free method_copy
if it was heap-allocated; ensure references to method in the function (notably
the ESP_LOGI(TAG, "HTTP %s complete...") call) are replaced with the safe copy.

240-266: ⚠️ Potential issue | 🔴 Critical

Request buffer capacity mismatch can cause truncated write with wrong length.

Two issues:

  1. When body_len > 0, rlen from snprintf could exceed 2048 + body_len if truncation occurs, but that value is passed to proxy_conn_write.
  2. When body_len == 0, the buffer is still allocated as 2048 + body_len (= 2048), but snprintf uses size 2048. If hdr_extra is large, the formatted request could be truncated, yet rlen (the would-be length) is written.
Suggested fix: validate snprintf return
+    size_t req_cap = 2048 + body_len;
     int rlen;
     if (body_len > 0) {
         // ... snprintf ...
-        rlen = snprintf(req_buf, 2048 + body_len, ...);
+        rlen = snprintf(req_buf, req_cap, ...);
     } else {
-        rlen = snprintf(req_buf, 2048, ...);
+        rlen = snprintf(req_buf, req_cap, ...);
     }
+    if (rlen < 0 || (size_t)rlen >= req_cap) {
+        free(req_buf);
+        proxy_conn_close(conn);
+        return ESP_ERR_INVALID_ARG;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 240 - 266, The snprintf calls
building the HTTP request can report a larger length than the actual buffer and
rlen is being used directly for the subsequent write, risking truncated writes;
update the logic around req_buf, rlen, and proxy_conn_write to (1) compute and
use a single buffer_size variable (e.g. buf_size = 2048 + body_len), (2) check
snprintf's return: if it returns a negative value treat as error; if it returns
>= buf_size, clamp rlen to buf_size - 1 (the actual bytes written) before
calling proxy_conn_write, and (3) ensure any write uses the actual bytes placed
in req_buf (not the full would-be length), applying this to both branches
(body_len > 0 and body_len == 0) and referencing req_buf, rlen, body_len,
hdr_extra, snprintf, and proxy_conn_write.

325-332: ⚠️ Potential issue | 🔴 Critical

Output buffer not validated for null or zero size.

If output is NULL or output_size is 0, the snprintf calls will cause undefined behavior. The function should validate these parameters upfront.

Suggested fix
 esp_err_t tool_http_request_execute(const char *input_json, char *output, size_t output_size)
 {
+    if (!output || output_size == 0) {
+        return ESP_ERR_INVALID_ARG;
+    }
+    output[0] = '\0';
+
     /* Parse input */
     cJSON *input = cJSON_Parse(input_json);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 325 - 332, The function
tool_http_request_execute must validate the output buffer before any snprintf
use: at the start of tool_http_request_execute (or immediately after
cJSON_Parse) check that output is not NULL and output_size > 0 and if the check
fails return ESP_ERR_INVALID_ARG (and free any allocated cJSON *input if already
created). Replace direct snprintf calls assuming valid buffer with code paths
that only call snprintf after this validation to avoid undefined behavior.

179-181: ⚠️ Potential issue | 🟠 Major

Port parsing with atoi doesn't reject malformed or out-of-range values.

atoi(colon + 1) silently returns 0 for non-numeric input and doesn't detect overflow. Invalid ports (0, negative, >65535) will proceed without error.

Suggested fix using strtol with validation
     if (colon && (!slash || colon < slash)) {
         hlen = colon - p;
-        *port = atoi(colon + 1);
+        char *endptr = NULL;
+        long parsed = strtol(colon + 1, &endptr, 10);
+        if (endptr == colon + 1 || 
+            (slash ? endptr != slash : (*endptr != '\0' && *endptr != '/')) ||
+            parsed < 1 || parsed > 65535) {
+            return ESP_ERR_INVALID_ARG;
+        }
+        *port = (int)parsed;
     } else if (slash) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 179 - 181, The port parsing
currently uses atoi(colon + 1) which accepts malformed or out-of-range values;
replace this with strtol and validate the result: call strtol(colon + 1,
&endptr, 10), ensure endptr points to the end of the numeric substring (i.e., no
non-digit characters before the next delimiter), check errno for ERANGE, and
verify the resulting long is within 1..65535 before assigning to *port; if
validation fails, handle it as an error (reject/return) instead of accepting 0
or an invalid port. Use the existing variables (p, colon, slash, hlen, port) to
locate the parsing logic in this block.

222-232: ⚠️ Potential issue | 🔴 Critical

hdr_off can overflow hdr_extra buffer with many/long headers.

The loop accumulates header data without checking if hdr_off has exceeded the buffer capacity. snprintf will return the number of bytes that would have been written, causing hdr_off to grow past sizeof(hdr_extra) and subsequent appends to use invalid pointer/size pairs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 222 - 232, The loop building
hdr_extra can overflow because hdr_off is incremented by snprintf's return value
without bounds checking; fix the loop in the cJSON_ArrayForEach over headers by
computing size_t remaining = sizeof(hdr_extra) - hdr_off and only calling
snprintf with that remaining size, capture its return (int n), then if n < 0
break, if (size_t)n >= remaining set hdr_off = sizeof(hdr_extra) - 1 and break
(to avoid further writes), otherwise hdr_off += (size_t)n; ensure hdr_extra
remains null-terminated; update the block around hdr_extra/hdr_off and the
cJSON_ArrayForEach to perform these checks.

424-433: ⚠️ Potential issue | 🔴 Critical

remaining underflows if snprintf output exceeds buffer size.

If off >= output_size, the subtraction output_size - off - 1 underflows (since size_t is unsigned), causing a massive remaining value and potential buffer overread in memcpy.

Suggested fix with bounds checking
     /* Format output */
-    size_t off = snprintf(output, output_size, "Status: %d\n\n", status);
+    int prefix = snprintf(output, output_size, "Status: %d\n\n", status);
+    if (prefix < 0 || (size_t)prefix >= output_size) {
+        free(hb.data);
+        return ESP_ERR_INVALID_ARG;
+    }
+    size_t off = (size_t)prefix;

     /* Copy response body, truncate if needed */
     size_t remaining = output_size - off - 1;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 424 - 433, The snprintf result
stored in off can be >= output_size causing underflow in the subtraction for
remaining; clamp and handle snprintf's return: capture snprintf's (or
vsnprintf's) return value, convert to size_t safely and set off = (off_ret >=
output_size) ? (output_size - 1) : off_ret, ensure output[off] = '\0' (or
output[output_size-1]='\0' if truncated), then compute remaining as (off <
output_size - 1) ? (output_size - off - 1) : 0 before computing body_copy and
calling memcpy with hb.len/hb.data; update logic around variables off,
remaining, body_copy, output, output_size, hb.len and hb.data accordingly.

92-98: ⚠️ Potential issue | 🟡 Minor

Accumulator drops bytes at exact boundary and skips partial chunk copy.

The condition needed + 1 < hb->cap discards data when it exactly fits. Additionally, when a chunk would overflow, the entire chunk is dropped instead of copying as much as possible.

Suggested fix
     if (evt->event_id == HTTP_EVENT_ON_DATA) {
-        size_t needed = hb->len + evt->data_len;
-        if (needed + 1 < hb->cap) {
-            memcpy(hb->data + hb->len, evt->data, evt->data_len);
-            hb->len += evt->data_len;
+        if (hb->len < hb->cap - 1) {
+            size_t avail = hb->cap - hb->len - 1;
+            size_t copy = (size_t)evt->data_len < avail ? (size_t)evt->data_len : avail;
+            memcpy(hb->data + hb->len, evt->data, copy);
+            hb->len += copy;
             hb->data[hb->len] = '\0';
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_http_request.c` around lines 92 - 98, The accumulator drops
bytes at the exact boundary and drops entire over-sized chunks; in the
HTTP_EVENT_ON_DATA branch update the capacity check and partial-copy logic:
change the condition using hb->len / hb->cap to allow exact fit (use <= for
needed + 1 vs hb->cap), and when evt->data_len would overflow, copy only the
number of bytes that fit (size_t can_copy = hb->cap - 1 - hb->len), memcpy that
many from evt->data into hb->data, advance hb->len and set hb->data[hb->len] =
'\0'; reference symbols: HTTP_EVENT_ON_DATA, evt->data_len, hb->len, hb->cap,
hb->data (and consider growing/reallocating hb if full instead of truncating).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/tools/tool_http_request.c`:
- Around line 50-79: The is_blocked_destination function currently resolves the
host only to check IPv4 and then the caller re-resolves (TOCTOU) and it also
ignores IPv6; change it to use getaddrinfo with hints.ai_family = AF_UNSPEC,
iterate the entire res linked list and check every ai_family (AF_INET and
AF_INET6) and corresponding ai_addr entries (convert IPv4 with ntohl on
sockaddr_in and add IPv6 check using the in6_addr bytes or a new
isAlwaysBlocked6 helper), and instead of just returning true/false keep/return
the resolved sockaddr (or add an API variant like resolve_and_check_destination
that returns the first safe resolved addr) so the caller can use that sockaddr
for connect() and avoid a second DNS lookup; also ensure you freeaddrinfo(res)
and propagate errors for connection-time validation as defense-in-depth.
- Around line 150-154: The code currently reads the HTTP status code
unconditionally even when esp_http_client_perform(client) fails; initialize
*out_status = 0 before calling esp_http_client_perform and only call
esp_http_client_get_status_code(client) to update *out_status when
esp_http_client_perform returns ESP_OK (or indicates success). Keep
esp_http_client_cleanup(client) executed regardless of outcome, and return the
original err so callers get the error and a safe zeroed status when the request
failed; locate symbols esp_http_client_perform, esp_http_client_get_status_code,
esp_http_client_cleanup, and out_status to apply this change.
- Around line 358-371: The SSRF protection currently skips checking when hlen ==
0 or hlen >= sizeof(host_buf); update the validation around hlen/host_buf so any
invalid host length (hlen == 0 || hlen >= sizeof(host_buf)) is treated as an
error: log a warning via ESP_LOGW including the raw host slice if available,
free input with cJSON_Delete(input), write an explanatory message into output
with snprintf, and return ESP_ERR_INVALID_ARG instead of falling through; keep
the existing blocked-path behavior using is_blocked_destination(host_buf) when
the host length is valid and the host is copied into host_buf.

---

Duplicate comments:
In `@main/tools/tool_http_request.c`:
- Around line 416-437: The variable method points into the cJSON structure input
and is freed by cJSON_Delete(input), so copy method into a local buffer before
calling cJSON_Delete to avoid a dangling pointer; e.g., allocate or
stack-allocate a small buffer (e.g., method_copy) and copy the string (using
strncpy/strdup/snprintf semantics), then call cJSON_Delete(input), use
method_copy in the ESP_LOGI call, and free method_copy if it was heap-allocated;
ensure references to method in the function (notably the ESP_LOGI(TAG, "HTTP %s
complete...") call) are replaced with the safe copy.
- Around line 240-266: The snprintf calls building the HTTP request can report a
larger length than the actual buffer and rlen is being used directly for the
subsequent write, risking truncated writes; update the logic around req_buf,
rlen, and proxy_conn_write to (1) compute and use a single buffer_size variable
(e.g. buf_size = 2048 + body_len), (2) check snprintf's return: if it returns a
negative value treat as error; if it returns >= buf_size, clamp rlen to buf_size
- 1 (the actual bytes written) before calling proxy_conn_write, and (3) ensure
any write uses the actual bytes placed in req_buf (not the full would-be
length), applying this to both branches (body_len > 0 and body_len == 0) and
referencing req_buf, rlen, body_len, hdr_extra, snprintf, and proxy_conn_write.
- Around line 325-332: The function tool_http_request_execute must validate the
output buffer before any snprintf use: at the start of tool_http_request_execute
(or immediately after cJSON_Parse) check that output is not NULL and output_size
> 0 and if the check fails return ESP_ERR_INVALID_ARG (and free any allocated
cJSON *input if already created). Replace direct snprintf calls assuming valid
buffer with code paths that only call snprintf after this validation to avoid
undefined behavior.
- Around line 179-181: The port parsing currently uses atoi(colon + 1) which
accepts malformed or out-of-range values; replace this with strtol and validate
the result: call strtol(colon + 1, &endptr, 10), ensure endptr points to the end
of the numeric substring (i.e., no non-digit characters before the next
delimiter), check errno for ERANGE, and verify the resulting long is within
1..65535 before assigning to *port; if validation fails, handle it as an error
(reject/return) instead of accepting 0 or an invalid port. Use the existing
variables (p, colon, slash, hlen, port) to locate the parsing logic in this
block.
- Around line 222-232: The loop building hdr_extra can overflow because hdr_off
is incremented by snprintf's return value without bounds checking; fix the loop
in the cJSON_ArrayForEach over headers by computing size_t remaining =
sizeof(hdr_extra) - hdr_off and only calling snprintf with that remaining size,
capture its return (int n), then if n < 0 break, if (size_t)n >= remaining set
hdr_off = sizeof(hdr_extra) - 1 and break (to avoid further writes), otherwise
hdr_off += (size_t)n; ensure hdr_extra remains null-terminated; update the block
around hdr_extra/hdr_off and the cJSON_ArrayForEach to perform these checks.
- Around line 424-433: The snprintf result stored in off can be >= output_size
causing underflow in the subtraction for remaining; clamp and handle snprintf's
return: capture snprintf's (or vsnprintf's) return value, convert to size_t
safely and set off = (off_ret >= output_size) ? (output_size - 1) : off_ret,
ensure output[off] = '\0' (or output[output_size-1]='\0' if truncated), then
compute remaining as (off < output_size - 1) ? (output_size - off - 1) : 0
before computing body_copy and calling memcpy with hb.len/hb.data; update logic
around variables off, remaining, body_copy, output, output_size, hb.len and
hb.data accordingly.
- Around line 92-98: The accumulator drops bytes at the exact boundary and drops
entire over-sized chunks; in the HTTP_EVENT_ON_DATA branch update the capacity
check and partial-copy logic: change the condition using hb->len / hb->cap to
allow exact fit (use <= for needed + 1 vs hb->cap), and when evt->data_len would
overflow, copy only the number of bytes that fit (size_t can_copy = hb->cap - 1
- hb->len), memcpy that many from evt->data into hb->data, advance hb->len and
set hb->data[hb->len] = '\0'; reference symbols: HTTP_EVENT_ON_DATA,
evt->data_len, hb->len, hb->cap, hb->data (and consider growing/reallocating hb
if full instead of truncating).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 007f5d1 and 0c2794a.

📒 Files selected for processing (1)
  • main/tools/tool_http_request.c

@wjc1207 wjc1207 force-pushed the feat/arxiv branch 4 times, most recently from 0c2794a to 48cebf2 on March 1, 2026 11:31
@wjc1207 wjc1207 changed the title Feat: add arixv search tool Feat: add http request tool Mar 30, 2026

Labels

enhancement New feature or request P2 Priority: Tier 3 - Core functionality extensions

3 participants