Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Adds a new HTTP request tool (implementation + header), registers it in the tool registry and build, introduces an arxiv-search built-in skill that uses the HTTP tool, and updates .gitignore.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Skill as ArXiv Skill
    participant HTTPTool as HTTP Request Tool
    participant Proxy as Proxy Service
    participant HTTPClient as HTTP Client
    participant Server as Remote Server
    Skill->>HTTPTool: call tool_http_request_execute(input JSON)
    HTTPTool->>HTTPTool: parse JSON (url, method, headers, body)
    HTTPTool->>HTTPTool: validate method, build request
    alt proxy enabled
        HTTPTool->>Proxy: send proxied request
        Proxy->>HTTPClient: forward request
    else direct
        HTTPTool->>HTTPClient: send direct request
    end
    HTTPClient->>Server: HTTP request
    Server-->>HTTPClient: response (status, headers, body)
    HTTPClient-->>HTTPTool: deliver response
    HTTPTool->>HTTPTool: strip headers, format "Status: <code>\n\n<body>"
    HTTPTool-->>Skill: return formatted response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 7
🧹 Nitpick comments (1)
main/tools/tool_arxiv.c (1)
55-73: Make `tool_arxiv_init()` idempotent to avoid leaks on re-init.

If initialization is called more than once, mutex/cache allocations can be replaced without cleanup.
♻️ Suggested guard
```diff
 esp_err_t tool_arxiv_init(void) {
+    if (s_mutex && s_cache_val) {
+        return ESP_OK;
+    }
     s_mutex = xSemaphoreCreateMutex();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_arxiv.c` around lines 55 - 73, tool_arxiv_init currently assumes a fresh start and will leak resources if called multiple times; make it idempotent by checking and cleaning existing resources before re-allocating: if s_mutex or s_cache_val are non-NULL at entry, free them (vSemaphoreDelete(s_mutex) and heap_caps_free(s_cache_val)) and set to NULL, or return early if already initialized depending on desired semantics, then proceed to create the mutex (xSemaphoreCreateMutex) and allocate s_cache_val (heap_caps_calloc) as before; ensure error paths still clean up partially-created resources (delete mutex if calloc fails) and leave global handles in a consistent state.
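The early-return guard suggested above can be sketched off-target too. A minimal, hedged illustration of the idempotent-init pattern, with plain `malloc`/`calloc` standing in for `xSemaphoreCreateMutex`/`heap_caps_calloc` (the helper `tool_init` and the buffer sizes are illustrative, not the tool's actual code):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins for the tool's globals. */
static void *s_mutex = NULL;
static char *s_cache_val = NULL;

static int tool_init(void)
{
    if (s_mutex && s_cache_val) {
        return 0; /* already initialized: no-op, nothing leaked */
    }
    s_mutex = malloc(64);            /* stands in for xSemaphoreCreateMutex() */
    if (!s_mutex) return -1;
    s_cache_val = calloc(1, 1024);   /* stands in for heap_caps_calloc() */
    if (!s_cache_val) {
        free(s_mutex);               /* clean partial state on failure */
        s_mutex = NULL;
        return -1;
    }
    return 0;
}
```

The second call returns early without touching the existing handles, which is the property the nitpick asks for.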
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main/tools/tool_arxiv.c`:
- Around line 42-49: The handler for HTTP_EVENT_ON_DATA currently drops the
whole chunk when there isn't room (in the if using needed < ab->cap - 1); change
it to copy as much as fits: compute size_t available = ab->cap - 1 - ab->len,
then size_t to_copy = MIN(evt->data_len, available), memcpy(ab->data + ab->len,
evt->data, to_copy), increment ab->len by to_copy and set ab->data[ab->len] =
'\0' (and only skip copying when available == 0); use the existing variables
evt->data_len, evt->data, ab->len, ab->cap and memcpy to locate the change.
- Around line 322-325: The loop that reads from proxy_conn_read(conn, tmp,
sizeof(tmp), 15000) wrongly treats n == 0 as EOF; change the logic so only
negative returns indicate error/EOF while n == 0 is treated as a
transient/read-would-block and handled by continuing (or retrying) rather than
breaking. Concretely, in the while(1) loop around proxy_conn_read, preserve
handling for n < 0 as failure, handle n == 0 by continuing (optionally after a
short backoff) and track a small consecutive-zero-read counter or elapsed time
to avoid infinite loops, and only stop reading when either data is received (n >
0), a real error occurs (n < 0), or a retry timeout/threshold is exceeded; refer
to proxy_conn_read, conn, tmp, total and ab->cap to locate and update the code.
- Around line 455-465: The cache is being populated from the caller-provided
output buffer (bounded by output_size), so a small first caller can store a
truncated result; instead, produce the canonical full-formatted result into a
cache-sized temporary buffer and use that to update s_cache_val while still
copying a size-limited portion into the caller's output. Concretely: call
format_results(ab.data, temp_cache_buf, ARXIV_CACHE_VAL_SIZE) (using ab.data and
ARXIV_CACHE_VAL_SIZE), then under s_mutex copy temp_cache_buf into s_cache_val
and strncpy/snprintf the truncated version into output respecting output_size;
keep using s_cache_key and ARXIV_CACHE_KEY_LEN as before and free ab.data as
currently done.
- Around line 404-406: The cache key construction using snprintf into cache_key
(size ARXIV_CACHE_KEY_LEN) can truncate long inputs (keywords_j->valuestring)
and cause incorrect cache hits; update the code around the snprintf call to
check snprintf's return value and handle truncation: if snprintf indicates
truncation, generate a non-truncated unique key (e.g., compute a fixed-size hash
such as SHA256/MD5 of the full keywords_j->valuestring combined with start and
max_results) or allocate a sufficiently large buffer dynamically, then use that
full/hashed key for cache lookups; ensure the logic around cache_key,
ARXIV_CACHE_KEY_LEN, keywords_j, start, and max_results consistently uses the
non-truncated identifier so distinct long queries do not collide.
- Around line 303-313: The header buffer is too small: replace the fixed char
header[512] used with snprintf in the proxy request block by allocating a buffer
sized from strlen(path) + room for the constant header text (or use a safe upper
bound like 1024/2048), then call snprintf into that dynamic buffer, check the
snprintf return (as currently done) against the allocated size, and on error
free the buffer before calling proxy_conn_close(conn) and returning ESP_FAIL;
update the code around the header declaration and the snprintf(...) call that
references path to use the new buffer and ensure proper free on all error paths.
In `@main/tools/tool_registry.c`:
- Around line 180-196: Check the return value of tool_arxiv_init() and fail
fast: if tool_arxiv_init() indicates failure (e.g., returns NULL/false/error
code), log or report the error and do not proceed to construct/register the
mimi_tool_t arxiv or call register_tool(&arxiv); ensure the code path returns an
error status (or exits) instead of continuing, and keep references to
tool_arxiv_execute and the mimi_tool_t arxiv structure so the happy path remains
unchanged when initialization succeeds.
- Around line 179-194: Add the missing header that declares tool_arxiv_init and
tool_arxiv_execute to the top of this file so the compiler sees their
prototypes; specifically include the appropriate arXiv tool header (e.g., the
header that defines tool_arxiv_init/tool_arxiv_execute, such as "tool_arxiv.h")
and then rebuild to verify the unresolved symbol errors are gone.
---
Nitpick comments:
In `@main/tools/tool_arxiv.c`:
- Around line 55-73: tool_arxiv_init currently assumes a fresh start and will
leak resources if called multiple times; make it idempotent by checking and
cleaning existing resources before re-allocating: if s_mutex or s_cache_val are
non-NULL at entry, free them (vSemaphoreDelete(s_mutex) and
heap_caps_free(s_cache_val)) and set to NULL, or return early if already
initialized depending on desired semantics, then proceed to create the mutex
(xSemaphoreCreateMutex) and allocate s_cache_val (heap_caps_calloc) as before;
ensure error paths still clean up partially-created resources (delete mutex if
calloc fails) and leave global handles in a consistent state.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- main/CMakeLists.txt
- main/tools/tool_arxiv.c
- main/tools/tool_arxiv.h
- main/tools/tool_registry.c
♻️ Duplicate comments (1)
main/tools/tool_registry.c (1)
181-196: ⚠️ Potential issue | 🟠 Major

Propagate arXiv init failures before registering the tool.

`tool_registry_init()` currently proceeds to register `arxiv_search` and still returns success even if arXiv init cannot complete. Please gate registration on init success (or update `tool_arxiv_init` to return `esp_err_t` so failures can be propagated).

🔧 Proposed fix (registry side)
```diff
-    tool_arxiv_init();
+    esp_err_t arxiv_err = tool_arxiv_init();
+    if (arxiv_err != ESP_OK) {
+        ESP_LOGE(TAG, "Failed to init arxiv tool: %s", esp_err_to_name(arxiv_err));
+        return arxiv_err;
+    }
```

```bash
#!/bin/bash
# Verify init signature and whether init has failure paths that should be propagated.
rg -nP 'tool_arxiv_init\s*\(' main/tools/tool_arxiv.h main/tools/tool_arxiv.c main/tools/tool_registry.c -C2
rg -nP '(ESP_FAIL|ESP_ERR_|return\s+ESP_|malloc|calloc|xSemaphoreCreate|esp_http_client_init)' main/tools/tool_arxiv.c -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_registry.c` around lines 181 - 196, tool_registry_init currently calls tool_arxiv_init() but still registers the arxiv_search tool even if init fails; change the flow so registration is gated on successful initialization: modify tool_arxiv_init to return esp_err_t (or otherwise expose an init success boolean) and in tool_registry_init call tool_arxiv_init(), check for ESP_OK (or the boolean) and only call register_tool(&arxiv) and proceed when init succeeded—if init fails, return the error from tool_registry_init instead of registering; reference tool_arxiv_init, tool_registry_init, register_tool, and the mimi_tool_t arxiv/execute = tool_arxiv_execute symbols to locate relevant code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@main/tools/tool_registry.c`:
- Around line 181-196: tool_registry_init currently calls tool_arxiv_init() but
still registers the arxiv_search tool even if init fails; change the flow so
registration is gated on successful initialization: modify tool_arxiv_init to
return esp_err_t (or otherwise expose an init success boolean) and in
tool_registry_init call tool_arxiv_init(), check for ESP_OK (or the boolean) and
only call register_tool(&arxiv) and proceed when init succeeded—if init fails,
return the error from tool_registry_init instead of registering; reference
tool_arxiv_init, tool_registry_init, register_tool, and the mimi_tool_t
arxiv/execute = tool_arxiv_execute symbols to locate relevant code.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- main/tools/tool_arxiv.c
- main/tools/tool_registry.c
🚧 Files skipped from review as they are similar to previous changes (1)
- main/tools/tool_arxiv.c
♻️ Duplicate comments (4)
main/tools/tool_arxiv.c (3)
411-413: ⚠️ Potential issue | 🟠 Major

Cache key truncation can return wrong cached results.

The cache key is built without checking for truncation. If two distinct long keyword queries truncate to the same 256-byte prefix, they'll incorrectly share cached results. Consider checking `snprintf`'s return value and skipping cache operations when truncation occurs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_arxiv.c` around lines 411 - 413, The cache key construction using snprintf into cache_key (size ARXIV_CACHE_KEY_LEN) can truncate long keyword strings and cause collisions; update the logic in the code that builds cache_key to check snprintf's return value (or use vsnprintf) and detect if the generated length >= ARXIV_CACHE_KEY_LEN, and in that case avoid using the cache for this query (i.e., skip cache lookup/store) or allocate a sufficiently large buffer instead; ensure the change touches the snprintf call that formats "%s|%d|%d" with keywords_j->valuestring, start, max_results so truncation is detected and handled.
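A minimal host-side sketch of the truncation check (assuming `ARXIV_CACHE_KEY_LEN` is 256 and the `%s|%d|%d` key format named in the prompt; `build_cache_key` is a hypothetical helper, not the tool's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ARXIV_CACHE_KEY_LEN 256 /* assumed to match the tool's definition */

/* Build "keywords|start|max_results" into key; return false on truncation
 * so the caller can simply skip the cache for that query. */
static bool build_cache_key(char *key, size_t key_len,
                            const char *keywords, int start, int max_results)
{
    int n = snprintf(key, key_len, "%s|%d|%d", keywords, start, max_results);
    return n >= 0 && (size_t)n < key_len;
}
```

Skipping the cache on truncation is the cheapest fix; hashing the full query string is the alternative if long queries should still be cacheable.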
304-314: ⚠️ Potential issue | 🟠 Major

Proxy request header buffer is too small for valid paths.

The `path` parameter can be up to 640 bytes (see line 431), but the header buffer is only 512 bytes. Combined with the HTTP header template (~80 bytes), this will cause truncation for longer keyword queries, resulting in malformed proxy requests. Consider dynamically allocating the header buffer based on `strlen(path) + 256`, or increasing the fixed size to at least 1024 bytes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_arxiv.c` around lines 304 - 314, The fixed-size header buffer "header" is too small for allowed path lengths and can truncate the formatted request; change the code that builds the request (the snprintf into header and the hlen check) to allocate the buffer dynamically based on strlen(path) + 256 (or at least 1024) instead of the 512-byte stack buffer: compute size = strlen(path) + 256, malloc a buffer for "header", use snprintf into that buffer, retain the existing hlen < 0 / hlen >= size check and on error call proxy_conn_close(conn), free the buffer on all return paths, and replace any references to the stack variable with the dynamically allocated buffer.
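A sketch of the dynamic-allocation option — the request template and the `build_request_header` helper here are illustrative, not the tool's exact code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size the request buffer from the path instead of using char header[512]. */
static char *build_request_header(const char *host, const char *path,
                                  size_t *out_len)
{
    size_t cap = strlen(path) + strlen(host) + 128; /* fixed template fits in 128 */
    char *buf = malloc(cap);
    if (!buf) return NULL;
    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= cap) { /* should not happen, but fail closed */
        free(buf);
        return NULL;
    }
    if (out_len) *out_len = (size_t)n;
    return buf;
}
```

The caller must free the returned buffer on every path, including the error branch that closes the proxy connection.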
462-472: ⚠️ Potential issue | 🟠 Major

Cache stores caller-truncated output.

The cache is populated from the caller's `output` buffer, which is bounded by `output_size`. If the first caller has a small buffer, the truncated result is cached and returned to subsequent callers who may have larger buffers. Format results into a cache-sized temporary buffer first, then copy to both cache and caller's output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_arxiv.c` around lines 462 - 472, the current code fills the caller's output buffer (output, output_size) then stores that possibly truncated buffer into s_cache_val, causing small-buffer callers to pollute the cache; instead, call format_results into a temporary buffer sized ARXIV_CACHE_VAL_SIZE (e.g. char tmp[ARXIV_CACHE_VAL_SIZE]), then free ab.data, log length from tmp, take s_mutex and strncpy tmp into s_cache_val (ensuring NUL termination) and copy/truncate tmp into the caller's output using output_size (ensuring NUL termination) before giving s_mutex; reference functions/variables: format_results, ab.data, s_cache_key, s_cache_val, ARXIV_CACHE_VAL_SIZE, output/output_size, xSemaphoreTake/xSemaphoreGive, strncpy.

main/tools/tool_registry.c (1)
180-181: ⚠️ Potential issue | 🟠 Major

Check `tool_arxiv_init()` return value before registering the tool.

The initialization can fail (e.g., PSRAM allocation failure), but the return value is ignored. If init fails, the tool is still registered and will return `ESP_ERR_INVALID_STATE` at runtime, which is a poor user experience.

🔧 Proposed fix

```diff
     /* Register arxiv_search */
-    tool_arxiv_init();
+    esp_err_t arxiv_err = tool_arxiv_init();
+    if (arxiv_err != ESP_OK) {
+        ESP_LOGE(TAG, "Failed to init arxiv tool: %s", esp_err_to_name(arxiv_err));
+        /* Continue without arxiv tool rather than failing entire registry */
+    }
-    mimi_tool_t arxiv = {
+    if (arxiv_err == ESP_OK) {
+        mimi_tool_t arxiv = {
         .name = "arxiv_search",
         .description = "Search arXiv for academic papers by keywords. Returns title, authors, summary, and link for each result.",
         .input_schema_json = "{\"type\":\"object\","
             "\"properties\":{"
             "\"keywords\":{\"type\":\"string\",\"description\":\"Search keywords or phrase\"},"
             "\"start\":{\"type\":\"integer\",\"description\":\"Pagination offset (default 0)\"},"
             "\"max_results\":{\"type\":\"integer\",\"description\":\"Number of results to return (1-20, default 5)\"}"
             "},"
             "\"required\":[\"keywords\"]}",
         .execute = tool_arxiv_execute,
-    };
-    register_tool(&arxiv);
+        };
+        register_tool(&arxiv);
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_registry.c` around lines 180 - 181, Call and check the return value of tool_arxiv_init() (assign to an esp_err_t like ret) and only proceed with registering the arXiv tool when ret == ESP_OK; on failure log an error including ret and avoid registering the tool (or return the error from the enclosing init function) so the tool isn't left in an invalid state. Use the symbol tool_arxiv_init to locate the call site and handle non-ESP_OK values (e.g., ESP_ERR_NO_MEM / ESP_ERR_INVALID_STATE) by skipping registration and performing any necessary cleanup/reporting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@main/tools/tool_arxiv.c`:
- Around line 411-413: The cache key construction using snprintf into cache_key
(size ARXIV_CACHE_KEY_LEN) can truncate long keyword strings and cause
collisions; update the logic in the code that builds cache_key to check
snprintf's return value (or use vsnprintf) and detect if the generated length >=
ARXIV_CACHE_KEY_LEN, and in that case avoid using the cache for this query
(i.e., skip cache lookup/store) or allocate a sufficiently large buffer instead;
ensure the change touches the snprintf call that formats "%s|%d|%d" with
keywords_j->valuestring, start, max_results so truncation is detected and
handled.
- Around line 304-314: The fixed-size header buffer "header" is too small for
allowed path lengths and can truncate the formatted request; change the code
that builds the request (the snprintf into header and the hlen check) to
allocate the buffer dynamically based on strlen(path) + 256 (or at least 1024)
instead of the 512-byte stack buffer: compute size = strlen(path) + 256, malloc
a buffer for "header", use snprintf into that buffer, retain the existing hlen <
0 / hlen >= size check and on error call proxy_conn_close(conn), free the buffer
on all return paths, and replace any references to the stack variable with the
dynamically allocated buffer.
- Around line 462-472: The current code fills the caller's output buffer
(output, output_size) then stores that possibly truncated buffer into
s_cache_val, causing small-buffer callers to pollute the cache; instead, call
format_results into a temporary buffer sized ARXIV_CACHE_VAL_SIZE (e.g. char
tmp[ARXIV_CACHE_VAL_SIZE]), then free ab.data, log length from tmp, take s_mutex
and strncpy tmp into s_cache_val (ensuring NUL termination) and copy/truncate
tmp into the caller's output using output_size (ensuring NUL termination) before
giving s_mutex; reference functions/variables: format_results, ab.data,
s_cache_key, s_cache_val, ARXIV_CACHE_VAL_SIZE, output/output_size,
xSemaphoreTake/xSemaphoreGive, strncpy.
In `@main/tools/tool_registry.c`:
- Around line 180-181: Call and check the return value of tool_arxiv_init()
(assign to an esp_err_t like ret) and only proceed with registering the arXiv
tool when ret == ESP_OK; on failure log an error including ret and avoid
registering the tool (or return the error from the enclosing init function) so
the tool isn't left in an invalid state. Use the symbol tool_arxiv_init to
locate the call site and handle non-ESP_OK values (e.g., ESP_ERR_NO_MEM /
ESP_ERR_INVALID_STATE) by skipping registration and performing any necessary
cleanup/reporting.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- main/CMakeLists.txt
- main/tools/tool_arxiv.c
- main/tools/tool_arxiv.h
- main/tools/tool_registry.c
Thanks for your contribution! I will check it :)
Yes, I prefer to add some markdown files to achieve this target. @crispyberry
@crispyberry I agree with you, the arxiv search should be a skill that works on a specific task. I'll modify it soon.
initial plan:
Hi wang, would you like to connect with me via my gmail ssslvky@gmail.com? Just send hi to me.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main/skills/skill_loader.c`:
- Around line 67-73: Update the multi-line guidance in skill_loader.c that
explains how to build the ArXiv API query URL (the block starting with "1.
Identify the search keywords..." and "2. Build the ArXiv API query URL:") to add
an explicit step stating that all search terms must be percent-encoded per RFC
3986 (e.g., `(` -> `%28`, `"` -> `%22`, `&` -> `%26`, literal `+` -> `%2B`)
before joining with `+AND+`, and note that reserved characters and special
punctuation must be encoded in query parameter values in addition to encoding
spaces as `+`.
In `@main/tools/tool_http_request.c`:
- Around line 262-268: The code in tool_http_request_execute (and the similar
block at 339-347) uses snprintf/remaining arithmetic without validating output
and output_size, which can underflow and cause OOB writes; first validate that
output is non-NULL and output_size > 0 at the top of the function and if invalid
return an error (and if output_size>0 set output[0] = '\0' to keep it sane),
then replace the current snprintf/remaining logic with robust length checks:
call snprintf with output_size and capture its return (int ret), clamp off = ret
< (int)output_size ? ret : (int)output_size - 1, compute remaining only when
output_size > (size_t)off + 1, and use strncat/strncpy or memcpy guarded by
remaining to avoid copying more bytes than remaining; apply the same validation
and fixes to the other block referenced (lines ~339-347) so no underflow or OOB
copy can occur.
- Around line 116-119: The port parsing branch that currently computes hlen and
sets *port = atoi(colon + 1) must validate the port string and reject malformed
or out-of-range values rather than treating them as 0; replace the naive atoi
usage with proper parsing (e.g., use strtol or equivalent) to check that the
characters after colon are all digits, detect conversion errors, and ensure the
numeric value is within the valid TCP/UDP port range (1–65535); on any parse or
range failure return an error/fail fast from the surrounding function instead of
proceeding with *port == 0 and keep the existing logic that sets hlen and
handles the slash case when parsing succeeds.
- Around line 29-35: The HTTP_EVENT_ON_DATA handler currently discards incoming
bytes when they exactly fill the remaining buffer or when a chunk overflows,
because the condition uses "needed + 1 < hb->cap" and it never copies partial
overflow; change the check to allow exact-fit by using "<= hb->cap - 1" (or
equivalently "needed + 1 <= hb->cap") and, if the chunk still overflows, copy
only the available bytes: compute size_t avail = hb->cap - hb->len - 1,
memcpy(avail) from evt->data into hb->data + hb->len, update hb->len by avail
and null-terminate hb->data; keep dropping any remaining bytes beyond avail (or
log/handle as needed). Ensure you reference the HTTP_EVENT_ON_DATA branch and
fields hb->len, hb->cap, hb->data and evt->data_len/evt->data when making these
changes.
- Around line 171-206: The formatted request may be truncated so check
snprintf's return (rlen) against the actual allocated buffer size (the capacity
passed to snprintf, i.e., 2048 + body_len or 2048) before calling
proxy_conn_write; if rlen < 0 or rlen >= capacity treat it as an error (free
req_buf and return an appropriate error) to avoid passing a length that exceeds
req_buf capacity. Locate the block that allocates req_buf (heap_caps_calloc),
the two snprintf calls that set rlen, and the proxy_conn_write call and add a
validation branch after each snprintf to handle truncation/format errors safely.
- Around line 330-351: The log uses method after cJSON_Delete(input) which frees
the JSON backing memory so method is dangling; fix by copying the method string
to an owned buffer (e.g., strdup or strncpy into a local buffer) before calling
cJSON_Delete(input), use that copied string in ESP_LOGI(TAG, "HTTP %s complete,
...", method), and free the copied buffer after the log (ensure hb.data is freed
at the correct time as well).
- Around line 159-167: The code building hdr_extra with hdr_off can overflow the
1024 buffer when many/long headers are appended; modify the loop that iterates
headers (cJSON_ArrayForEach over headers) to compute remaining =
sizeof(hdr_extra) - hdr_off, skip or truncate the header when remaining <= 1,
call snprintf with remaining as the max size, capture snprintf's return and if
it returns >= remaining clamp hdr_off = sizeof(hdr_extra)-1 and break (or stop
appending), ensuring hdr_off never exceeds the buffer and hdr_extra remains
NUL-terminated; alternatively replace the fixed buffer with a dynamically
growing buffer when more space is needed.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .gitignore
- main/CMakeLists.txt
- main/skills/skill_loader.c
- main/tools/tool_http_request.c
- main/tools/tool_http_request.h
- main/tools/tool_registry.c
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
- main/tools/tool_registry.c
- main/CMakeLists.txt
IRONICBo
left a comment
The tool_http_request implementation is solid — good proxy support, proper input validation, PSRAM allocation. However, this is a general-purpose HTTP request tool exposed to the LLM with no URL allowlist or rate limiting. On a public Telegram bot this is an SSRF vector — the LLM (or a prompt injection) can hit internal network addresses, cloud metadata endpoints (169.254.169.254), etc.
At minimum, block RFC-1918 and link-local IPs, or scope the tool to specific domains (e.g. export.arxiv.org only). Also consider whether the arxiv skill alone (using the existing web_search tool) would be sufficient without adding a raw HTTP tool.
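A minimal sketch of the IPv4 range check suggested above (`ipv4_is_blocked` is a hypothetical helper; a real guard would also resolve hostnames first and re-check the resolved address to defeat DNS rebinding, and would need an IPv6 equivalent):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Reject IPv4 addresses in private, link-local, and loopback ranges,
 * which covers cloud metadata endpoints like 169.254.169.254. */
static bool ipv4_is_blocked(const char *ip)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ip, &a) != 1) return true; /* unparseable: block */
    uint32_t v = ntohl(a.s_addr);
    return (v >> 24) == 10                    /* 10.0.0.0/8 */
        || (v >> 20) == ((172u << 4) | 1)     /* 172.16.0.0/12 */
        || (v >> 16) == ((192u << 8) | 168)   /* 192.168.0.0/16 */
        || (v >> 16) == ((169u << 8) | 254)   /* 169.254.0.0/16 (link-local) */
        || (v >> 24) == 127;                  /* 127.0.0.0/8 (loopback) */
}
```

Scoping the tool to an allowlist (e.g. export.arxiv.org only) is still the stronger option for a public bot.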
@IRONICBo Open for discussion. I think we should keep the tool; we can add `require_first_visit_approval`.
Actionable comments posted: 3
♻️ Duplicate comments (7)
main/tools/tool_http_request.c (7)
416-437: ⚠️ Potential issue | 🔴 Critical

`method` is used after freeing its backing JSON object.

`method` points into the `input` cJSON structure. After `cJSON_Delete(input)` on line 416, `method` becomes a dangling pointer, but it's used in the log on line 437.

Suggested fix: copy method before deletion
```diff
     /* Method (default: GET) */
-    const char *method = "GET";
+    char method_buf[8] = "GET";
     cJSON *method_item = cJSON_GetObjectItem(input, "method");
     if (method_item && cJSON_IsString(method_item) && method_item->valuestring[0]) {
-        method = method_item->valuestring;
+        strncpy(method_buf, method_item->valuestring, sizeof(method_buf) - 1);
+        method_buf[sizeof(method_buf) - 1] = '\0';
     }
+    const char *method = method_buf;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 416 - 437, The variable method points into the cJSON structure input and is freed by cJSON_Delete(input), so copy method into a local buffer before calling cJSON_Delete to avoid a dangling pointer; e.g., allocate or stack-allocate a small buffer (e.g., method_copy) and copy the string (using strncpy/strdup/snprintf semantics), then call cJSON_Delete(input), use method_copy in the ESP_LOGI call, and free method_copy if it was heap-allocated; ensure references to method in the function (notably the ESP_LOGI(TAG, "HTTP %s complete...") call) are replaced with the safe copy.
240-266: ⚠️ Potential issue | 🔴 Critical

Request buffer capacity mismatch can cause truncated write with wrong length.

Two issues:
- When `body_len > 0`, `rlen` from `snprintf` could exceed `2048 + body_len` if truncation occurs, but that value is passed to `proxy_conn_write`.
- When `body_len == 0`, the buffer is still allocated as `2048 + body_len` (= 2048), but `snprintf` uses size 2048. If `hdr_extra` is large, the formatted request could be truncated, yet `rlen` (the would-be length) is written.

Suggested fix: validate snprintf return
```diff
+    size_t req_cap = 2048 + body_len;
     int rlen;
     if (body_len > 0) {
         // ... snprintf ...
-        rlen = snprintf(req_buf, 2048 + body_len, ...);
+        rlen = snprintf(req_buf, req_cap, ...);
     } else {
-        rlen = snprintf(req_buf, 2048, ...);
+        rlen = snprintf(req_buf, req_cap, ...);
     }
+    if (rlen < 0 || (size_t)rlen >= req_cap) {
+        free(req_buf);
+        proxy_conn_close(conn);
+        return ESP_ERR_INVALID_ARG;
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 240 - 266, The snprintf calls building the HTTP request can report a larger length than the actual buffer and rlen is being used directly for the subsequent write, risking truncated writes; update the logic around req_buf, rlen, and proxy_conn_write to (1) compute and use a single buffer_size variable (e.g. buf_size = 2048 + body_len), (2) check snprintf's return: if it returns a negative value treat as error; if it returns >= buf_size, clamp rlen to buf_size - 1 (the actual bytes written) before calling proxy_conn_write, and (3) ensure any write uses the actual bytes placed in req_buf (not the full would-be length), applying this to both branches (body_len > 0 and body_len == 0) and referencing req_buf, rlen, body_len, hdr_extra, snprintf, and proxy_conn_write.
325-332: ⚠️ Potential issue | 🔴 Critical

Output buffer not validated for null or zero size.

If `output` is `NULL` or `output_size` is 0, the `snprintf` calls will cause undefined behavior. The function should validate these parameters upfront.

Suggested fix
```diff
 esp_err_t tool_http_request_execute(const char *input_json, char *output, size_t output_size)
 {
+    if (!output || output_size == 0) {
+        return ESP_ERR_INVALID_ARG;
+    }
+    output[0] = '\0';
+
     /* Parse input */
     cJSON *input = cJSON_Parse(input_json);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 325 - 332, The function tool_http_request_execute must validate the output buffer before any snprintf use: at the start of tool_http_request_execute (or immediately after cJSON_Parse) check that output is not NULL and output_size > 0 and if the check fails return ESP_ERR_INVALID_ARG (and free any allocated cJSON *input if already created). Replace direct snprintf calls assuming valid buffer with code paths that only call snprintf after this validation to avoid undefined behavior.
179-181: ⚠️ Potential issue | 🟠 Major

Port parsing with `atoi` doesn't reject malformed or out-of-range values.

`atoi(colon + 1)` silently returns 0 for non-numeric input and doesn't detect overflow. Invalid ports (0, negative, >65535) will proceed without error.

Suggested fix using strtol with validation
```diff
     if (colon && (!slash || colon < slash)) {
         hlen = colon - p;
-        *port = atoi(colon + 1);
+        char *endptr = NULL;
+        long parsed = strtol(colon + 1, &endptr, 10);
+        if (endptr == colon + 1 ||
+            (slash ? endptr != slash : (*endptr != '\0' && *endptr != '/')) ||
+            parsed < 1 || parsed > 65535) {
+            return ESP_ERR_INVALID_ARG;
+        }
+        *port = (int)parsed;
     } else if (slash) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 179 - 181, The port parsing currently uses atoi(colon + 1) which accepts malformed or out-of-range values; replace this with strtol and validate the result: call strtol(colon + 1, &endptr, 10), ensure endptr points to the end of the numeric substring (i.e., no non-digit characters before the next delimiter), check errno for ERANGE, and verify the resulting long is within 1..65535 before assigning to *port; if validation fails, handle it as an error (reject/return) instead of accepting 0 or an invalid port. Use the existing variables (p, colon, slash, hlen, port) to locate the parsing logic in this block.
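The `strtol` validation is easy to isolate and unit-test. A sketch under the assumption that the port substring ends at `'/'` (start of the path) or at the NUL terminator, as in the diff above (`parse_port` is a hypothetical helper):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the digits after "host:" into a valid TCP port (1-65535).
 * Stops at '/' or end of string; anything else is rejected. */
static bool parse_port(const char *s, int *port)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s) return false;                    /* no digits at all */
    if (*end != '\0' && *end != '/') return false; /* trailing junk */
    if (errno == ERANGE || v < 1 || v > 65535) return false;
    *port = (int)v;
    return true;
}
```

Rejecting port 0 here matches the review's point that `*port == 0` must never silently flow into the connect path.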
222-232: ⚠️ Potential issue | 🔴 Critical

`hdr_off` can overflow the `hdr_extra` buffer with many/long headers.

The loop accumulates header data without checking whether `hdr_off` has exceeded the buffer capacity. `snprintf` returns the number of bytes that would have been written, so `hdr_off` can grow past `sizeof(hdr_extra)` and subsequent appends then use invalid pointer/size pairs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 222 - 232, The loop building hdr_extra can overflow because hdr_off is incremented by snprintf's return value without bounds checking; fix the loop in the cJSON_ArrayForEach over headers by computing size_t remaining = sizeof(hdr_extra) - hdr_off and only calling snprintf with that remaining size, capture its return (int n), then if n < 0 break, if (size_t)n >= remaining set hdr_off = sizeof(hdr_extra) - 1 and break (to avoid further writes), otherwise hdr_off += (size_t)n; ensure hdr_extra remains null-terminated; update the block around hdr_extra/hdr_off and the cJSON_ArrayForEach to perform these checks.
424-433: ⚠️ Potential issue | 🔴 Critical

`remaining` underflows if `snprintf` output exceeds the buffer size.

If `off >= output_size`, the subtraction `output_size - off - 1` underflows (since `size_t` is unsigned), causing a massive `remaining` value and a potential buffer overread in `memcpy`.

Suggested fix with bounds checking

```diff
     /* Format output */
-    size_t off = snprintf(output, output_size, "Status: %d\n\n", status);
+    int prefix = snprintf(output, output_size, "Status: %d\n\n", status);
+    if (prefix < 0 || (size_t)prefix >= output_size) {
+        free(hb.data);
+        return ESP_ERR_INVALID_ARG;
+    }
+    size_t off = (size_t)prefix;

     /* Copy response body, truncate if needed */
     size_t remaining = output_size - off - 1;
```
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 424 - 433, The snprintf result stored in off can be >= output_size causing underflow in the subtraction for remaining; clamp and handle snprintf's return: capture snprintf's (or vsnprintf's) return value, convert to size_t safely and set off = (off_ret >= output_size) ? (output_size - 1) : off_ret, ensure output[off] = '\0' (or output[output_size-1]='\0' if truncated), then compute remaining as (off < output_size - 1) ? (output_size - off - 1) : 0 before computing body_copy and calling memcpy with hb.len/hb.data; update logic around variables off, remaining, body_copy, output, output_size, hb.len and hb.data accordingly.
92-98: ⚠️ Potential issue | 🟡 Minor

Accumulator drops bytes at the exact boundary and skips partial chunk copies.

The condition `needed + 1 < hb->cap` discards data when it exactly fits. Additionally, when a chunk would overflow, the entire chunk is dropped instead of copying as much as possible.

Suggested fix

```diff
     if (evt->event_id == HTTP_EVENT_ON_DATA) {
-        size_t needed = hb->len + evt->data_len;
-        if (needed + 1 < hb->cap) {
-            memcpy(hb->data + hb->len, evt->data, evt->data_len);
-            hb->len += evt->data_len;
+        if (hb->len < hb->cap - 1) {
+            size_t avail = hb->cap - hb->len - 1;
+            size_t copy = (size_t)evt->data_len < avail ? (size_t)evt->data_len : avail;
+            memcpy(hb->data + hb->len, evt->data, copy);
+            hb->len += copy;
             hb->data[hb->len] = '\0';
         }
     }
```
Verify each finding against the current code and only fix it if needed. In `@main/tools/tool_http_request.c` around lines 92 - 98, The accumulator drops bytes at the exact boundary and drops entire over-sized chunks; in the HTTP_EVENT_ON_DATA branch update the capacity check and partial-copy logic: change the condition using hb->len / hb->cap to allow exact fit (use <= for needed + 1 vs hb->cap), and when evt->data_len would overflow, copy only the number of bytes that fit (size_t can_copy = hb->cap - 1 - hb->len), memcpy that many from evt->data into hb->data, advance hb->len and set hb->data[hb->len] = '\0'; reference symbols: HTTP_EVENT_ON_DATA, evt->data_len, hb->len, hb->cap, hb->data (and consider growing/reallocating hb if full instead of truncating).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@main/tools/tool_http_request.c`:
- Around line 50-79: The is_blocked_destination function currently resolves the
host only to check IPv4 and then the caller re-resolves (TOCTOU) and it also
ignores IPv6; change it to use getaddrinfo with hints.ai_family = AF_UNSPEC,
iterate the entire res linked list and check every ai_family (AF_INET and
AF_INET6) and corresponding ai_addr entries (convert IPv4 with ntohl on
sockaddr_in and add IPv6 check using the in6_addr bytes or a new
isAlwaysBlocked6 helper), and instead of just returning true/false keep/return
the resolved sockaddr (or add an API variant like resolve_and_check_destination
that returns the first safe resolved addr) so the caller can use that sockaddr
for connect() and avoid a second DNS lookup; also ensure you freeaddrinfo(res)
and propagate errors for connection-time validation as defense-in-depth.
- Around line 150-154: The code currently reads the HTTP status code
unconditionally even when esp_http_client_perform(client) fails; initialize
*out_status = 0 before calling esp_http_client_perform and only call
esp_http_client_get_status_code(client) to update *out_status when
esp_http_client_perform returns ESP_OK (or indicates success). Keep
esp_http_client_cleanup(client) executed regardless of outcome, and return the
original err so callers get the error and a safe zeroed status when the request
failed; locate symbols esp_http_client_perform, esp_http_client_get_status_code,
esp_http_client_cleanup, and out_status to apply this change.
- Around line 358-371: The SSRF protection currently skips checking when hlen ==
0 or hlen >= sizeof(host_buf); update the validation around hlen/host_buf so any
invalid host length (hlen == 0 || hlen >= sizeof(host_buf)) is treated as an
error: log a warning via ESP_LOGW including the raw host slice if available,
free input with cJSON_Delete(input), write an explanatory message into output
with snprintf, and return ESP_ERR_INVALID_ARG instead of falling through; keep
the existing blocked-path behavior using is_blocked_destination(host_buf) when
the host length is valid and the host is copied into host_buf.
---
Duplicate comments:
In `@main/tools/tool_http_request.c`:
- Around line 416-437: The variable method points into the cJSON structure input
and is freed by cJSON_Delete(input), so copy method into a local buffer before
calling cJSON_Delete to avoid a dangling pointer; e.g., allocate or
stack-allocate a small buffer (e.g., method_copy) and copy the string (using
strncpy/strdup/snprintf semantics), then call cJSON_Delete(input), use
method_copy in the ESP_LOGI call, and free method_copy if it was heap-allocated;
ensure references to method in the function (notably the ESP_LOGI(TAG, "HTTP %s
complete...") call) are replaced with the safe copy.
- Around line 240-266: The snprintf calls building the HTTP request can report a
larger length than the actual buffer and rlen is being used directly for the
subsequent write, risking truncated writes; update the logic around req_buf,
rlen, and proxy_conn_write to (1) compute and use a single buffer_size variable
(e.g. buf_size = 2048 + body_len), (2) check snprintf's return: if it returns a
negative value treat as error; if it returns >= buf_size, clamp rlen to buf_size
- 1 (the actual bytes written) before calling proxy_conn_write, and (3) ensure
any write uses the actual bytes placed in req_buf (not the full would-be
length), applying this to both branches (body_len > 0 and body_len == 0) and
referencing req_buf, rlen, body_len, hdr_extra, snprintf, and proxy_conn_write.
- Around line 325-332: The function tool_http_request_execute must validate the
output buffer before any snprintf use: at the start of tool_http_request_execute
(or immediately after cJSON_Parse) check that output is not NULL and output_size
> 0 and if the check fails return ESP_ERR_INVALID_ARG (and free any allocated
cJSON *input if already created). Replace direct snprintf calls assuming valid
buffer with code paths that only call snprintf after this validation to avoid
undefined behavior.
- Around line 179-181: The port parsing currently uses atoi(colon + 1) which
accepts malformed or out-of-range values; replace this with strtol and validate
the result: call strtol(colon + 1, &endptr, 10), ensure endptr points to the end
of the numeric substring (i.e., no non-digit characters before the next
delimiter), check errno for ERANGE, and verify the resulting long is within
1..65535 before assigning to *port; if validation fails, handle it as an error
(reject/return) instead of accepting 0 or an invalid port. Use the existing
variables (p, colon, slash, hlen, port) to locate the parsing logic in this
block.
- Around line 222-232: The loop building hdr_extra can overflow because hdr_off
is incremented by snprintf's return value without bounds checking; fix the loop
in the cJSON_ArrayForEach over headers by computing size_t remaining =
sizeof(hdr_extra) - hdr_off and only calling snprintf with that remaining size,
capture its return (int n), then if n < 0 break, if (size_t)n >= remaining set
hdr_off = sizeof(hdr_extra) - 1 and break (to avoid further writes), otherwise
hdr_off += (size_t)n; ensure hdr_extra remains null-terminated; update the block
around hdr_extra/hdr_off and the cJSON_ArrayForEach to perform these checks.
- Around line 424-433: The snprintf result stored in off can be >= output_size
causing underflow in the subtraction for remaining; clamp and handle snprintf's
return: capture snprintf's (or vsnprintf's) return value, convert to size_t
safely and set off = (off_ret >= output_size) ? (output_size - 1) : off_ret,
ensure output[off] = '\0' (or output[output_size-1]='\0' if truncated), then
compute remaining as (off < output_size - 1) ? (output_size - off - 1) : 0
before computing body_copy and calling memcpy with hb.len/hb.data; update logic
around variables off, remaining, body_copy, output, output_size, hb.len and
hb.data accordingly.
- Around line 92-98: The accumulator drops bytes at the exact boundary and drops
entire over-sized chunks; in the HTTP_EVENT_ON_DATA branch update the capacity
check and partial-copy logic: change the condition using hb->len / hb->cap to
allow exact fit (use <= for needed + 1 vs hb->cap), and when evt->data_len would
overflow, copy only the number of bytes that fit (size_t can_copy = hb->cap - 1
- hb->len), memcpy that many from evt->data into hb->data, advance hb->len and
set hb->data[hb->len] = '\0'; reference symbols: HTTP_EVENT_ON_DATA,
evt->data_len, hb->len, hb->cap, hb->data (and consider growing/reallocating hb
if full instead of truncating).
0c2794a to 48cebf2