Fix large responses that break web_search tool #89
mogenson wants to merge 4 commits into memovai:main
Conversation
The web_search tool frequently fails because the responses from the Brave Web API are too large to store and parse. When a response is larger than the 16 kB response buffer, the result is an incomplete JSON fragment that fails to parse. Fix this with two changes:

1. Provide a `result_filter=web` parameter with the query. This omits Brave Web API results for news, videos, discussions, FAQs, and other result types that are not parsed by the web_search tool. Only web results are returned.
2. Increase the response buffer from 16 kB to 64 kB. I ran a few queries and the max response size I saw was 58 kB.

There are two possible future optimizations not covered in this fix. First, adding the "Accept-Encoding: gzip" header to receive a compressed response. This would reduce the network traffic and the required size of the response buffer, but we'd still need a large internal buffer to inflate the response before parsing JSON. Second, most results from the Brave Web API have an 'extra_snippets' field which includes a summary of the result. This seems like useful additional context to provide to the agent.

A web_search query for "Bitcoin price" now successfully returns the following response to the agent (tested by adding a debug log in the agent loop):

```
I (635063) agent: Tool web_search response:
1. Bitcoin price today, BTC to USD live price, marketcap and chart | CoinMarketCap https://coinmarketcap.com/currencies/bitcoin/ The live Bitcoin price today is <strong>$64,871.55 USD</strong> with a 24-hour trading volume of $45,314,283,881 USD. We update our BTC to USD price in real-time. Bitcoin is down 3.74% in the last 24 hours.
2. Bitcoin price today, BTC to USD live price, marketcap and chart | CoinDesk https://www.coindesk.com/price/bitcoin The price of Bitcoin (BTC) is <strong>$63,508.01</strong> today as of Feb 24, 2026, 12:15 am EST, with a 24-hour trading volume of $24.62B.
3. Bitcoin BTC (BTC-USD) Live Price, News, Chart & Price History - Yahoo Finance https://finance.yahoo.com/quote/BTC-USD/ Bitcoin has a current supply of 19,993,606. The last known price of Bitcoin is <strong>66,178.39249005 USD</strong> and is down -2.15 over the last 24 hours. It is currently trading on 12564 active market(s) with $39,417,837,545.32 traded over the last 24 hours.
4. Bitcoin Price Chart (BTC/USD) | Bitcoin Value | bitFlyer USA https://bitflyer.com/en-us/bitcoin-chart Check out the current Bitcoin (BTC) price, market cap, historical volatility, and buy Bitcoin on bitFlyer today with <strong>as little as $1</strong>!
5. Bitcoin Price | BTC to USD Converter, Chart and News https://www.binance.com/en/price/bitcoin Live price of Bitcoin is <strong>$96,262.14</strong> with a market cap of $1,908.19B USD. Discover current price, trading volume, chart history, and more.
```
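The `result_filter=web` change amounts to one extra query parameter on the request path. A minimal sketch of the idea (the helper name and buffer handling here are illustrative, not the PR's exact code):

```c
#include <stdio.h>
#include <string.h>

#define SEARCH_RESULT_COUNT 5  /* assumed value for illustration */

/* Hypothetical helper: formats the Brave Web API search path with the
 * result_filter=web parameter so only web results come back. */
static int build_search_path(char *out, size_t cap, const char *encoded_query) {
    return snprintf(out, cap,
                    "/res/v1/web/search?q=%s&count=%d&result_filter=web",
                    encoded_query, SEARCH_RESULT_COUNT);
}
```

The query must already be URL-encoded, and the returned length can be compared against `cap` to detect truncation.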
📝 Walkthrough

Constructs the Brave search request URL on the heap (base + path template + URL-encoded query), increases the response buffer from 16 KB to 64 KB, adds an overflow guard and warning, adjusts proxy handling to pass only the path, and improves allocation/error cleanup paths.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Tool as Tool (web_search)
    participant Proxy as HTTP Proxy
    participant Remote as Brave API
    User->>Tool: submit query
    Tool->>Tool: build full_url on heap (base + path template + urlencode(query))
    alt using proxy
        Tool->>Proxy: send request with path only
        Proxy->>Remote: forward full URL request
        Remote-->>Proxy: response
        Proxy-->>Tool: response
    else direct
        Tool->>Remote: send request with full_url
        Remote-->>Tool: response
    end
    Tool->>Tool: check response size (overflow guard)
    Tool-->>User: return results or warning
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)
172-180: ⚠️ Potential issue | 🟡 Minor

`header[512]` stack buffer over-read in `search_via_proxy` — worsened by the new path parameter.

`snprintf` always returns the number of characters it would have written, regardless of truncation. With the worst-case inputs:

| Component | Bytes |
| --- | --- |
| Fixed HTTP header text | 113 |
| `path` (max after this PR) | 300 |
| `s_search_key` (max) | 127 |
| **Total** | **540** |

`snprintf` writes 511 bytes (truncated) but returns `hlen = 540`. Line 180 then calls `proxy_conn_write(conn, header, hlen)`, instructing it to read 540 bytes from a 512-byte stack array — a 28-byte over-read of adjacent stack memory (UB). Before this PR the worst case was 522 bytes (pre-existing bug); the new `&result_filter=web` adds 18 bytes, extending the over-read.

The simplest fix is to increase `header` and clamp `hlen`:

🛠 Proposed fix

```diff
-    char header[512];
+    char header[640]; /* 113 fixed + 300 path + 127 key + margin */
     int hlen = snprintf(header, sizeof(header),
         "GET %s HTTP/1.1\r\n"
         ...
         path, s_search_key);
+    if (hlen < 0 || (size_t)hlen >= sizeof(header)) {
+        ESP_LOGW(TAG, "Proxy header truncated (%d bytes)", hlen);
+        hlen = (int)sizeof(header) - 1;
+    }
     if (proxy_conn_write(conn, header, hlen) < 0) {
```
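The over-read mechanism hinges on `snprintf`'s return-value semantics, which can be demonstrated in isolation (a standalone sketch, not project code):

```c
#include <stdio.h>
#include <string.h>

/* snprintf never writes past the buffer, but it returns the length the
 * formatted string *would* have had. Passing that return value as a byte
 * count to a write call is exactly the over-read described above. */
static int format_request_line(char *buf, size_t cap, const char *path) {
    return snprintf(buf, cap, "GET %s HTTP/1.1", path);
}
```

Checking `n >= sizeof(buf)` after the call is the standard way to detect that truncation happened.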
🧹 Nitpick comments (1)
main/tools/tool_web_search.c (1)
33-38: Consider logging a warning when the buffer is full and data is dropped.

When `needed >= sb->cap` the incoming chunk is silently discarded, which surfaces later as a generic `"Error: Failed to parse search results"` with no indication that the response was truncated. Adding an `ESP_LOGW` here would make future size regressions immediately diagnosable without any other change to behaviour.

🪵 Suggested diagnostic log

```diff
     size_t needed = sb->len + evt->data_len;
     if (needed < sb->cap) {
         memcpy(sb->data + sb->len, evt->data, evt->data_len);
         sb->len += evt->data_len;
         sb->data[sb->len] = '\0';
+    } else {
+        ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf %zu/%zu)",
+                 evt->data_len, sb->len, sb->cap);
     }
```
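The handler's append-or-drop logic can be factored into a small helper to show the pattern (simplified names, no ESP-IDF dependency; the real code logs with `ESP_LOGW`):

```c
#include <stdio.h>
#include <string.h>

typedef struct { char *data; size_t len; size_t cap; } sbuf_t;

/* Append a chunk only if it fits (leaving room for the NUL terminator);
 * return -1 when the chunk is dropped so callers can log or error out
 * instead of failing later with a generic JSON parse error. */
static int sbuf_append(sbuf_t *sb, const char *chunk, size_t n) {
    if (sb->len + n >= sb->cap) {
        fprintf(stderr, "response too large; dropped %zu bytes\n", n);
        return -1;
    }
    memcpy(sb->data + sb->len, chunk, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}
```

Returning a status (rather than only logging) is what later lets the caller report a specific "response too large" error.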
> When needed >= sb->cap the incoming chunk is silently discarded, which surfaces later as a generic "Error: Failed to parse search results" with no indication that the response was truncated.

Suggested warning log applied in 60d4757
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@main/tools/tool_web_search.c`:

- Around line 258-268: If calloc for url fails, free the parsed cJSON object (input) before returning and return the OOM error; also stop silently truncating the encoded query by sizing url based on the actual lengths of s_brave_url, s_path_template and the query payload rather than a fixed 256 bytes (or explicitly reject queries that exceed a safe maximum). Concretely: in the function building the URL, after parsing keep references to query->valuestring, compute required_size = strlen(s_brave_url) + strlen(s_path_template) + worst_case_encoded_length(strlen(query->valuestring)) + 1, allocate url with that size, check calloc failure and call cJSON_Delete(input) before returning ESP_ERR_NO_MEM, then use memcpy on s_brave_url/s_path_template and call url_encode into the correctly sized tail; alternatively enforce and validate a max query length up-front and return an error for too-long inputs.
Force-pushed 8695afd to a2b85dd
⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)
179-186: ⚠️ Potential issue | 🟠 Major

Proxy header buffer (512 B) can be exceeded by long queries, producing truncated HTTP requests.

With `s_path_template` (~46 chars) + a URL-encoded query (up to 256 chars) + the API key (up to 127 chars) + fixed header text (~90 chars), the formatted header can reach ~543 bytes, exceeding the 512-byte `header` buffer. `snprintf` will prevent a memory overwrite but will silently truncate the request, causing a malformed HTTP request and a silent failure.

Consider sizing `header` dynamically (e.g., based on `strlen(path) + sizeof(s_search_key) + 200`) or at least increasing the static size to accommodate the worst case.

♻️ Suggested dynamic sizing

```diff
-    char header[512];
-    int hlen = snprintf(header, sizeof(header),
+    size_t hdr_cap = strlen(path) + sizeof(s_search_key) + 200;
+    char *header = malloc(hdr_cap);
+    if (!header) {
+        proxy_conn_close(conn);
+        return ESP_ERR_NO_MEM;
+    }
+    int hlen = snprintf(header, hdr_cap,
         "GET %s HTTP/1.1\r\n"
         "Host: api.search.brave.com\r\n"
         "Accept: application/json\r\n"
         "X-Subscription-Token: %s\r\n"
         "Connection: close\r\n\r\n",
         path, s_search_key);
```

Don't forget to `free(header)` after `proxy_conn_write` (and on the write-error path).
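The dynamic sizing can also be done with a measuring pass: C99 `snprintf` with a NULL buffer returns the exact length required, so the heap buffer can never truncate. A sketch under that assumption (hypothetical helper; header text mirrors the review comment, not necessarily the exact code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *build_proxy_header(const char *path, const char *key) {
    const char *fmt = "GET %s HTTP/1.1\r\n"
                      "Host: api.search.brave.com\r\n"
                      "Accept: application/json\r\n"
                      "X-Subscription-Token: %s\r\n"
                      "Connection: close\r\n\r\n";
    int need = snprintf(NULL, 0, fmt, path, key);   /* measure */
    if (need < 0) return NULL;
    char *header = malloc((size_t)need + 1);
    if (!header) return NULL;
    snprintf(header, (size_t)need + 1, fmt, path, key);  /* format */
    return header;  /* caller frees */
}
```

The cost is formatting twice, which is negligible next to the network round trip.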
🧹 Nitpick comments (1)
main/tools/tool_web_search.c (1)
196-206: Proxy read loop silently drops data without a warning, unlike the direct-mode handler.

The `http_event_handler` (Line 44) now logs `ESP_LOGW` when data is truncated, but the proxy read loop here silently discards excess bytes. For consistency, consider adding a similar warning when `copy < (size_t)n`.
Force-pushed a2b85dd to 8a5f82a
Actionable comments posted: 1
⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)
179-186: ⚠️ Potential issue | 🟡 Minor

Detect proxy header truncation for long queries.

`snprintf` into a 512‑byte stack buffer can silently truncate when the query (or key) is long, yielding malformed proxy requests. Please detect truncation and fail fast (or allocate dynamically).

🔧 Proposed fix

```diff
     char header[512];
     int hlen = snprintf(header, sizeof(header),
         "GET %s HTTP/1.1\r\n"
         "Host: api.search.brave.com\r\n"
         "Accept: application/json\r\n"
         "X-Subscription-Token: %s\r\n"
         "Connection: close\r\n\r\n",
         path, s_search_key);
+    if (hlen < 0 || hlen >= (int)sizeof(header)) {
+        proxy_conn_close(conn);
+        return ESP_ERR_INVALID_SIZE;
+    }
```
main/tools/tool_web_search.c (Outdated)
```c
size_t query_len = strlen(query->valuestring) * 3 + 1; // worse case URL-encoding expansion
char *url = calloc(1, sizeof(s_brave_url) + sizeof(s_path_template) + query_len);
if (!url) {
    cJSON_Delete(input);
    snprintf(output, output_size, "Error: Out of memory");
    return ESP_ERR_NO_MEM;
}

char path[384];
snprintf(path, sizeof(path),
         "/res/v1/web/search?q=%s&count=%d", encoded_query, SEARCH_RESULT_COUNT);
/* Build Query */
memcpy(url, s_brave_url, sizeof(s_brave_url));
memcpy(url + sizeof(s_brave_url) - 1, s_path_template, sizeof(s_path_template));
url_encode(query->valuestring, url + sizeof(s_brave_url) + sizeof(s_path_template) - 2, query_len);
cJSON_Delete(input);
```
Guard URL buffer sizing and avoid worst‑case truncation.
size_t query_len = strlen(...) * 3 + 1 can overflow for very large inputs, and url_encode reserves 3 bytes internally, so the “worst‑case” capacity still truncates by up to 2 bytes. Please add an overflow check and size the buffer to match url_encode’s guard.
🔧 Proposed fix

```diff
+#include <stdint.h>
@@
-    size_t query_len = strlen(query->valuestring) * 3 + 1; // worse case URL-encoding expansion
-    char *url = calloc(1, sizeof(s_brave_url) + sizeof(s_path_template) + query_len);
+    size_t raw_len = strlen(query->valuestring);
+    if (raw_len > (SIZE_MAX - sizeof(s_brave_url) - sizeof(s_path_template) - 3) / 3) {
+        cJSON_Delete(input);
+        snprintf(output, output_size, "Error: Query too large");
+        return ESP_ERR_INVALID_ARG;
+    }
+    size_t query_len = raw_len * 3 + 3; // match url_encode headroom
+    size_t url_cap = sizeof(s_brave_url) + sizeof(s_path_template) + query_len;
+    char *url = calloc(1, url_cap);
@@
-    url_encode(query->valuestring, url + sizeof(s_brave_url) + sizeof(s_path_template) - 2, query_len);
+    url_encode(query->valuestring, url + sizeof(s_brave_url) + sizeof(s_path_template) - 2, query_len);
```
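The 3x "worst case" comes from percent-encoding: every unsafe byte becomes a three-byte `%XX` sequence. A minimal stand-in encoder (not the project's `url_encode`, which additionally keeps internal guard bytes) makes the expansion concrete:

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Encode unsafe bytes as %XX; output is truncated (never overflowed) if
 * cap is too small. Worst case: every input byte triples in size.
 * Requires cap >= 1. */
static size_t percent_encode(const char *in, char *out, size_t cap) {
    size_t o = 0;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            if (o + 1 >= cap) break;
            out[o++] = (char)c;
        } else {
            if (o + 3 >= cap) break;
            snprintf(out + o, 4, "%%%02X", c);
            o += 3;
        }
    }
    out[o] = '\0';
    return o;
}
```

An all-spaces query is the worst case: n input bytes need 3n output bytes plus the terminator, which is why the review asks that the buffer match the encoder's own headroom.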
I'll increase the number of reserved bytes for url_encode from 1 to 3 and switch strlen to strnlen with a max string length of 256. But I'm not worried at all about a query string length exceeding the size_t limit.
@mogenson, that sounds good! Using strnlen with a 256-byte cap and increasing the reserved bytes to 3 effectively addresses the truncation concern while keeping the implementation practical for this embedded context. With that cap in place, the size_t overflow becomes mathematically impossible, so your approach makes sense. 👍
Force-pushed 8a5f82a to bc75b0e
⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)
179-188: ⚠️ Potential issue | 🔴 Critical

Critical: proxy header length is unchecked, enabling out-of-bounds read on write.

With Line 285 passing potentially long paths, Line 180 can produce `hlen >= sizeof(header)`. Line 188 then writes `hlen` bytes from a 512-byte stack buffer, which can read past bounds and leak memory.

🛡️ Safe fix (size-aware header allocation + length check)

```diff
 static esp_err_t search_via_proxy(const char *path, search_buf_t *sb) {
     proxy_conn_t *conn = proxy_conn_open("api.search.brave.com", 443, 15000);
     if (!conn) return ESP_ERR_HTTP_CONNECT;
-    char header[512];
-    int hlen = snprintf(header, sizeof(header),
+    size_t token_len = strnlen(s_search_key, sizeof(s_search_key));
+    size_t header_cap =
+        strlen("GET  HTTP/1.1\r\nHost: api.search.brave.com\r\nAccept: application/json\r\nX-Subscription-Token: \r\nConnection: close\r\n\r\n") +
+        strlen(path) + token_len + 1;
+    char *header = malloc(header_cap);
+    if (!header) {
+        proxy_conn_close(conn);
+        return ESP_ERR_NO_MEM;
+    }
+    int hlen = snprintf(header, header_cap,
         "GET %s HTTP/1.1\r\n"
         "Host: api.search.brave.com\r\n"
         "Accept: application/json\r\n"
         "X-Subscription-Token: %s\r\n"
         "Connection: close\r\n\r\n",
         path, s_search_key);
+    if (hlen < 0 || (size_t)hlen >= header_cap) {
+        free(header);
+        proxy_conn_close(conn);
+        return ESP_ERR_INVALID_SIZE;
+    }
-    if (proxy_conn_write(conn, header, hlen) < 0) {
+    if (proxy_conn_write(conn, header, (size_t)hlen) < 0) {
+        free(header);
         proxy_conn_close(conn);
         return ESP_ERR_HTTP_WRITE_DATA;
     }
+    free(header);
```

Also applies to: 285-285
🧹 Nitpick comments (1)
main/tools/tool_web_search.c (1)
43-45: Return an explicit "response too large" error instead of falling through to JSON parse failure.

At Line 43 you drop bytes and only log. That usually turns into a generic parse error later, which obscures the real failure mode.
🔧 Suggested refactor

```diff
 typedef struct {
     char *data;
     size_t len;
     size_t cap;
+    bool truncated;
 } search_buf_t;
@@
 static esp_err_t http_event_handler(esp_http_client_event_t *evt) {
     search_buf_t *sb = (search_buf_t *)evt->user_data;
     if (evt->event_id == HTTP_EVENT_ON_DATA) {
         size_t needed = sb->len + evt->data_len;
         if (needed < sb->cap) {
             memcpy(sb->data + sb->len, evt->data, evt->data_len);
             sb->len += evt->data_len;
             sb->data[sb->len] = '\0';
         } else {
-            ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf %zu/%zu)",
-                     evt->data_len, sb->len, sb->cap);
+            if (!sb->truncated) {
+                sb->truncated = true;
+                ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf %zu/%zu)",
+                         evt->data_len, sb->len, sb->cap);
+            }
         }
     }
     return ESP_OK;
 }
@@
     if (err != ESP_OK) {
         free(sb.data);
         snprintf(output, output_size, "Error: Search request failed");
         return err;
     }
+    if (sb.truncated) {
+        free(sb.data);
+        snprintf(output, output_size, "Error: Search response exceeded buffer");
+        return ESP_ERR_INVALID_SIZE;
+    }
```
Thanks for your contribution!
IRONICBo left a comment
Good fix — bumping the buffer to 64K, moving URL construction to heap, and adding the truncation warning are all sensible.
One issue: search_via_proxy still uses char header[512] with snprintf returning hlen, then passes hlen to proxy_conn_write. When hlen >= 512 (possible with longer encoded queries + 127-byte key), this reads past the stack buffer. Please either clamp hlen = MIN(hlen, sizeof(header)-1) or bump header to 768.
Minor: strnlen(query->valuestring, 256) silently caps the query — fine but worth a comment.
Overall correct and a clear improvement.
In non-proxy mode, tool_web_search_execute allocates about 1.2kB on the stack,
which is enough to cause a stack overflow on the console repl task.
Replace the stack allocations for the query, path, and url buffers with a single
heap allocation for a full url with query. Use static string literals with
concatenated preprocessor defines to populate the url buffer. Encode the query
at the end of the path template, since the Brave Web API allows the query
parameter to be in any location.
The proxy web search is even more problematic, since it allocates a 4kB response
buffer on the stack. Fixing the proxy web search is out of scope for this commit.
Before:
```
mimi> tool_exec web_search "{\"query\": \"Bitcoin price\"}"
I (110154) tools: Executing tool: web_search
I (110154) web_search: Searching: Bitcoin price
***ERROR*** A stack overflow in task console_repl has been detected.
```
After:
```
mimi> tool_exec web_search "{\"query\": \"Bitcoin price\"}"
I (359774) tools: Executing tool: web_search
I (359774) web_search: Searching: Bitcoin price
I (361114) web_search: Search complete, 1347 bytes result
tool_exec status: ESP_OK
```
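The commit's single-heap-allocation approach can be sketched as follows (the macro names and the plain-copy tail are illustrative; the real code percent-encodes the query into the tail of the buffer):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BRAVE_URL     "https://api.search.brave.com"
#define PATH_TEMPLATE "/res/v1/web/search?count=5&result_filter=web&q="

/* One calloc sized from the string literals plus the worst-case encoded
 * query replaces ~1.2 kB of stack buffers. calloc zeroes the buffer, so
 * the string stays NUL-terminated. Caller frees the result. */
static char *build_full_url(const char *query) {
    size_t qlen = strlen(query);
    if (qlen > 256) qlen = 256;  /* cap query length, as in the PR */
    size_t cap = sizeof(BRAVE_URL) - 1 + sizeof(PATH_TEMPLATE) - 1
               + qlen * 3 + 3;   /* worst-case %XX expansion + guard bytes */
    char *url = calloc(1, cap);
    if (!url) return NULL;
    memcpy(url, BRAVE_URL, sizeof(BRAVE_URL) - 1);
    memcpy(url + sizeof(BRAVE_URL) - 1, PATH_TEMPLATE, sizeof(PATH_TEMPLATE) - 1);
    memcpy(url + sizeof(BRAVE_URL) - 1 + sizeof(PATH_TEMPLATE) - 1, query, qlen);
    return url;
}
```

Putting the query parameter last in the template is what lets the encoder write directly into the tail of the single buffer, which the Brave Web API permits since parameter order is free.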
The function search_via_proxy still uses char header[512] with snprintf returning hlen, then passes hlen to proxy_conn_write. When hlen >= 512 (possible with longer encoded queries + 127-byte key), this reads past the stack buffer. Increase the header buffer size to 768.
Force-pushed bc75b0e to b03ae4c
@IRONICBo I added a comment about query length being truncated to 256 chars, and a new CL increasing the header buffer size (mimiclaw/main/tools/tool_web_search.c, line 186 in 1ccaa09).