
Fix large responses that break web_search tool#89

Open
mogenson wants to merge 4 commits into memovai:main from mogenson:fix-web-search

Conversation

@mogenson
Contributor

@mogenson mogenson commented Feb 24, 2026

The web_search tool frequently fails because the responses from the Brave Web API are too large to store and parse. When a response is larger than the 16k response buffer, the result is an incomplete JSON fragment that fails to parse.

Fix this with two changes:

  1. Pass a 'result_filter=web' filter parameter with the query. This omits Brave Web API results for news, videos, discussions, FAQs, and other result types that are not parsed by the web_search tool. Only web results are returned.

  2. Increase the response buffer from 16 KB to 64 KB. I ran a few queries, and the largest response I saw was 58 KB.

There are two possible future optimizations not covered in this fix. First, adding an "Accept-Encoding: gzip" header to receive a compressed response. This would reduce network traffic and the required response buffer size, but we'd still need a large internal buffer to inflate the response before parsing the JSON. Second, most results from the Brave Web API have an 'extra_snippets' field that includes a summary of the result; this seems like useful additional context to provide to the agent.

A web_search query for "Bitcoin price" now successfully returns the following response to the agent (tested by adding a debug log in the agent loop):


I (635063) agent: Tool web_search response:

  1. Bitcoin price today, BTC to USD live price, marketcap and chart | CoinMarketCap https://coinmarketcap.com/currencies/bitcoin/ The live Bitcoin price today is $64,871.55 USD with a 24-hour trading volume of $45,314,283,881 USD. We update our BTC to USD price in real-time. Bitcoin is down 3.74% in the last 24 hours.

  2. Bitcoin price today, BTC to USD live price, marketcap and chart | CoinDesk https://www.coindesk.com/price/bitcoin The price of Bitcoin (BTC) is $63,508.01 today as of Feb 24, 2026, 12:15 am EST, with a 24-hour trading volume of $24.62B.

  3. Bitcoin BTC (BTC-USD) Live Price, News, Chart & Price History - Yahoo Finance https://finance.yahoo.com/quote/BTC-USD/ Bitcoin has a current supply of 19,993,606. The last known price of Bitcoin is 66,178.39249005 USD and is down -2.15 over the last 24 hours. It is currently trading on 12564 active market(s) with $39,417,837,545.32 traded over the last 24 hours.

  4. Bitcoin Price Chart (BTC/USD) | Bitcoin Value | bitFlyer USA https://bitflyer.com/en-us/bitcoin-chart Check out the current Bitcoin (BTC) price, market cap, historical volatility, and buy Bitcoin on bitFlyer today with as little as $1!

  5. Bitcoin Price | BTC to USD Converter, Chart and News https://www.binance.com/en/price/bitcoin Live price of Bitcoin is $96,262.14 with a market cap of $1,908.19B USD. Discover current price, trading volume, chart history, and more.

Summary by CodeRabbit

  • Bug Fixes

    • Handles much larger web search responses and warns on potential overflow to avoid truncation.
    • Fewer crashes and leaks thanks to improved out-of-memory handling and cleaner error/cleanup paths.
  • Refactor

    • More robust URL construction, encoding, and request logic for both proxied and direct requests, improving stability and error messages.


coderabbitai bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Constructs the Brave search request URL on the heap (base + path template + URL-encoded query), increases the response buffer from 16 KB to 64 KB, adds an overflow guard and warning, adjusts proxy handling to pass only the path, and improves allocation/error cleanup paths.

Changes

Cohort / File(s): Web Search Tool — main/tools/tool_web_search.c
Summary: SEARCH_BUF_SIZE raised from 16 KB to 64 KB; added STR/XSTR macros and Brave API base/path constants; URL assembled into a heap buffer (base + path template + encoded query) with OOM handling; added response-overflow warning; proxy mode sends only the path while direct mode uses the full URL; reordered input deletion and ensured the url buffer is freed on all paths.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Tool as Tool (web_search)
  participant Proxy as HTTP Proxy
  participant Remote as Brave API

  User->>Tool: submit query
  Tool->>Tool: build full_url on heap (base + path template + urlencode(query))
  alt using proxy
    Tool->>Proxy: send request with path only
    Proxy->>Remote: forward full URL request
    Remote-->>Proxy: response
    Proxy-->>Tool: response
  else direct
    Tool->>Remote: send request with full_url
    Remote-->>Tool: response
  end
  Tool->>Tool: check response size (overflow guard)
  Tool-->>User: return results or warning

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped a URL, neat and spry,
Heap-woven path beneath the sky,
A bigger net to catch the stream,
Warnings when waters near the brim,
Freed my buffers — off I fly! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Fix large responses that break web_search tool' directly and clearly summarizes the main objective of the PR, which is to address failures caused by oversized responses from the Brave Web API by increasing the buffer size and filtering results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)

172-180: ⚠️ Potential issue | 🟡 Minor

header[512] stack buffer over-read in search_via_proxy — worsened by the new path parameter.

snprintf always returns the number of characters it would have written, regardless of truncation. With the worst-case inputs:

  Component                    Bytes
  Fixed HTTP header text       113
  path (max after this PR)     300
  s_search_key (max)           127
  Total                        540

snprintf writes 511 bytes (truncated) but returns hlen = 540. Line 180 then calls proxy_conn_write(conn, header, hlen), instructing it to read 540 bytes from a 512-byte stack array — a 28-byte over-read of adjacent stack memory (UB). Before this PR the worst case was 522 bytes (a pre-existing bug); the new &result_filter=web adds 18 bytes, extending the over-read.

The simplest fix is to increase header and clamp hlen:

🛠 Proposed fix
-    char header[512];
+    char header[640];   /* 113 fixed + 300 path + 127 key + margin */
     int hlen = snprintf(header, sizeof(header),
         "GET %s HTTP/1.1\r\n"
         ...
         path, s_search_key);
 
+    if (hlen < 0 || (size_t)hlen >= sizeof(header)) {
+        ESP_LOGW(TAG, "Proxy header truncated (%d bytes)", hlen);
+        hlen = (int)sizeof(header) - 1;
+    }
     if (proxy_conn_write(conn, header, hlen) < 0) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 172 - 180, In search_via_proxy,
snprintf writes into the local header buffer but returns the total bytes that
would have been written (hlen), which can exceed the header array size and lead
to a buffer over-read when calling proxy_conn_write(header, hlen); to fix,
expand header to safely hold the worst-case fixed text + path + s_search_key (or
a conservative larger size) and, after snprintf, check for a negative return and
clamp hlen to at most sizeof(header)-1 (or sizeof(header)) before passing it to
proxy_conn_write; reference the symbols header, hlen, snprintf, s_search_key and
proxy_conn_write when making the change.
🧹 Nitpick comments (1)
main/tools/tool_web_search.c (1)

33-38: Consider logging a warning when the buffer is full and data is dropped.

When needed >= sb->cap the incoming chunk is silently discarded, which surfaces later as a generic "Error: Failed to parse search results" with no indication that the response was truncated. Adding an ESP_LOGW here would make future size regressions immediately diagnosable without any other change to behaviour.

🪵 Suggested diagnostic log
     size_t needed = sb->len + evt->data_len;
     if (needed < sb->cap) {
         memcpy(sb->data + sb->len, evt->data, evt->data_len);
         sb->len += evt->data_len;
         sb->data[sb->len] = '\0';
+    } else {
+        ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf %zu/%zu)",
+                 evt->data_len, sb->len, sb->cap);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 33 - 38, The code currently
silently drops incoming chunks when the string buffer is full in the block that
checks "if (needed < sb->cap)"—add an ESP_LOGW warning in the else branch so we
log when data is truncated; include context such as sb (buffer) capacity,
current sb->len, and evt->data_len (or evt->data_len and the incoming chunk) in
the message to aid debugging (i.e., update the branch around the
memcpy/str-terminate logic in tool_web_search.c where sb and evt are
referenced).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efaf40f and e4a3d7c.

📒 Files selected for processing (1)
  • main/tools/tool_web_search.c

@mogenson
Contributor Author

Nitpick comments:
In @main/tools/tool_web_search.c:

  • Around line 33-38: The code currently silently drops incoming chunks when the
    string buffer is full in the block that checks "if (needed < sb->cap)"—add an
    ESP_LOGW warning in the else branch so we log when data is truncated; include
    context such as sb (buffer) capacity, current sb->len, and evt->data_len (or
    evt->data_len and the incoming chunk) in the message to aid debugging (i.e.,
    update the branch around the memcpy/str-terminate logic in tool_web_search.c
    where sb and evt are referenced).

Suggested warning log applied in 60d4757

@mogenson
Contributor Author

Reduced the stack usage for non-proxy web search in 8695afd to fix issue #88

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main/tools/tool_web_search.c`:
- Around line 258-268: If calloc for url fails, free the parsed cJSON object
(input) before returning and return the OOM error; also stop silently truncating
the encoded query by sizing url based on the actual lengths of s_brave_url,
s_path_template and the query payload rather than a fixed 256 bytes (or
explicitly reject queries that exceed a safe maximum). Concretely: in the
function building the URL, after parsing keep references to query->valuestring,
compute required_size = strlen(s_brave_url) + strlen(s_path_template) +
worst_case_encoded_length(strlen(query->valuestring)) + 1, allocate url with
that size, check calloc failure and call cJSON_Delete(input) before returning
ESP_ERR_NO_MEM, then use memcpy on s_brave_url/s_path_template and call
url_encode into the correctly sized tail; alternatively enforce and validate a
max query length up-front and return an error for too-long inputs.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4a3d7c and 8695afd.

📒 Files selected for processing (1)
  • main/tools/tool_web_search.c

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)

179-186: ⚠️ Potential issue | 🟠 Major

Proxy header buffer (512 B) can be exceeded by long queries, producing truncated HTTP requests.

With s_path_template (~46 chars) + a URL-encoded query (up to 256 chars) + the API key (up to 127 chars) + fixed header text (~90 chars), the formatted header can reach ~543 bytes, exceeding the 512-byte header buffer. snprintf will prevent a memory overwrite but will silently truncate the request, causing a malformed HTTP request and a silent failure.

Consider sizing header dynamically (e.g., based on strlen(path) + sizeof(s_search_key) + 200) or at least increasing the static size to accommodate the worst case.

♻️ Suggested dynamic sizing
-    char header[512];
-    int hlen = snprintf(header, sizeof(header),
+    size_t hdr_cap = strlen(path) + sizeof(s_search_key) + 200;
+    char *header = malloc(hdr_cap);
+    if (!header) {
+        proxy_conn_close(conn);
+        return ESP_ERR_NO_MEM;
+    }
+    int hlen = snprintf(header, hdr_cap,
         "GET %s HTTP/1.1\r\n"
         "Host: api.search.brave.com\r\n"
         "Accept: application/json\r\n"
         "X-Subscription-Token: %s\r\n"
         "Connection: close\r\n\r\n",
         path, s_search_key);

Don't forget to free(header) after proxy_conn_write (and on the write-error path).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 179 - 186, The fixed-size char
header[512] used with snprintf in the GET request can be overflowed/truncated
for long path or API key values (see header, hlen, snprintf using path and
s_search_key); change to allocate the header dynamically sized from strlen(path)
+ strlen(s_search_key) + a safe constant for fixed text (or compute from
s_path_template) so the formatted request cannot be truncated, then use
vsnprintf/write into the heap buffer and ensure you free(header) after
proxy_conn_write and also free on any write/error path; keep hlen semantics
(return value of snprintf/vsnprintf) for error/truncation checking.
🧹 Nitpick comments (1)
main/tools/tool_web_search.c (1)

196-206: Proxy read loop silently drops data without a warning, unlike the direct-mode handler.

The http_event_handler (Line 44) now logs ESP_LOGW when data is truncated, but the proxy read loop here silently discards excess bytes. For consistency, consider adding a similar warning when copy < (size_t)n.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 196 - 206, The proxy read loop
that uses proxy_conn_read silently drops excess bytes when copy < (size_t)n;
update the loop to detect when copy < (size_t)n and emit a warning similar to
http_event_handler (use ESP_LOGW) including context such as the connection
identifier (conn), number of bytes read (n), and number of bytes copied (copy)
so truncated data is visible in logs; keep the existing memcpy/total logic and
only add the conditional warning before or after the memcpy to mirror
http_event_handler behavior.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8695afd and a2b85dd.

📒 Files selected for processing (1)
  • main/tools/tool_web_search.c

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)

179-186: ⚠️ Potential issue | 🟡 Minor

Detect proxy header truncation for long queries.

snprintf into a 512‑byte stack buffer can silently truncate when the query (or key) is long, yielding malformed proxy requests. Please detect truncation and fail fast (or allocate dynamically).

🔧 Proposed fix
     char header[512];
     int hlen = snprintf(header, sizeof(header),
         "GET %s HTTP/1.1\r\n"
         "Host: api.search.brave.com\r\n"
         "Accept: application/json\r\n"
         "X-Subscription-Token: %s\r\n"
         "Connection: close\r\n\r\n",
         path, s_search_key);
+    if (hlen < 0 || hlen >= (int)sizeof(header)) {
+        proxy_conn_close(conn);
+        return ESP_ERR_INVALID_SIZE;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 179 - 186, The current snprintf
into the fixed 512-byte stack buffer (header) when building the GET request can
silently truncate long path or s_search_key values; update the code around
snprintf(header, sizeof(header), ...) and hlen to detect truncation (check if
hlen < 0 or hlen >= sizeof(header)) and fail fast (return an error) or switch to
dynamic allocation: call snprintf(NULL, 0, ...) to compute needed length, malloc
the required buffer, then format into it; ensure you reference
header/hlen/snprintf/path/s_search_key and free any allocated memory on error
paths.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2b85dd and 8a5f82a.

📒 Files selected for processing (1)
  • main/tools/tool_web_search.c

Comment on lines +258 to +270

    size_t query_len = strlen(query->valuestring) * 3 + 1; // worse case URL-encoding expansion
    char *url = calloc(1, sizeof(s_brave_url) + sizeof(s_path_template) + query_len);
    if (!url) {
        cJSON_Delete(input);
        snprintf(output, output_size, "Error: Out of memory");
        return ESP_ERR_NO_MEM;
    }

    char path[384];
    snprintf(path, sizeof(path),
             "/res/v1/web/search?q=%s&count=%d", encoded_query, SEARCH_RESULT_COUNT);
    /* Build Query */
    memcpy(url, s_brave_url, sizeof(s_brave_url));
    memcpy(url + sizeof(s_brave_url) - 1, s_path_template, sizeof(s_path_template));
    url_encode(query->valuestring, url + sizeof(s_brave_url) + sizeof(s_path_template) - 2, query_len);
    cJSON_Delete(input);

@coderabbitai coderabbitai bot Feb 24, 2026

⚠️ Potential issue | 🟠 Major

Guard URL buffer sizing and avoid worst‑case truncation.

size_t query_len = strlen(...) * 3 + 1 can overflow for very large inputs, and url_encode reserves 3 bytes internally, so the “worst‑case” capacity still truncates by up to 2 bytes. Please add an overflow check and size the buffer to match url_encode’s guard.

🔧 Proposed fix
+#include <stdint.h>
@@
-    size_t query_len = strlen(query->valuestring) * 3 + 1; // worse case URL-encoding expansion
-    char *url = calloc(1, sizeof(s_brave_url) + sizeof(s_path_template) + query_len);
+    size_t raw_len = strlen(query->valuestring);
+    if (raw_len > (SIZE_MAX - sizeof(s_brave_url) - sizeof(s_path_template) - 3) / 3) {
+        cJSON_Delete(input);
+        snprintf(output, output_size, "Error: Query too large");
+        return ESP_ERR_INVALID_ARG;
+    }
+    size_t query_len = raw_len * 3 + 3; // match url_encode headroom
+    size_t url_cap = sizeof(s_brave_url) + sizeof(s_path_template) + query_len;
+    char *url = calloc(1, url_cap);
@@
-    url_encode(query->valuestring, url + sizeof(s_brave_url) + sizeof(s_path_template) - 2, query_len);
+    url_encode(query->valuestring, url + sizeof(s_brave_url) + sizeof(s_path_template) - 2, query_len);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 258 - 270, Compute query_len
safely by checking for size_t overflow before multiplying
strlen(query->valuestring) by 3 (e.g., ensure strlen <= (SIZE_MAX - N)/3) and
return an error if it would overflow; allocate url with extra guard bytes to
match url_encode’s internal reserve (add at least 2–3 extra bytes beyond the
computed query expansion) and use the correct total size: sizeof(s_brave_url) +
sizeof(s_path_template) + query_len + GUARD_BYTES; if allocation fails, handle
as before. Ensure you reference and update the variables query_len, url, and the
call to url_encode (and preserve s_brave_url and s_path_template usage) so the
buffer cannot be truncated by url_encode’s internal assumptions.

Contributor Author

I'll increase the number of reserved bytes for url_encode from 1 to 3 and switch strlen to strnlen with a max string length of 256. But I'm not worried at all about a query string length exceeding the size_t limit.


@mogenson, that sounds good! Using strnlen with a 256-byte cap and increasing the reserved bytes to 3 effectively addresses the truncation concern while keeping the implementation practical for this embedded context. With that cap in place, the size_t overflow becomes mathematically impossible, so your approach makes sense. 👍

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main/tools/tool_web_search.c (1)

179-188: ⚠️ Potential issue | 🔴 Critical

Critical: proxy header length is unchecked, enabling out-of-bounds read on write.

With Line 285 passing potentially long paths, Line 180 can produce hlen >= sizeof(header). Line 188 then writes hlen bytes from a 512-byte stack buffer, which can read past bounds and leak memory.

🛡️ Safe fix (size-aware header allocation + length check)
 static esp_err_t search_via_proxy(const char *path, search_buf_t *sb)
 {
     proxy_conn_t *conn = proxy_conn_open("api.search.brave.com", 443, 15000);
     if (!conn) return ESP_ERR_HTTP_CONNECT;

-    char header[512];
-    int hlen = snprintf(header, sizeof(header),
+    size_t token_len = strnlen(s_search_key, sizeof(s_search_key));
+    size_t header_cap =
+        strlen("GET  HTTP/1.1\r\nHost: api.search.brave.com\r\nAccept: application/json\r\nX-Subscription-Token: \r\nConnection: close\r\n\r\n")
+        + strlen(path) + token_len + 1;
+    char *header = malloc(header_cap);
+    if (!header) {
+        proxy_conn_close(conn);
+        return ESP_ERR_NO_MEM;
+    }
+    int hlen = snprintf(header, header_cap,
         "GET %s HTTP/1.1\r\n"
         "Host: api.search.brave.com\r\n"
         "Accept: application/json\r\n"
         "X-Subscription-Token: %s\r\n"
         "Connection: close\r\n\r\n",
         path, s_search_key);
+    if (hlen < 0 || (size_t)hlen >= header_cap) {
+        free(header);
+        proxy_conn_close(conn);
+        return ESP_ERR_INVALID_SIZE;
+    }

-    if (proxy_conn_write(conn, header, hlen) < 0) {
+    if (proxy_conn_write(conn, header, (size_t)hlen) < 0) {
+        free(header);
         proxy_conn_close(conn);
         return ESP_ERR_HTTP_WRITE_DATA;
     }
+    free(header);

Also applies to: 285-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 179 - 188, The snprintf call
building header into the fixed-size char header[512] can return hlen >=
sizeof(header) causing proxy_conn_write(conn, header, hlen) to read
out-of-bounds; change to a size-aware safe pattern: detect snprintf return value
and if hlen < 0 or hlen >= sizeof(header) handle as error or allocate a
sufficiently sized buffer (e.g., compute required length and malloc) before
writing, and only call proxy_conn_write with the actual valid byte count; update
references to header, hlen, snprintf, path, s_search_key and proxy_conn_write to
perform the length check and fail or reallocate when the header would be
truncated.
🧹 Nitpick comments (1)
main/tools/tool_web_search.c (1)

43-45: Return an explicit “response too large” error instead of falling through to JSON parse failure.

At Line 43 you drop bytes and only log. That usually turns into a generic parse error later, which obscures the real failure mode.

🔧 Suggested refactor
 typedef struct {
     char *data;
     size_t len;
     size_t cap;
+    bool truncated;
 } search_buf_t;
@@
 static esp_err_t http_event_handler(esp_http_client_event_t *evt)
 {
     search_buf_t *sb = (search_buf_t *)evt->user_data;
     if (evt->event_id == HTTP_EVENT_ON_DATA) {
         size_t needed = sb->len + evt->data_len;
         if (needed < sb->cap) {
             memcpy(sb->data + sb->len, evt->data, evt->data_len);
             sb->len += evt->data_len;
             sb->data[sb->len] = '\0';
         } else {
-            ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf %zu/%zu)",
-                     evt->data_len, sb->len, sb->cap);
+            if (!sb->truncated) {
+                sb->truncated = true;
+                ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf %zu/%zu)",
+                         evt->data_len, sb->len, sb->cap);
+            }
         }
     }
     return ESP_OK;
 }
@@
     if (err != ESP_OK) {
         free(sb.data);
         snprintf(output, output_size, "Error: Search request failed");
         return err;
     }
+    if (sb.truncated) {
+        free(sb.data);
+        snprintf(output, output_size, "Error: Search response exceeded buffer");
+        return ESP_ERR_INVALID_SIZE;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main/tools/tool_web_search.c` around lines 43 - 45, The else branch that
currently just logs via ESP_LOGW(TAG, "Response too large; dropped %d bytes (buf
%zu/%zu)", evt->data_len, sb->len, sb->cap) should instead return an explicit
"response too large" error to the caller and avoid proceeding to JSON parsing;
update the code in the block handling evt->data_len/sb (the branch that drops
bytes) to set/return a clear error value (e.g., an error code or bool/failure
status used by the surrounding function) and include the same descriptive
message (with evt->data_len, sb->len, sb->cap) via ESP_LOGW before returning so
callers can detect and handle this specific failure instead of receiving a
generic JSON parse error.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a5f82a and bc75b0e.

📒 Files selected for processing (1)
  • main/tools/tool_web_search.c

@IRONICBo
Member

Thanks for your contribution!

@IRONICBo IRONICBo self-requested a review February 25, 2026 00:41
Member

@IRONICBo IRONICBo left a comment


Good fix — bumping the buffer to 64K, moving URL construction to heap, and adding the truncation warning are all sensible.

One issue: search_via_proxy still uses char header[512] with snprintf returning hlen, then passes hlen to proxy_conn_write. When hlen >= 512 (possible with longer encoded queries + 127-byte key), this reads past the stack buffer. Please either clamp hlen = MIN(hlen, sizeof(header)-1) or bump header to 768.

Minor: strnlen(query->valuestring, 256) silently caps the query — fine but worth a comment.

Overall correct and a clear improvement.

In non-proxy mode, tool_web_search_execute allocates about 1.2kB on the stack,
which is enough to cause a stack overflow on the console repl task.

Replace the stack allocations for the query, path, and url buffers with a single
heap allocation for a full url with query. Use static string literals with
concatenated preprocessor defines to populate the url buffer. Encode the query
at the end of the path template, since the Brave Web API allows the query
parameter to be in any location.

The proxy web search is even more problematic, since it allocates a 4kB response
buffer on the stack. Fixing the proxy web search is out of scope for this commit.

Before:

```
mimi>  tool_exec web_search "{\"query\": \"Bitcoin price\"}"
I (110154) tools: Executing tool: web_search
I (110154) web_search: Searching: Bitcoin price
***ERROR*** A stack overflow in task console_repl has been detected.
```

After:
```
mimi>  tool_exec web_search "{\"query\": \"Bitcoin price\"}"
I (359774) tools: Executing tool: web_search
I (359774) web_search: Searching: Bitcoin price
I (361114) web_search: Search complete, 1347 bytes result
tool_exec status: ESP_OK
```
The function search_via_proxy still uses char header[512] with snprintf
returning hlen, then passes hlen to proxy_conn_write. When hlen >= 512 (possible
with longer encoded queries + 127-byte key), this reads past the stack buffer.

Increase the header buffer size to 768.
@mogenson
Contributor Author

@IRONICBo I added a comment about query length being truncated to 256 chars by strnlen().

I added a new CL increasing the search_via_proxy() header buffer to 768 bytes. I'm not able to test this change because I do not have a proxy set up. Note, the proxy web search will still definitely overflow the stack because it allocates a 4kB buffer on the stack here:

char tmp[4096];

@mogenson
Contributor Author

mogenson commented Mar 6, 2026

@IRONICBo is this PR still needed after #114?
