fix: properly check that only the last message is kept when doing multi-turn conversations by constantinius · Pull Request #135 · getsentry/testing-ai-sdk-integrations

constantinius · 2026-04-08T08:30:06Z

Closes https://linear.app/getsentry/issue/TET-2158/multi-turn-extend-check-to-assert-correct-message-popping

…ti-turn conversations

linear-code · 2026-04-08T08:30:11Z

TET-2158 Multi-turn: extend check to assert correct message popping

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Agent span check fails: no input messages attribute
- Removed last-input-message validation from checkAgentSpanAttributes so agent spans are no longer incorrectly required to include message attributes that only exist on chat spans.

Or push these changes by commenting:

@cursor push 67d2ae9584

Preview (67d2ae9584)

diff --git a/src/test-cases/checks.ts b/src/test-cases/checks.ts
--- a/src/test-cases/checks.ts
+++ b/src/test-cases/checks.ts
@@ -432,7 +432,6 @@
     }
 
     assertAttributes(agentSpans, attrs);
-    assertOnlyLastInputMessage(agentSpans, testDef, "agent");
   },
 };

src/test-cases/checks.ts

github-actions · 2026-04-08T08:53:35Z

🔴 AI SDK Integration Test Results

Status: 3 regressions detected

Summary

Metric	main	PR	Change
Total Tests	667	667	—
Passed	455	462	+7 ✅
Failed	204	196	-8 ✅

🔴 Regressions

These tests were passing on main but are now failing:

browser/langchain :: Multi-Turn LLM Test (blocking)

Error: Browser test timed out (60s)

browser/openai :: Multi-Turn LLM Test (streaming)

Error: Browser test timed out (60s)

cloudflare/anthropic :: Basic LLM Test (streaming)

Error: Test execution failed: Wrangler exited with code 1

stdout: 
 ⛅️ wrangler 4.81.0
───────────────────
Using secrets defined in .dev.vars
Your Worker has access to the following bindings:
Binding                                                                    Resource                  Mode
env.SENTRY_DSN ("http://public@localhost:42709/1933769...")                Environment Variable      local
env.RUN_ID ("run-1775639517852-im22rdf")                                   Environment Variable      local
env.OPENAI_API_KEY ("(hidden)")                                            Environment Variable      local
env.ANTHROPIC_API_KEY ("(hidden)")                                         Environment Variable      local
env.GOOGLE_GENAI_API_KEY ("(hidden)")                                      Environment Variable      local

⎔ Starting local server...
*** Fatal uncaught kj::Exception: workerd/util/sqlite.c++:829: failed: SQLite failed; NOSENTRY database is locked: SQLITE_BUSY
stack: /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@324226b /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1fc8645 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@2012d42 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1fd3dfc /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1fd3852 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f92142 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@200d1ef /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@2010545 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f64289 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@5177765 
/home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@5177c88 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@517574e /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@517554e /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f4bd15 /lib/x86_64-linux-gnu/libc.so.6@2a1c9 /lib/x86_64-linux-gnu/libc.so.6@2a28a /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f4b024

If you think this is a bug then please create an issue at https://github.com/cloudflare/workers-sdk/issues/new/choose
? Would you like to report this error to Cloudflare? Wrangler's output and the error details will be shared with the Wrangler team to help us diagnose and fix the issue.
🤖 Using fallback value in non-interactive context: no

stderr: ✘ [ERROR] The Workers runtime failed to start. There is likely additional logging output above.


🪵  Logs were written to "/home/runner/.config/.wrangler/logs/wrangler-2026-04-08_09-23-37_023.log"

✅ Fixed

These tests were failing on main but are now passing:

node/vercel :: Basic Agent Test (streaming, function, openai)
cloudflare/langchain :: Conversation ID LLM Test (streaming)
cloudflare/langchain :: Conversation ID LLM Test (blocking)
cloudflare/openai :: Basic LLM Test (streaming)
cloudflare/openai :: Basic LLM Test (blocking)
cloudflare/openai :: Basic Error LLM Test (streaming)
cloudflare/openai :: Basic Error LLM Test (blocking)
cloudflare/openai :: Vision LLM Test (streaming)
cloudflare/openai :: Vision LLM Test (blocking)
cloudflare/openai :: Long Input LLM Test (streaming)
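The regression and fixed lists above can be read as a comparison of per-test outcomes between main and the PR. The following is a minimal sketch of that comparison, not the report generator itself, which is not shown in this thread; `Outcome`, `Results`, and `diffResults` are hypothetical names.

```typescript
// Hypothetical types; the real report generator is not part of this thread.
type Outcome = "pass" | "fail";

// Maps a test identifier, e.g. "browser/openai :: Multi-Turn LLM Test (streaming)",
// to its outcome on one branch.
type Results = Map<string, Outcome>;

function diffResults(
  main: Results,
  pr: Results,
): { regressions: string[]; fixed: string[] } {
  const regressions: string[] = [];
  const fixed: string[] = [];
  pr.forEach((prOutcome, test) => {
    const mainOutcome = main.get(test);
    // Passing on main but failing on the PR counts as a regression.
    if (mainOutcome === "pass" && prOutcome === "fail") {
      regressions.push(test);
    }
    // Failing on main but passing on the PR counts as fixed.
    if (mainOutcome === "fail" && prOutcome === "pass") {
      fixed.push(test);
    }
  });
  return { regressions, fixed };
}
```

Tests that exist on only one branch are ignored in this sketch; a real report would need to decide how to classify them.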

Test Matrix

Agent Tests

SDK	Basic Agent Test	Conversation ID Agent Test	Long Input Agent Test	Tool Call Agent Test	Tool Error Agent Test	Vision Agent Test
browser/langgraph	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}
cloudflare/langgraph	❌	❌	❌	❌	❌	❌
cloudflare/vercel	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}
nextjs/mastra	✅	❌	—	✅	✅	✅
nextjs/vercel	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}
node/langgraph	❌	❌	❌	❌	❌	❌
node/manual	✅	✅	✅	✅	✅	✅
node/mastra	✅	❌	❌	❌	❌	❌
node/vercel	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅🔧_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}	❌_{blk, class, anthropic} ❌_{blk, class, openai} ✅_{blk, function, anthropic} ✅_{blk, function, openai} ❌_{str, class, anthropic} ❌_{str, class, openai} ✅_{str, function, anthropic} ✅_{str, function, openai}
python/langgraph	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s
python/manual	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s
python/openai-agents	✅	✅	✅	✅	✅	❌
python/pydantic-ai	✅_{a, fallback} ✅_{a, single}	✅_{a, fallback} ✅_{a, single}	✅_{a, fallback} ✅_{a, single}	✅_{a, fallback} ✅_{a, single}	✅_{a, fallback} ✅_{a, single}	✅_{a, fallback} ✅_{a, single}

Embedding Tests

SDK	Basic Embeddings Test
browser/google-genai	✅
browser/langchain	❌
browser/openai	✅
cloudflare/google-genai	✅
cloudflare/langchain	❌
cloudflare/openai	✅
cloudflare/vercel	✅
nextjs/google-genai	✅
nextjs/langchain	❌
nextjs/openai	✅
nextjs/vercel	✅
node/google-genai	✅
node/langchain	❌
node/openai	✅
node/vercel	✅
python/google-genai	✅_{a, blk} ✅_{s, blk}
python/langchain	❌_{a, blk} ❌_{s, blk}
python/litellm	❌_{a, blk} ✅_{s, blk}
python/manual	✅_{a, blk} ✅_{s, blk}
python/openai	✅_{a, blk} ✅_{s, blk}

LLM Tests

SDK	Basic Error LLM Test	Basic LLM Test	Conversation ID LLM Test	Long Input LLM Test	Multi-Turn LLM Test	Vision LLM Test
browser/anthropic	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
browser/google-genai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
browser/langchain	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	❌📉_blk ✅_str	✅_blk ✅_str
browser/openai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ❌📉_str	✅_blk ✅_str
cloudflare/anthropic	✅_blk ✅_str	✅_blk ❌📉_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
cloudflare/google-genai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
cloudflare/langchain	✅_blk ✅_str	✅_blk ✅_str	✅🔧_blk ✅🔧_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
cloudflare/openai	✅🔧_blk ✅🔧_str	✅🔧_blk ✅🔧_str	✅_blk ✅_str	✅_blk ✅🔧_str	✅_blk ✅_str	✅🔧_blk ✅🔧_str
nextjs/anthropic	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
nextjs/google-genai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
nextjs/langchain	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
nextjs/openai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/anthropic	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/google-genai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/langchain	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/manual	—	✅	✅	✅	✅	✅
node/openai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
python/anthropic	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}
python/google-genai	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}
python/langchain	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}
python/litellm	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}
python/manual	—	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}
python/openai	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}

MCP Tests

SDK	Basic MCP Tool Call Test	MCP Multiple Tool Calls Test	MCP Prompt Get Test	MCP Resource Read Test	MCP Tool Error Test
node/mcp	✅_sse ✅_io	✅_sse ✅_io	✅_sse ✅_io	✅_sse ✅_io	✅_sse ✅_io
python/fastmcp	✅_{a, blk, sse} ✅_{a, blk, io}	✅_{a, blk, sse} ✅_{a, blk, io}	✅_{a, blk, sse} ✅_{a, blk, io}	✅_{a, blk, sse} ✅_{a, blk, io}	✅_{a, blk, sse} ✅_{a, blk, io}
python/mcp	✅_{a, blk, sse, hi} ✅_{a, blk, sse, lo} ✅_{a, blk, io, hi} ✅_{a, blk, io, lo}	✅_{a, blk, sse, hi} ✅_{a, blk, sse, lo} ✅_{a, blk, io, hi} ✅_{a, blk, io, lo}	✅_{a, blk, sse, hi} ✅_{a, blk, sse, lo} ✅_{a, blk, io, hi} ✅_{a, blk, io, lo}	✅_{a, blk, sse, hi} ✅_{a, blk, sse, lo} ✅_{a, blk, io, hi} ✅_{a, blk, io, lo}	✅_{a, blk, sse, hi} ✅_{a, blk, sse, lo} ✅_{a, blk, io, hi} ✅_{a, blk, io, lo}

Generated by AI SDK Integration Tests

sentry · 2026-04-08T09:14:05Z

src/test-cases/checks.ts

+    if (parsed.messages.length !== 1) {
+      const message = `${spanType} span ${i} should keep only the last input message, found ${parsed.messages.length} message(s)`;
+      errors.push(message);
+      locations.push({
+        spanId: span.span_id,
+        attribute: parsed.attribute,
+        message,
+      });
+      continue;
+    }


Bug: The new assertOnlyLastInputMessage check will cause multi-turn tests to fail for all real SDK integrations because they send full conversation history, not just the last message.
Severity: HIGH

Suggested Fix

Either update the check in assertOnlyLastInputMessage to be less strict and accommodate the standard behavior of SDKs, or apply this check only to the manual test frameworks where this behavior is explicitly being tested. Do not apply it universally to all frameworks.
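If the concern above holds, the second option could be sketched as gating the strict check on the framework under test. This is illustrative only: `SpanLike`, `TestDefLike`, its `framework` field, and `checkOnlyLastInputMessage` are hypothetical names, and the repository's real `assertOnlyLastInputMessage` takes different arguments.

```typescript
// Hypothetical shapes for illustration; the repository's real types differ.
interface SpanLike {
  span_id: string;
  messages: unknown[]; // already-parsed input messages recorded on the span
}

interface TestDefLike {
  framework: string; // assumed field, e.g. "manual" or "openai"
}

// Assumption: only manually instrumented templates promise last-message-only spans.
const STRICT_FRAMEWORKS = new Set(["manual"]);

function checkOnlyLastInputMessage(
  spans: SpanLike[],
  testDef: TestDefLike,
): string[] {
  // Skip the strict check for frameworks outside the allow-list.
  if (!STRICT_FRAMEWORKS.has(testDef.framework)) {
    return [];
  }
  const errors: string[] = [];
  spans.forEach((span, i) => {
    if (span.messages.length !== 1) {
      errors.push(
        `span ${i} should keep only the last input message, found ${span.messages.length} message(s)`,
      );
    }
  });
  return errors;
}
```

An allow-list keeps the default permissive, so a newly added framework is not held to the strict rule until someone opts it in.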

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: src/test-cases/checks.ts#L298-L307

Potential issue: A new validation function, `assertOnlyLastInputMessage`, was introduced to enforce that each AI span in a multi-turn test contains exactly one message. While the manual test templates were updated to comply with this new rule, the templates for real SDK integrations (like OpenAI, Anthropic, etc.) were not. These SDKs typically include the full conversation history in each turn. As a result, when a multi-turn test is run with any non-manual SDK integration, the test will fail because the SDK sends multiple messages, violating the new check's expectation of a single message.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 75d15a4.

cursor · 2026-04-08T09:16:09Z

src/test-cases/checks.ts

+  const expectedText = getMessageText(expected);
+
+  if (actualText !== undefined || expectedText !== undefined) {
+    return actualText === expectedText;


Loose message comparison fails for multimodal content

Low Severity

messagesMatchLoosely would produce false negatives for multimodal messages in multi-turn tests. The actual span message has content: "[multimodal]" (a string, after buildSentryMessages transformation), so getMessageText returns "[multimodal]" immediately. But the expected raw message has content: [{type: "text", text: "..."}, ...] (an array), so getMessageText extracts the real text. These never match, causing assertOnlyLastInputMessage to incorrectly report a failure. No current multi-turn test uses multimodal content, but adding one would trigger this.
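One possible way to avoid the mismatch described above is to reduce both sides to the same comparable form before comparing. The sketch below is an assumption-laden illustration, not the repository's `messagesMatchLoosely` or `getMessageText`: `MessageLike`, `ContentPart`, `comparableText`, and `messagesMatch` are hypothetical, and the rule for when the `"[multimodal]"` placeholder applies is guessed from the comment.

```typescript
// Hypothetical message shape; the repository's real types may differ.
type ContentPart = { type: string; text?: string };

interface MessageLike {
  role: string;
  content: string | ContentPart[];
}

// Placeholder that, per the comment above, appears in span data for multimodal content.
const MULTIMODAL_PLACEHOLDER = "[multimodal]";

// Reduce a message to the text a span would be expected to carry.
function comparableText(message: MessageLike): string {
  if (typeof message.content === "string") {
    return message.content;
  }
  // Assumption: any non-text part means the span stores the placeholder instead.
  const hasNonText = message.content.some((part) => part.type !== "text");
  if (hasNonText) {
    return MULTIMODAL_PLACEHOLDER;
  }
  return message.content.map((part) => part.text ?? "").join("");
}

function messagesMatch(actual: MessageLike, expected: MessageLike): boolean {
  return (
    actual.role === expected.role &&
    comparableText(actual) === comparableText(expected)
  );
}
```

With this normalization, a raw expected message containing an image part compares equal to a span message whose content is the placeholder string, at the cost of no longer checking the text inside multimodal messages.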

Additional Locations (1)

src/test-cases/checks.ts#L164-L167

Reviewed by Cursor Bugbot for commit 75d15a4.

fix: properly check that only the last message is kept when doing mul…

ef8274d

…ti-turn conversations

constantinius requested a review from a team April 8, 2026 08:30

cursor bot reviewed Apr 8, 2026

fix: manual instrumentation test fixes

75d15a4

sentry bot reviewed Apr 8, 2026

cursor bot reviewed Apr 8, 2026

vgrozdanic approved these changes Apr 8, 2026

constantinius merged commit 14072b4 into main Apr 8, 2026
10 of 12 checks passed

constantinius merged 2 commits into main from constantinius/fix/multi-turn-last-message-check
