
fix: properly check that only the last message is kept when doing multi-turn conversations #135

Merged
constantinius merged 2 commits into main from constantinius/fix/multi-turn-last-message-check
Apr 8, 2026

Conversation

linear-code bot commented Apr 8, 2026

@constantinius constantinius requested a review from a team April 8, 2026 08:30

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Agent span check fails: no input messages attribute
    • Removed last-input-message validation from checkAgentSpanAttributes so agent spans are no longer incorrectly required to include message attributes that only exist on chat spans.

Or push these changes by commenting:

@cursor push 67d2ae9584
Preview (67d2ae9584)
diff --git a/src/test-cases/checks.ts b/src/test-cases/checks.ts
--- a/src/test-cases/checks.ts
+++ b/src/test-cases/checks.ts
@@ -432,7 +432,6 @@
     }
 
     assertAttributes(agentSpans, attrs);
-    assertOnlyLastInputMessage(agentSpans, testDef, "agent");
   },
 };
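
One way to avoid requiring message attributes on spans that never carry them is to select only the spans that actually have an input-messages attribute before running the last-message check. The sketch below is illustrative only: the `SpanLike` shape, the attribute name in `INPUT_MESSAGES_ATTR`, and the helper `spansWithInputMessages` are assumptions, not code from `src/test-cases/checks.ts`.

```typescript
// Hypothetical span shape; the real span type in checks.ts may differ.
interface SpanLike {
  span_id: string;
  op: string;
  data: Record<string, unknown>;
}

// Assumed attribute name, used here purely for illustration.
const INPUT_MESSAGES_ATTR = "gen_ai.request.messages";

// Keep only the spans that carry an input-messages attribute, so spans
// without one (for example agent spans) are skipped instead of failing.
function spansWithInputMessages(spans: SpanLike[]): SpanLike[] {
  return spans.filter((span) => INPUT_MESSAGES_ATTR in span.data);
}
```

With this kind of filter, a message check would run on chat spans and silently skip agent spans, which is the effect the autofix achieves by removing the call outright.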

github-actions bot commented Apr 8, 2026

🔴 AI SDK Integration Test Results

Status: 3 regressions detected

Summary

Metric        main   PR    Change
Total Tests   667    667
Passed        455    462   +7 ✅
Failed        204    196   -8 ✅

🔴 Regressions

These tests were passing on main but are now failing:

browser/langchain :: Multi-Turn LLM Test (blocking)

Error: Browser test timed out (60s)

browser/openai :: Multi-Turn LLM Test (streaming)

Error: Browser test timed out (60s)

cloudflare/anthropic :: Basic LLM Test (streaming)

Error: Test execution failed: Wrangler exited with code 1

stdout: 
 ⛅️ wrangler 4.81.0
───────────────────
Using secrets defined in .dev.vars
Your Worker has access to the following bindings:
Binding                                                                    Resource                  Mode
env.SENTRY_DSN ("http://public@localhost:42709/1933769...")                Environment Variable      local
env.RUN_ID ("run-1775639517852-im22rdf")                                   Environment Variable      local
env.OPENAI_API_KEY ("(hidden)")                                            Environment Variable      local
env.ANTHROPIC_API_KEY ("(hidden)")                                         Environment Variable      local
env.GOOGLE_GENAI_API_KEY ("(hidden)")                                      Environment Variable      local

⎔ Starting local server...
*** Fatal uncaught kj::Exception: workerd/util/sqlite.c++:829: failed: SQLite failed; NOSENTRY database is locked: SQLITE_BUSY
stack: /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@324226b /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1fc8645 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@2012d42 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1fd3dfc /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1fd3852 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f92142 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@200d1ef /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@2010545 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f64289 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@5177765 
/home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@5177c88 /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@517574e /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@517554e /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f4bd15 /lib/x86_64-linux-gnu/libc.so.6@2a1c9 /lib/x86_64-linux-gnu/libc.so.6@2a28a /home/runner/work/testing-ai-sdk-integrations/testing-ai-sdk-integrations/runs/cloudflare/anthropic-llm-0.39.0-sentry-latest/node_modules/@cloudflare/workerd-linux-64/bin/workerd@1f4b024

If you think this is a bug then please create an issue at https://github.com/cloudflare/workers-sdk/issues/new/choose
? Would you like to report this error to Cloudflare? Wrangler's output and the error details will be shared with the Wrangler team to help us diagnose and fix the issue.
🤖 Using fallback value in non-interactive context: no

stderr: ✘ [ERROR] The Workers runtime failed to start. There is likely additional logging output above.


🪵  Logs were written to "/home/runner/.config/.wrangler/logs/wrangler-2026-04-08_09-23-37_023.log"

✅ Fixed

These tests were failing on main but are now passing:

  • node/vercel :: Basic Agent Test (streaming, function, openai)
  • cloudflare/langchain :: Conversation ID LLM Test (streaming)
  • cloudflare/langchain :: Conversation ID LLM Test (blocking)
  • cloudflare/openai :: Basic LLM Test (streaming)
  • cloudflare/openai :: Basic LLM Test (blocking)
  • cloudflare/openai :: Basic Error LLM Test (streaming)
  • cloudflare/openai :: Basic Error LLM Test (blocking)
  • cloudflare/openai :: Vision LLM Test (streaming)
  • cloudflare/openai :: Vision LLM Test (blocking)
  • cloudflare/openai :: Long Input LLM Test (streaming)

Test Matrix

Agent Tests

SDK Basic Agent Test Conversation ID Agent Test Long Input Agent Test Tool Call Agent Test Tool Error Agent Test Vision Agent Test
browser/langgraph blk, combinedblk, compiledblk, custom-stateblk, graphblk, langchainstr, combinedstr, compiledstr, custom-statestr, graphstr, langchain blk, combinedblk, compiledblk, custom-stateblk, graphblk, langchainstr, combinedstr, compiledstr, custom-statestr, graphstr, langchain blk, combinedblk, compiledblk, custom-stateblk, graphblk, langchainstr, combinedstr, compiledstr, custom-statestr, graphstr, langchain blk, combinedblk, compiledblk, custom-stateblk, graphblk, langchainstr, combinedstr, compiledstr, custom-statestr, graphstr, langchain blk, combinedblk, compiledblk, custom-stateblk, graphblk, langchainstr, combinedstr, compiledstr, custom-statestr, graphstr, langchain blk, combinedblk, compiledblk, custom-stateblk, graphblk, langchainstr, combinedstr, compiledstr, custom-statestr, graphstr, langchain
cloudflare/langgraph
cloudflare/vercel blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai
nextjs/mastra
nextjs/vercel blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai
node/langgraph
node/manual
node/mastra
node/vercel blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropic ✅🔧str, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai blk, class, anthropicblk, class, openaiblk, function, anthropicblk, function, openaistr, class, anthropicstr, class, openaistr, function, anthropicstr, function, openai
python/langgraph as as as as as as
python/manual as as as as as as
python/openai-agents
python/pydantic-ai a, fallbacka, single a, fallbacka, single a, fallbacka, single a, fallbacka, single a, fallbacka, single a, fallbacka, single

Embedding Tests

SDK Basic Embeddings Test
browser/google-genai
browser/langchain
browser/openai
cloudflare/google-genai
cloudflare/langchain
cloudflare/openai
cloudflare/vercel
nextjs/google-genai
nextjs/langchain
nextjs/openai
nextjs/vercel
node/google-genai
node/langchain
node/openai
node/vercel
python/google-genai a, blks, blk
python/langchain a, blks, blk
python/litellm a, blks, blk
python/manual a, blks, blk
python/openai a, blks, blk

LLM Tests

SDK Basic Error LLM Test Basic LLM Test Conversation ID LLM Test Long Input LLM Test Multi-Turn LLM Test Vision LLM Test
browser/anthropic blkstr blkstr blkstr blkstr blkstr blkstr
browser/google-genai blkstr blkstr blkstr blkstr blkstr blkstr
browser/langchain blkstr blkstr blkstr blkstr ❌📉blkstr blkstr
browser/openai blkstr blkstr blkstr blkstr blk ❌📉str blkstr
cloudflare/anthropic blkstr blk ❌📉str blkstr blkstr blkstr blkstr
cloudflare/google-genai blkstr blkstr blkstr blkstr blkstr blkstr
cloudflare/langchain blkstr blkstr ✅🔧blk ✅🔧str blkstr blkstr blkstr
cloudflare/openai ✅🔧blk ✅🔧str ✅🔧blk ✅🔧str blkstr blk ✅🔧str blkstr ✅🔧blk ✅🔧str
nextjs/anthropic blkstr blkstr blkstr blkstr blkstr blkstr
nextjs/google-genai blkstr blkstr blkstr blkstr blkstr blkstr
nextjs/langchain blkstr blkstr blkstr blkstr blkstr blkstr
nextjs/openai blkstr blkstr blkstr blkstr blkstr blkstr
node/anthropic blkstr blkstr blkstr blkstr blkstr blkstr
node/google-genai blkstr blkstr blkstr blkstr blkstr blkstr
node/langchain blkstr blkstr blkstr blkstr blkstr blkstr
node/manual
node/openai blkstr blkstr blkstr blkstr blkstr blkstr
python/anthropic a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str
python/google-genai a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str
python/langchain a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str
python/litellm a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str
python/manual a, blks, blk a, blks, blk a, blks, blk a, blks, blk a, blks, blk
python/openai a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str a, blka, strs, blks, str

MCP Tests

SDK Basic MCP Tool Call Test MCP Multiple Tool Calls Test MCP Prompt Get Test MCP Resource Read Test MCP Tool Error Test
node/mcp sseio sseio sseio sseio sseio
python/fastmcp a, blk, ssea, blk, io a, blk, ssea, blk, io a, blk, ssea, blk, io a, blk, ssea, blk, io a, blk, ssea, blk, io
python/mcp a, blk, sse, hia, blk, sse, loa, blk, io, hia, blk, io, lo a, blk, sse, hia, blk, sse, loa, blk, io, hia, blk, io, lo a, blk, sse, hia, blk, sse, loa, blk, io, hia, blk, io, lo a, blk, sse, hia, blk, sse, loa, blk, io, hia, blk, io, lo a, blk, sse, hia, blk, sse, loa, blk, io, hia, blk, io, lo

Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed | str=streaming blk=blocking a=async s=sync io=stdio sse=sse hi=highlevel lo=lowlevel


Generated by AI SDK Integration Tests

Comment on lines +298 to +307
if (parsed.messages.length !== 1) {
  const message = `${spanType} span ${i} should keep only the last input message, found ${parsed.messages.length} message(s)`;
  errors.push(message);
  locations.push({
    spanId: span.span_id,
    attribute: parsed.attribute,
    message,
  });
  continue;
}

Bug: The new assertOnlyLastInputMessage check will cause multi-turn tests to fail for all real SDK integrations because they send full conversation history, not just the last message.
Severity: HIGH

Suggested Fix

Either update the check in assertOnlyLastInputMessage to be less strict and accommodate the standard behavior of SDKs, or apply this check only to the manual test frameworks where this behavior is explicitly being tested. Do not apply it universally to all frameworks.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/test-cases/checks.ts#L298-L307

Potential issue: A new validation function, `assertOnlyLastInputMessage`, was introduced
to enforce that each AI span in a multi-turn test contains exactly one message. While
the manual test templates were updated to comply with this new rule, the templates for
real SDK integrations (like OpenAI, Anthropic, etc.) were not. These SDKs typically
include the full conversation history in each turn. As a result, when a multi-turn test
is run with any non-manual SDK integration, the test will fail because the SDK sends
multiple messages, violating the new check's expectation of a single message.
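
The suggested fix can be sketched as a check with two modes, chosen per framework. This is a minimal illustration under stated assumptions: the `Message`, `SpanLike`, and `CheckFailure` shapes, the `HistoryMode` type, and the helpers `checkLastInputMessage` and `modeForFramework` are hypothetical and do not come from the PR. It also assumes that only the manual templates trim history to the last message.

```typescript
// Hypothetical shapes for illustration; the real types in checks.ts may differ.
interface Message {
  role: string;
  content: string;
}

interface SpanLike {
  span_id: string;
  messages: Message[]; // parsed input messages recorded on the span
}

interface CheckFailure {
  spanId: string;
  message: string;
}

// "strict": the span must carry exactly one input message (the last one).
// "lenient": full history is allowed, but the final entry must be the last input.
type HistoryMode = "strict" | "lenient";

function checkLastInputMessage(
  spans: SpanLike[],
  expectedLastPerTurn: Message[],
  mode: HistoryMode,
): CheckFailure[] {
  const failures: CheckFailure[] = [];

  spans.forEach((span, i) => {
    const expected = expectedLastPerTurn[i];
    if (expected === undefined) return; // no expectation recorded for this turn

    if (mode === "strict" && span.messages.length !== 1) {
      failures.push({
        spanId: span.span_id,
        message: `span ${i} should keep only the last input message, found ${span.messages.length} message(s)`,
      });
      return;
    }

    const last = span.messages[span.messages.length - 1];
    if (last === undefined || last.role !== expected.role || last.content !== expected.content) {
      failures.push({
        spanId: span.span_id,
        message: `span ${i} does not end with the expected last input message`,
      });
    }
  });

  return failures;
}

// Assumption: only the manual templates are known to trim history.
function modeForFramework(framework: string): HistoryMode {
  return framework === "manual" ? "strict" : "lenient";
}
```

In lenient mode a span that records the full conversation still passes as long as its final message is the expected one, so SDK integrations that send history would not fail the check.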

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 75d15a4.

const expectedText = getMessageText(expected);

if (actualText !== undefined || expectedText !== undefined) {
  return actualText === expectedText;

Loose message comparison fails for multimodal content

Low Severity

messagesMatchLoosely would produce false negatives for multimodal messages in multi-turn tests. The actual span message has content: "[multimodal]" (a string, after buildSentryMessages transformation), so getMessageText returns "[multimodal]" immediately. But the expected raw message has content: [{type: "text", text: "..."}, ...] (an array), so getMessageText extracts the real text. These never match, causing assertOnlyLastInputMessage to incorrectly report a failure. No current multi-turn test uses multimodal content, but adding one would trigger this.
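
One possible way to avoid this mismatch is to normalize both sides to the same representation before comparing. The sketch below is not the repository's `messagesMatchLoosely` or `getMessageText`. The types and the helpers `normalizeContent` and `messagesMatchNormalized` are hypothetical, and it assumes that array content containing any non-text part is recorded on the span as the `"[multimodal]"` placeholder.

```typescript
// Hypothetical message shapes for illustration only.
type ContentPart = { type: string; text?: string };
type MessageContent = string | ContentPart[];

interface Message {
  role: string;
  content: MessageContent;
}

const MULTIMODAL_PLACEHOLDER = "[multimodal]";

// Reduce raw content to the form assumed to appear on the span:
// strings pass through, text-only arrays are joined, and arrays with
// any non-text part collapse to the placeholder.
function normalizeContent(content: MessageContent): string {
  if (typeof content === "string") return content;
  const hasNonTextPart = content.some((part) => part.type !== "text");
  if (hasNonTextPart) return MULTIMODAL_PLACEHOLDER;
  return content.map((part) => part.text ?? "").join("");
}

// Compare role and normalized content so both sides use one representation.
function messagesMatchNormalized(actual: Message, expected: Message): boolean {
  return (
    actual.role === expected.role &&
    normalizeContent(actual.content) === normalizeContent(expected.content)
  );
}
```

Under that assumption, a span message whose content is `"[multimodal]"` matches a raw expected message with mixed text and image parts, instead of being compared against the extracted text.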

@constantinius constantinius merged commit 14072b4 into main Apr 8, 2026
10 of 12 checks passed

2 participants