fix: properly check that only the last message is kept when doing multi-turn conversations #135
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Agent span check fails: no input messages attribute
  - Removed last-input-message validation from checkAgentSpanAttributes so agent spans are no longer incorrectly required to include message attributes that only exist on chat spans.
Or push these changes by commenting:
@cursor push 67d2ae9584
Preview (67d2ae9584)
diff --git a/src/test-cases/checks.ts b/src/test-cases/checks.ts
--- a/src/test-cases/checks.ts
+++ b/src/test-cases/checks.ts
@@ -432,7 +432,6 @@
}
assertAttributes(agentSpans, attrs);
- assertOnlyLastInputMessage(agentSpans, testDef, "agent");
},
};

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
🔴 AI SDK Integration Test Results
Status: 3 regressions detected
Summary
🔴 Regressions
These tests were passing on main but are now failing:
- browser/langchain :: Multi-Turn LLM Test (blocking)
  Error: Browser test timed out (60s)
- browser/openai :: Multi-Turn LLM Test (streaming)
  Error: Browser test timed out (60s)
- cloudflare/anthropic :: Basic LLM Test (streaming)
  Error: Test execution failed: Wrangler exited with code 1
✅ Fixed
These tests were failing on main but are now passing:
Test Matrix
Agent Tests
Embedding Tests
LLM Tests
MCP Tests
Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed | str=streaming blk=blocking a=async s=sync io=stdio sse=sse hi=highlevel lo=lowlevel
Generated by AI SDK Integration Tests
    if (parsed.messages.length !== 1) {
      const message = `${spanType} span ${i} should keep only the last input message, found ${parsed.messages.length} message(s)`;
      errors.push(message);
      locations.push({
        spanId: span.span_id,
        attribute: parsed.attribute,
        message,
      });
      continue;
    }
Bug: The new assertOnlyLastInputMessage check will cause multi-turn tests to fail for all real SDK integrations because they send full conversation history, not just the last message.
Severity: HIGH
Suggested Fix
Either update the check in assertOnlyLastInputMessage to be less strict and accommodate the standard behavior of SDKs, or apply this check only to the manual test frameworks where this behavior is explicitly being tested. Do not apply it universally to all frameworks.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/test-cases/checks.ts#L298-L307
Potential issue: A new validation function, `assertOnlyLastInputMessage`, was introduced
to enforce that each AI span in a multi-turn test contains exactly one message. While
the manual test templates were updated to comply with this new rule, the templates for
real SDK integrations (like OpenAI, Anthropic, etc.) were not. These SDKs typically
include the full conversation history in each turn. As a result, when a multi-turn test
is run with any non-manual SDK integration, the test will fail because the SDK sends
multiple messages, violating the new check's expectation of a single message.
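A minimal sketch of the "less strict" option from the suggested fix, assuming the check can be told per framework whether to expect exactly one message. The names `Message`, `InputMessageMode`, and `checkLastInputMessage` are hypothetical and are not taken from checks.ts; the real span and test-definition types will differ.

```typescript
// Hypothetical message shape; the real types in checks.ts may differ.
interface Message {
  role: string;
  content: string;
}

// "strict": the span must hold exactly one input message (manual templates).
// "lenient": full history is allowed as long as the final entry is the last turn.
type InputMessageMode = "strict" | "lenient";

// Returns an error string, or undefined when the input messages are acceptable.
function checkLastInputMessage(
  actual: Message[],
  expectedLast: Message,
  mode: InputMessageMode,
): string | undefined {
  const last = actual[actual.length - 1];
  if (last === undefined) {
    return "span has no input messages";
  }
  if (mode === "strict" && actual.length !== 1) {
    return `should keep only the last input message, found ${actual.length} message(s)`;
  }
  if (last.role !== expectedLast.role || last.content !== expectedLast.content) {
    return "last input message does not match the expected final turn";
  }
  return undefined;
}
```

With something like this, manual frameworks could pass "strict" while SDK integrations that resend the whole conversation pass "lenient", so the check still verifies the final turn without failing on history.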
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 75d15a4.
    const expectedText = getMessageText(expected);

    if (actualText !== undefined || expectedText !== undefined) {
      return actualText === expectedText;
Loose message comparison fails for multimodal content
Low Severity
messagesMatchLoosely would produce false negatives for multimodal messages in multi-turn tests. The actual span message has content: "[multimodal]" (a string, after buildSentryMessages transformation), so getMessageText returns "[multimodal]" immediately. But the expected raw message has content: [{type: "text", text: "..."}, ...] (an array), so getMessageText extracts the real text. These never match, causing assertOnlyLastInputMessage to incorrectly report a failure. No current multi-turn test uses multimodal content, but adding one would trigger this.
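One possible way to avoid the false negative is to compare on "is this multimodal?" before comparing text. The sketch below is not the code in checks.ts: `RawMessage`, `ContentPart`, `isMultimodal`, `textOf`, and `messagesMatchLooselySketch` are hypothetical names, and it assumes the "[multimodal]" placeholder stands in for any content array that contains at least one non-text part.

```typescript
// Hypothetical shapes for illustration only.
type ContentPart = { type: string; text?: string };
interface RawMessage {
  role: string;
  content: string | ContentPart[];
}

// Placeholder string taken from the review comment above.
const MULTIMODAL_PLACEHOLDER = "[multimodal]";

// Assumption: content counts as multimodal if it is already the placeholder,
// or if it is an array holding at least one non-text part.
function isMultimodal(content: string | ContentPart[]): boolean {
  if (typeof content === "string") {
    return content === MULTIMODAL_PLACEHOLDER;
  }
  return content.some((part) => part.type !== "text");
}

// Joins the text parts of array content; plain strings are returned unchanged.
function textOf(content: string | ContentPart[]): string | undefined {
  if (typeof content === "string") {
    return content;
  }
  const texts = content
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string);
  return texts.length > 0 ? texts.join("") : undefined;
}

function messagesMatchLooselySketch(actual: RawMessage, expected: RawMessage): boolean {
  if (actual.role !== expected.role) {
    return false;
  }
  const actualMulti = isMultimodal(actual.content);
  const expectedMulti = isMultimodal(expected.content);
  // When either side is multimodal, the text cannot be compared reliably,
  // so only require that both sides are multimodal.
  if (actualMulti || expectedMulti) {
    return actualMulti && expectedMulti;
  }
  return textOf(actual.content) === textOf(expected.content);
}
```

The trade-off is that two different multimodal messages with the same role would be treated as equal, which may be acceptable for a loose comparison but is weaker than a text match.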



Closes https://linear.app/getsentry/issue/TET-2158/multi-turn-extend-check-to-assert-correct-message-popping