Skip to content

fix: replace blunt pop with assistant message rewriting in _separate_tool_calls#5303

Open
jaideepr97 wants to merge 3 commits intollamastack:mainfrom
jaideepr97:fix/approval-pop-history-corruption
Open

fix: replace blunt pop with assistant message rewriting in _separate_tool_calls#5303
jaideepr97 wants to merge 3 commits intollamastack:mainfrom
jaideepr97:fix/approval-pop-history-corruption

Conversation

@jaideepr97
Copy link
Copy Markdown
Contributor

This patch was generated by Claude.

Summary

Fixes #5301

_separate_tool_calls() called next_turn_messages.pop() inside the tool call loop, once per tool call needing approval. With N such tool calls, pop() fired N times — the first correctly removed the assistant message, but subsequent pops destroyed unrelated conversation messages (user message, system message, etc.).

Additionally, even in the single-tool-call case, the pop removed the entire assistant message including tool calls that were approved and executed — leaving orphaned tool results in the conversation history and model context.

Fix

Replace the per-tool-call pop with intelligent assistant message handling based on what was actually executed:

  • All deferred/denied: pop the assistant message entirely (no executed tool calls to preserve context for)
  • Mixed (some executed, some deferred/denied): replace the assistant message with a new one containing only the executed tool calls, keeping the conversation history coherent with the tool results that follow
  • All executed: leave the assistant message untouched

Test plan

  • 10 regression tests covering all three cases (all-deferred, mixed, all-executed)
  • Verified original messages (system, user) are never corrupted regardless of number of tool calls
  • Full responses unit test suite passes (234 tests)

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 25, 2026
@jaideepr97
Copy link
Copy Markdown
Contributor Author

@grs Please take a look at this when you get a second, would appreciate your inputs!

@@ -0,0 +1,348 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new file, can you put your test case to tests/unit/providers/inline/responses/builtin/responses/test_streaming.py ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks!

@gyliu513
Copy link
Copy Markdown
Contributor

@leseb already do some review for #5294, hope he can resume the review in this PR.

…ltiple MCP approval tool calls

The _separate_tool_calls() method called next_turn_messages.pop() inside
the tool call loop, once per tool call needing approval. With N tool calls
requiring approval, pop() fired N times — the first correctly removed the
assistant message, but subsequent pops destroyed unrelated conversation
messages (user message, system message, etc.).

Track whether a pop is needed with a boolean flag and execute it at most
once after the tool call loop completes.

Fixes: llamastack#5301
Signed-off-by: Jaideep Rao <jrao@redhat.com>
Made-with: Cursor
…tool_calls

The _separate_tool_calls() method called next_turn_messages.pop() inside
the tool call loop, once per tool call needing approval. With N such tool
calls, pop() fired N times — the first correctly removed the assistant
message, but subsequent pops destroyed unrelated conversation messages.

Replace the per-tool-call pop with intelligent assistant message handling:
- All deferred/denied: pop the assistant message entirely (no tool calls
  to preserve context for)
- Mixed (some executed, some deferred): replace the assistant message with
  one containing only the executed tool calls, keeping the conversation
  history coherent with the tool results that follow
- All executed: leave the assistant message untouched

Fixes: llamastack#5301
Signed-off-by: Jaideep Rao <jrao@redhat.com>
Made-with: Cursor
Consolidate the regression tests for the pop() bug fix from the standalone
test_approval_pop_bug.py into the existing test_streaming.py file per
reviewer feedback.

Signed-off-by: Jaideep Rao <jrao@redhat.com>
Made-with: Cursor
Signed-off-by: Jaideep Rao <jrao@redhat.com>
@jaideepr97 jaideepr97 force-pushed the fix/approval-pop-history-corruption branch from 68c8651 to 7ac4102 Compare March 27, 2026 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

next_turn_messages.pop() in _separate_tool_calls corrupts conversation history when multiple MCP tool calls require approval

3 participants