fix: make tool call assertions flexible to prevent CI flakes #5295
Open
iamemilio wants to merge 2 commits into llamastack:main
Conversation
Some models return multiple parallel tool calls for a single-tool prompt, which is a valid API response. The previous assertions required exactly one function call, causing intermittent CI failures when models produced logically correct but duplicated tool invocations. This relaxes the assertions to accept one or more function calls and responds to all of them in follow-up turns, preventing the "tool_call_id not responded to" error that occurs when only the first call is acknowledged.

Signed-off-by: Emilio Garcia <i.am.emilio@gmail.com>
Made-with: Cursor
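For illustration, answering every function call in the follow-up turn could look roughly like the sketch below. It is not the code from this PR: build_tool_outputs, result_for, and the SimpleNamespace stand-ins for response output items are hypothetical names, and the only assumption about the API is that function call items expose type and call_id and are answered with function_call_output items.

```python
from types import SimpleNamespace


def build_tool_outputs(output_items, result_for):
    """Return one function_call_output item per function call in the response."""
    calls = [item for item in output_items if item.type == "function_call"]
    # One or more calls is acceptable; zero means the model never used the tool.
    assert len(calls) >= 1, "expected at least one function call"
    return [
        {
            "type": "function_call_output",
            "call_id": call.call_id,
            "output": result_for(call),
        }
        for call in calls
    ]


# Stand-in for a response where the model duplicated the get_weather call.
fake_output = [
    SimpleNamespace(type="function_call", name="get_weather",
                    call_id="call_1", arguments='{"city": "Paris"}'),
    SimpleNamespace(type="function_call", name="get_weather",
                    call_id="call_2", arguments='{"city": "Paris"}'),
]
follow_up_input = build_tool_outputs(fake_output, lambda call: "sunny")
```

Because every call_id gets an output, the follow-up request has no unanswered tool call, whether the model returned one call or several.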
mattf (Collaborator) reviewed on Mar 25, 2026 and left a comment:
can you pass parallel_tool_calls=False to create instead?
it'd be good to have a clear parallel_tool_calls=True test too. historically many models would fail it, but we should behave correctly. especially important is parallel_tool_calls=True and stream=True.
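A rough sketch of the first suggestion, assuming the tests go through an OpenAI-compatible client whose responses.create accepts parallel_tool_calls. The helper name create_single_call_response, the RecordingResponses stub, the model name, and the WEATHER_TOOL definition are all hypothetical and exist only so the snippet runs without a server.

```python
from types import SimpleNamespace


def create_single_call_response(client, model, prompt, tools):
    """Request a response with parallel tool calls disabled."""
    return client.responses.create(
        model=model,
        input=prompt,
        tools=tools,
        parallel_tool_calls=False,
    )


class RecordingResponses:
    """Hypothetical stub that records kwargs instead of calling a server."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return SimpleNamespace(output=[])


# Illustrative tool definition; the real tests define their own.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

stub_client = SimpleNamespace(responses=RecordingResponses())
create_single_call_response(
    stub_client, "some-model", "What is the weather in Paris?", [WEATHER_TOOL]
)
```

With the flag set, the strict single-call assertion could stay in these tests, while a separate parallel_tool_calls=True test (including stream=True) would cover the multi-call path explicitly.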
Summary
- Relax len(response.output) == 1 assertions in tool call integration tests to accept one or more function calls (>= 1), since models may legitimately return multiple parallel tool calls for a single-tool prompt
- Respond to every function call in follow-up turns, preventing the "tool_call_id not responded to" API error when the model produces duplicate invocations

Motivation

When evaluating cheaper model alternatives for the Azure CI suite, we observed that some models (e.g. gpt-4.1-nano) occasionally return two parallel get_weather function calls instead of one. This is a valid API response: the model is correctly identifying the tool to call, just being eager about parallelism. The previous == 1 assertions treated this as a failure, creating intermittent CI flakes.

The fix is backwards-compatible: tests still pass when a model returns exactly one call (since 1 >= 1), while also accepting the multi-call case. The core behavior being verified (correct tool invocation, successful follow-up round-trip, final message with text) remains unchanged.

Test plan

- responses suite with gpt-4.1-nano via OpenAI: 196 passed, 0 failed, 26 skipped
- The affected tests (test_function_call_output_list_text, test_function_call_output_list_text_multi_block, test_response_non_streaming_custom_tool) pass consistently with the loosened assertions
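As a minimal sketch of the loosened check (not the exact assertions in the diff), the helper below accepts any non-empty output as long as every item is a call to the expected tool. The name assert_function_calls and the SimpleNamespace items are hypothetical, and it assumes the output holds only function call items, as the previous == 1 check implied.

```python
from types import SimpleNamespace


def assert_function_calls(output_items, expected_name):
    """Accept one or more calls, provided each one targets the expected tool."""
    assert len(output_items) >= 1, "expected at least one function call"
    for item in output_items:
        assert item.type == "function_call"
        assert item.name == expected_name
    return len(output_items)


def make_call(call_id):
    # Stand-in for a function call item in response.output.
    return SimpleNamespace(type="function_call", name="get_weather", call_id=call_id)


single = assert_function_calls([make_call("call_1")], "get_weather")
double = assert_function_calls([make_call("call_1"), make_call("call_2")], "get_weather")
```

The single-call case still passes, which is the backwards-compatibility argument made in the Motivation section.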