ENG-2922 Token usage tracking in model responses and agent progress #890
Merged
aix-ahmet merged 4 commits into development on Apr 1, 2026
Conversation
Surface token usage (prompt_tokens, completion_tokens, total_tokens) and asset info from model serving as first-class fields on model responses. Fix V2 poll path which previously dropped usage and asset from the filtered response dict for async models. Also fix pre-existing broken tests: remove test_action_inputs_proxy.py (imports removed ActionInputsProxy class) and fix subagents -> agents assertion in test_v2_agent_duplicate.py.
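The V2 poll-path fix described above can be sketched as follows. This is an illustrative reconstruction, not the SDK's actual code: the function and key names are assumptions. The bug pattern is a fixed allow-list that filtered the raw poll response and silently dropped `usage` and `asset`; the fix is to include those keys.

```python
# Hypothetical sketch of the V2 poll-path fix. Previously the filtered
# response dict kept only a fixed allow-list of keys, which silently
# dropped "usage" and "asset" for async models. Key names are assumptions.
def filter_poll_response(raw: dict) -> dict:
    allowed = {"status", "data", "completed", "usage", "asset"}  # usage/asset re-added
    return {k: v for k, v in raw.items() if k in allowed}
```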
Force-pushed from 4055df1 to e691f72
Several model providers (GPT-5.4, Claude, Mistral Large) return
"NaN"/null for token counts in the usage block. This caused:
1. Usage dataclass deserialization to fail (int("NaN"))
2. sync_poll to retry the same completed response forever until timeout
3. Sync-only model.run() to return IN_PROGRESS without polling
Changes:
- Make Usage fields Optional[int] with a safe decoder that handles
NaN, null, strings, and floats gracefully
- Add poll fallback in resource.poll() for completed responses that
fail deserialization
- Add polling after _run_sync_v2() for sync models that return a
poll URL
Made-with: Cursor
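The safe decoder change above can be sketched as follows. This is a minimal illustration under assumed field names (`prompt_tokens`, `completion_tokens`, `total_tokens`), not the SDK's actual implementation: `Usage` fields become `Optional[int]`, and a coercion helper tolerates `NaN`, `null`, string, and float inputs instead of crashing on `int("NaN")`.

```python
from dataclasses import dataclass
from typing import Optional
import math

def safe_int(value) -> Optional[int]:
    """Coerce a backend token count to int, tolerating NaN, null,
    strings, and floats; returns None when the value is unusable."""
    if value is None:
        return None
    try:
        f = float(value)  # accepts 123, 123.0, "123", and "NaN"
    except (TypeError, ValueError):
        return None
    if math.isnan(f) or math.isinf(f):
        return None
    return int(f)

@dataclass
class Usage:
    # Optional so a missing or malformed count deserializes to None
    # instead of raising and stalling the poll loop.
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    total_tokens: Optional[int] = None

    @classmethod
    def from_dict(cls, d: dict) -> "Usage":
        return cls(
            prompt_tokens=safe_int(d.get("prompt_tokens")),
            completion_tokens=safe_int(d.get("completion_tokens")),
            total_tokens=safe_int(d.get("total_tokens")),
        )
```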
Display input/output/total tokens inline on each step line and aggregate totals in the completion summary. Also includes curl commands documenting backend token reporting inconsistencies. Made-with: Cursor
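The inline display and aggregation can be sketched like this. The helper names and output format are assumptions for illustration, not the agent's actual rendering code; the point is that each step line carries its own counts while the completion summary sums across steps, treating missing counts as zero.

```python
# Illustrative sketch (names and format are assumptions): render per-step
# token counts inline, then aggregate totals for the completion summary.
def format_step_line(step: str, usage: dict) -> str:
    i = usage.get("prompt_tokens") or 0
    o = usage.get("completion_tokens") or 0
    return f"{step}  [in:{i} out:{o} total:{i + o}]"

def aggregate_usage(steps: list) -> dict:
    # Missing or None counts contribute zero rather than breaking the sum.
    total_in = sum(s.get("prompt_tokens") or 0 for s in steps)
    total_out = sum(s.get("completion_tokens") or 0 for s in steps)
    return {"prompt_tokens": total_in,
            "completion_tokens": total_out,
            "total_tokens": total_in + total_out}
```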
Force-pushed from e691f72 to b1530cc
Summary
Adds usage (prompt/completion/total tokens) and asset fields to model response handling in both V1 and V2, with robust parsing that gracefully handles NaN, null, and string values from inconsistent backend responses. Fixes bugs where model.run() on sync-only models would return IN_PROGRESS without polling, and where NaN token values caused deserialization errors leading to infinite retry loops.
Test plan
- Run with progress_verbosity=1,2,3 and confirm token counts appear on step lines
- Run check_llm_usage.py to confirm token parsing for various LLM backends
Made with Cursor
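The IN_PROGRESS fix described in the summary can be sketched as a poll loop added after the sync run. This is a hedged reconstruction: `run_once` and `fetch_status` are hypothetical stand-ins for the SDK's sync call and poll request, not real API names.

```python
import time

def run_with_poll(run_once, fetch_status, timeout_s=300, interval_s=0.5):
    """Run a sync model call; if the backend still answers IN_PROGRESS
    with a poll URL, poll until a terminal status or until timeout.
    run_once/fetch_status are hypothetical stand-ins for the SDK calls."""
    resp = run_once()
    deadline = time.monotonic() + timeout_s
    while resp.get("status") == "IN_PROGRESS" and "url" in resp:
        if time.monotonic() > deadline:
            raise TimeoutError("model run did not complete in time")
        time.sleep(interval_s)
        resp = fetch_status(resp["url"])
    return resp
```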