ENG-2922 Token usage tracking in model responses and agent progress#890

Merged
aix-ahmet merged 4 commits into development from ENG-2922-Add-token-usage-in-llm-calls-in-SDK on Apr 1, 2026
Conversation

@aix-ahmet (Collaborator)

Summary

  • Model response token usage: Add usage (prompt/completion/total tokens) and asset fields to model response handling in both V1 and V2, with robust parsing that gracefully handles NaN, null, and string values from inconsistent backend responses.
  • Fix model.run() hanging: Resolve issue where model.run() on sync-only models would return IN_PROGRESS without polling, and where NaN token values caused deserialization errors leading to infinite retry loops.
  • Agent progress token display: Show input/output/total tokens inline on each step line at all verbosity levels, with aggregated totals in the completion summary.
  • Backend documentation: Include curl commands documenting token reporting inconsistencies across different LLM providers for the backend team.

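The "robust parsing" bullet above could look something like the following minimal sketch. The helper name `safe_token_count` is hypothetical (the PR does not show the actual function); it only illustrates coercing the inconsistent NaN/null/string token values the summary describes into `Optional[int]`:

```python
import math
from typing import Any, Optional


def safe_token_count(value: Any) -> Optional[int]:
    """Coerce a token count from an inconsistent backend payload.

    Accepts ints, floats, and numeric strings; returns None for NaN,
    null, empty strings, and anything unparseable.
    """
    if value is None:
        return None
    if isinstance(value, bool):
        # bool is an int subclass in Python; reject it explicitly
        return None
    if isinstance(value, (int, float)):
        if isinstance(value, float) and math.isnan(value):
            return None
        return int(value)
    if isinstance(value, str):
        text = value.strip()
        if not text or text.lower() in ("nan", "null", "none"):
            return None
        try:
            return int(float(text))
        except ValueError:
            return None
    return None
```

A decoder like this avoids the `int("NaN")` failure mode mentioned later in the PR, since every malformed count degrades to `None` instead of raising.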
Test plan

  • All 911 unit tests pass
  • Pre-commit hooks pass (ruff lint, ruff format, pytest)
  • Manual verification: run an agent with progress_verbosity=1,2,3 and confirm token counts appear on step lines
  • Manual verification: run check_llm_usage.py to confirm token parsing for various LLM backends

Made with Cursor

Surface token usage (prompt_tokens, completion_tokens, total_tokens)
and asset info from model serving as first-class fields on model
responses. Fix V2 poll path which previously dropped usage and asset
from the filtered response dict for async models.

Also fix pre-existing broken tests: remove test_action_inputs_proxy.py
(it imports the removed ActionInputsProxy class) and fix the
subagents -> agents assertion in test_v2_agent_duplicate.py.
@aix-ahmet force-pushed the ENG-2922-Add-token-usage-in-llm-calls-in-SDK branch from 4055df1 to e691f72 on April 1, 2026 at 21:54
Several model providers (GPT-5.4, Claude, Mistral Large) return
"NaN"/null for token counts in the usage block. This caused:
1. Usage dataclass deserialization to fail (int("NaN"))
2. sync_poll to retry the same completed response forever until timeout
3. Sync-only model.run() to return IN_PROGRESS without polling

Changes:
- Make Usage fields Optional[int] with a safe decoder that handles
  NaN, null, strings, and floats gracefully
- Add poll fallback in resource.poll() for completed responses that
  fail deserialization
- Add polling after _run_sync_v2() for sync models that return a
  poll URL

Made-with: Cursor
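The poll fallback described in the commit message might be sketched as below. All names (`sync_poll`, `poll_fn`, `parse_fn`) are hypothetical stand-ins, not the SDK's actual API; the point is only the control flow: a completed response that fails deserialization is returned raw rather than retried until timeout:

```python
import time
from typing import Any, Callable, Dict


def sync_poll(
    poll_fn: Callable[[], Dict[str, Any]],
    parse_fn: Callable[[Dict[str, Any]], Any],
    timeout_s: float = 60.0,
    wait_s: float = 0.5,
) -> Any:
    """Poll until the response reaches a terminal status.

    A deserialization failure on a *completed* payload (e.g. int("NaN")
    in the usage block) is treated as terminal: return the raw dict
    instead of retrying the same completed response forever.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        raw = poll_fn()
        if raw.get("status") in ("SUCCESS", "FAILED"):
            try:
                return parse_fn(raw)
            except (ValueError, TypeError):
                # Completed but undeserializable: fall back to raw payload.
                return raw
        time.sleep(wait_s)
    raise TimeoutError("sync_poll timed out")
```

The same terminal-status check is what lets a sync-only `model.run()` keep polling after receiving a poll URL instead of returning IN_PROGRESS immediately.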
Display input/output/total tokens inline on each step line and
aggregate totals in the completion summary. Also includes curl
commands documenting backend token reporting inconsistencies.

Made-with: Cursor
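The inline token display described in this commit could be rendered by a small formatter along these lines. The function name and the bracketed layout are assumptions for illustration, not the SDK's actual output format; missing counts (None after safe parsing) are shown as a dash:

```python
from typing import Optional


def format_step_tokens(
    in_tokens: Optional[int],
    out_tokens: Optional[int],
    total_tokens: Optional[int],
) -> str:
    """Render input/output/total token counts for one step line.

    Counts that failed to parse (None) are shown as '-'.
    """
    def show(v: Optional[int]) -> str:
        return "-" if v is None else str(v)

    return f"[tokens in={show(in_tokens)} out={show(out_tokens)} total={show(total_tokens)}]"
```

The completion summary would then sum the per-step counts, skipping `None` values so one malformed step does not poison the aggregate.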
@aix-ahmet force-pushed the ENG-2922-Add-token-usage-in-llm-calls-in-SDK branch from e691f72 to b1530cc on April 1, 2026 at 21:56
@aix-ahmet merged commit a0ca7eb into development on Apr 1, 2026
1 check passed