
Updating agent benchmarking to latest #886

Open
shreyasXplain wants to merge 15 commits into agent-benchmarking-v0.1 from development

Conversation

@shreyasXplain
Collaborator

No description provided.

kadirpekel and others added 15 commits March 19, 2026 21:39
* ENG-2886 Fixed dataclass formation causing a bug

* ENG-2886 Addressed review feedback on the dataclass formation fix
…hitecture (#874)

* docs: refresh README positioning, diagrams, and v2 quickstart

* docs: refine README copy and update team-agent diagram

* docs: mention built-in opt-in agent memory

* Update README positioning and examples

* Refine README intro copy

* Remove OpenClaw from README

* Refine README positioning copy

* Simplify README deployment wording

* Refresh README hero and MCP marketplace section

---------

Co-authored-by: aix-ahmet <ahmet.gunduz@aixplain.com>
The Debugger agent fails with FAILED status because
run_response_generation defaults to False (set in ENG-2855).
Without response generation, the backend webhook cannot return
the agent output, causing a silent failure.

The Debugger requires response generation to synthesise its
debugging analysis, so default it to True via setdefault.

Made-with: Cursor

Co-authored-by: JP Maia <maiajp2305@gmail.com>
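The `setdefault` fix described above can be sketched as follows. This is a minimal illustration, assuming the option travels in a plain payload dict; the helper name and dict shape are hypothetical, not the SDK's actual internals.

```python
# Hypothetical sketch: default run_response_generation to True so the
# backend webhook can return the Debugger agent's output.
def build_debugger_payload(options: dict) -> dict:
    payload = dict(options)
    # setdefault only fills the key when the caller did not set it,
    # so an explicit False passed by the caller is still respected.
    payload.setdefault("run_response_generation", True)
    return payload
```

This preserves opt-out behavior: callers that say nothing get `True`, while an explicit `False` survives untouched.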
* Add pre-commit CI workflow and fix two failing unit tests

- Add .github/workflows/pre-commit.yaml to run pre-commit checks on all branches
- Fix v1 import in v2/core.py by using sys.modules lookup instead of direct import
- Fix test_api_key_validation to clear both TEAM_API_KEY and AIXPLAIN_API_KEY env vars

* Add pre-commit CI workflow and fix all pre-commit violations

- Add .github/workflows/pre-commit.yaml to run checks on all branches
- Scope ruff lint/format to aixplain/v2/ only
- Exclude docs/ from trailing-whitespace and end-of-file-fixer
- Fix v1 import in v2/core.py (use sys.modules instead of direct import)
- Fix test_api_key_validation to clear both API key env vars
- Fix trailing whitespace and end-of-file issues across the repo
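The `sys.modules` lookup mentioned in the bullets above follows a common pattern for avoiding a direct (and potentially circular) import. A minimal sketch, assuming the goal is to use the v1 package only when it has already been imported; the helper name is illustrative:

```python
import sys

def get_module_if_loaded(name: str):
    """Return the named module only if it is already imported, else None.

    Unlike a direct import, this never triggers module initialization,
    which sidesteps circular-import problems between v1 and v2 code.
    """
    return sys.modules.get(name)
```

For example, `get_module_if_loaded("aixplain")` would return the v1 package when the user has imported it, and `None` otherwise.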

* Remove duplicate pull_request trigger from pre-commit workflow

* Add coverage to CI workflow dependencies

* Set dummy TEAM_API_KEY in CI for v1 unit test collection

* Use real TEAM_API_KEY secret in pre-commit CI workflow

* Move 8 tests that make real API calls from unit to functional tests
* ENG-2836 Agent cloning introduced

* ENG-2836 Agent cloning functional tests

* ENG-2836 addressed review feedback

* ENG-2836 rename clone_subagents to duplicate_subagents for consistent naming

Backend payload key remains "cloneSubagents" as required by the API.
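The rename keeps the Python-facing parameter consistent while leaving the wire format untouched. A minimal sketch of that mapping, assuming a simple serialization helper (the function name is hypothetical; only the `"cloneSubagents"` key comes from the commit message):

```python
# The public kwarg uses the SDK's naming (duplicate_subagents), but the
# backend API still expects the camelCase key "cloneSubagents".
def build_clone_payload(duplicate_subagents: bool = True) -> dict:
    return {"cloneSubagents": duplicate_subagents}
```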

* update gitignore

---------

Co-authored-by: aix-ahmet <ahmet.gunduz@aixplain.com>
#848)

* ENG-2847 fix the ActionInputsProxy to properly extract and coerce default values

* ENG-2847 minor fix

---------

Co-authored-by: aix-ahmet <ahmet.gunduz@aixplain.com>
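Extracting and coercing default values, as the ENG-2847 fix describes, might look roughly like this. The spec field names (`"type"`, `"default"`) and the type vocabulary are assumptions for illustration, not the SDK's actual schema:

```python
def _to_bool(value) -> bool:
    # String booleans from a backend typically arrive as "true"/"false";
    # plain bool() would turn "false" into True, so compare explicitly.
    if isinstance(value, str):
        return value.strip().lower() in ("true", "1", "yes")
    return bool(value)

_COERCERS = {"integer": int, "number": float, "boolean": _to_bool, "string": str}

def coerce_default(spec: dict):
    """Return the spec's default coerced to its declared type, or None."""
    if "default" not in spec:
        return None
    coerce = _COERCERS.get(spec.get("type"))
    value = spec["default"]
    return coerce(value) if coerce else value
```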
* Add usage and asset fields to model response (V1 + V2)

Surface token usage (prompt_tokens, completion_tokens, total_tokens)
and asset info from model serving as first-class fields on model
responses. Fix V2 poll path which previously dropped usage and asset
from the filtered response dict for async models.

Also fix pre-existing broken tests: remove test_action_inputs_proxy.py
(imports removed ActionInputsProxy class) and fix subagents -> agents
assertion in test_v2_agent_duplicate.py.

* Remove unused mock import from test_action_inputs_proxy.py
* each actions spec retrieved + attributes -> list

* removed slug fallback

* revert to dict

* fixed deleted params
* ENG-2891 Tool saving foundation

* Fix tool reconnect: send empty name to avoid "Name already exists" error

The backend connect endpoint has a bug where it checks name uniqueness
against the tool itself during reconnect (with assetId). Without name,
it fails with a trim() error; with the tool's current name, it fails
with "Name already exists". Sending name="" satisfies the trim() call
while avoiding the uniqueness conflict. The metadata PUT handles the
actual name/description updates.

Also includes:
- Rename parent_model_id to integration_id for clarity
- Add integration_path convenience property
- Fix _extract_auth_scheme to handle attributes as dict (matches backend)
- Clear config/code after successful create/update to prevent false reconnects
- Update unit and functional tests

Related: ENG-2891, BUG-732
Made-with: Cursor
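The reconnect workaround described above can be sketched as follows; the payload field names are illustrative assumptions, with only the empty-name trick taken from the commit message:

```python
# Sketch of the reconnect workaround: send an empty name so the backend's
# trim() call succeeds without tripping the name-uniqueness check against
# the tool itself. Actual name/description changes go through the
# separate metadata PUT.
def build_reconnect_payload(asset_id: str) -> dict:
    return {
        "assetId": asset_id,
        "name": "",  # empty string avoids "Name already exists"
    }
```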

---------

Co-authored-by: aix-ahmet <ahmet.gunduz@aixplain.com>
* added available action

* use cached self.actions

---------

Co-authored-by: aix-ahmet <ahmet.gunduz@aixplain.com>
* Change default model for agents to GPT-5.4

* Fix stale model name references after GPT-5.4 default LLM update

Made-with: Cursor

---------

Co-authored-by: aix-ahmet <ahmet.gunduz@aixplain.com>
…890)

* Add usage and asset fields to model response (V1 + V2)

Surface token usage (prompt_tokens, completion_tokens, total_tokens)
and asset info from model serving as first-class fields on model
responses. Fix V2 poll path which previously dropped usage and asset
from the filtered response dict for async models.

Also fix pre-existing broken tests: remove test_action_inputs_proxy.py
(imports removed ActionInputsProxy class) and fix subagents -> agents
assertion in test_v2_agent_duplicate.py.

* Remove unused mock import from test_action_inputs_proxy.py

* ENG-2922 Fix model.run() hanging on NaN usage tokens from backend

Several model providers (GPT-5.4, Claude, Mistral Large) return
"NaN"/null for token counts in the usage block. This caused:
1. Usage dataclass deserialization to fail (int("NaN"))
2. sync_poll to retry the same completed response forever until timeout
3. Sync-only model.run() to return IN_PROGRESS without polling

Changes:
- Make Usage fields Optional[int] with a safe decoder that handles
  NaN, null, strings, and floats gracefully
- Add poll fallback in resource.poll() for completed responses that
  fail deserialization
- Add polling after _run_sync_v2() for sync models that return a
  poll URL

Made-with: Cursor

* ENG-2922 Show token usage in agent progress at all verbosity levels

Display input/output/total tokens inline on each step line and
aggregate totals in the completion summary. Also includes curl
commands documenting backend token reporting inconsistencies.

Made-with: Cursor
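Aggregating per-step token counts into a completion summary, as described above, can be sketched like this; the step-record shape and key names are assumptions for illustration:

```python
# Sum input/output/total tokens across agent steps, tolerating steps
# whose counts are missing or None (see the backend inconsistencies
# noted in the commit message).
def summarize_usage(steps: list) -> dict:
    totals = {"input": 0, "output": 0, "total": 0}
    for step in steps:
        for key in totals:
            totals[key] += step.get(key) or 0
    return totals
```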

7 participants