@juanmichelini juanmichelini commented Dec 23, 2025

Summary

Fixes #1495

This PR adds all the expected models from the index to the allowed models list in .github/run-eval/resolve_model_config.py.

Changes

New Models Added

| Model ID | LiteLLM Model Path | Display Name |
|---|---|---|
| claude-4.5-opus | litellm_proxy/anthropic/claude-opus-4-5-20251101 | Claude 4.5 Opus |
| claude-4.5-sonnet | litellm_proxy/anthropic/claude-sonnet-4-5-20250929 | Claude 4.5 Sonnet |
| gemini-3-pro | litellm_proxy/gemini/gemini-3-pro-preview | Gemini 3 Pro |
| gemini-3-flash | litellm_proxy/gemini/gemini-3-flash-preview | Gemini 3 Flash |
| gpt-5.2-high-reasoning | litellm_proxy/openai/gpt-5.2-pro | GPT-5.2 High Reasoning |
| gpt-5.2 | litellm_proxy/openai/gpt-5.2 | GPT-5.2 |
| minimax-m2 | litellm_proxy/minimax/minimax-m2 | MiniMax M2 |
| deepseek-v3.2-reasoner | litellm_proxy/deepseek/deepseek-v3.2 | DeepSeek V3.2 Reasoner |
| qwen-3-coder | litellm_proxy/qwen/qwen3-coder | Qwen 3 Coder |

Note: kimi-k2-thinking was already present in the configuration.
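
Based on the code fragments quoted later in this conversation, a new entry in the `MODELS` dictionary has roughly this shape (a sketch; the field names come from those snippets, and the `temperature` value is assumed from the same context):

```python
# Sketch of one new MODELS entry in resolve_model_config.py.
# Field names ("id", "display_name", "llm_config") are taken from the
# snippets quoted in this PR's review threads; exact details may differ.
MODELS = {
    "claude-4.5-opus": {
        "id": "claude-4.5-opus",
        "display_name": "Claude 4.5 Opus",
        "llm_config": {
            "model": "litellm_proxy/anthropic/claude-opus-4-5-20251101",
            "temperature": 0.0,
        },
    },
}
```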

Test Updates

  • Fixed the test file to import from the correct module (resolve_model_config instead of resolve_model_configs)
  • Updated tests to match the current implementation (using global MODELS dictionary)
  • Added new tests to verify all expected models are present and correctly configured
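
The new presence/consistency tests likely follow a pattern like this (a sketch; only the test names match the PR's test output, while the bodies and the inline `MODELS` sample are illustrative):

```python
# Illustrative versions of two of the added tests. In the real test file,
# MODELS is imported from resolve_model_config; a minimal sample is inlined
# here so the sketch is self-contained.
MODELS = {
    "claude-4.5-opus": {
        "id": "claude-4.5-opus",
        "display_name": "Claude 4.5 Opus",
        "llm_config": {"model": "litellm_proxy/anthropic/claude-opus-4-5-20251101"},
    },
}

def test_expected_models_id_matches_key():
    # Each entry's "id" field must match its dictionary key.
    for key, config in MODELS.items():
        assert config["id"] == key

def test_expected_models_have_required_fields():
    # Every entry needs a display name and a resolvable llm_config.
    for config in MODELS.values():
        assert "display_name" in config
        assert "model" in config["llm_config"]
```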

Testing

All tests pass:

```
tests/github_workflows/test_resolve_model_config.py::test_find_models_by_id_single_model PASSED
tests/github_workflows/test_resolve_model_config.py::test_find_models_by_id_multiple_models PASSED
tests/github_workflows/test_resolve_model_config.py::test_find_models_by_id_preserves_order PASSED
tests/github_workflows/test_resolve_model_config.py::test_find_models_by_id_missing_model_exits PASSED
tests/github_workflows/test_resolve_model_config.py::test_find_models_by_id_empty_list PASSED
tests/github_workflows/test_resolve_model_config.py::test_find_models_by_id_preserves_full_config PASSED
tests/github_workflows/test_resolve_model_config.py::test_all_expected_models_present PASSED
tests/github_workflows/test_resolve_model_config.py::test_expected_models_have_required_fields PASSED
tests/github_workflows/test_resolve_model_config.py::test_expected_models_id_matches_key PASSED
tests/github_workflows/test_resolve_model_config.py::test_find_all_expected_models PASSED
```



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a3f8318-python
```

Run

```shell
docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a3f8318-python \
  ghcr.io/openhands/agent-server:a3f8318-python
```

All tags pushed for this build

```
ghcr.io/openhands/agent-server:a3f8318-golang-amd64
ghcr.io/openhands/agent-server:a3f8318-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a3f8318-golang-arm64
ghcr.io/openhands/agent-server:a3f8318-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a3f8318-java-amd64
ghcr.io/openhands/agent-server:a3f8318-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a3f8318-java-arm64
ghcr.io/openhands/agent-server:a3f8318-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a3f8318-python-amd64
ghcr.io/openhands/agent-server:a3f8318-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a3f8318-python-arm64
ghcr.io/openhands/agent-server:a3f8318-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a3f8318-golang
ghcr.io/openhands/agent-server:a3f8318-java
ghcr.io/openhands/agent-server:a3f8318-python
```

About Multi-Architecture Support

  • Each variant tag (e.g., a3f8318-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a3f8318-python-amd64) are also available if needed

Add the following models to the allowed models list for workflows:
- claude-4.5-opus (litellm_proxy/anthropic/claude-opus-4-5-20251101)
- claude-4.5-sonnet (litellm_proxy/anthropic/claude-sonnet-4-5-20250929)
- gemini-3-pro (litellm_proxy/gemini/gemini-3-pro-preview)
- gemini-3-flash (litellm_proxy/gemini/gemini-3-flash-preview)
- gpt-5.2-high-reasoning (litellm_proxy/openai/gpt-5.2-pro)
- gpt-5.2 (litellm_proxy/openai/gpt-5.2)
- minimax-m2 (litellm_proxy/minimax/minimax-m2)
- deepseek-v3.2-reasoner (litellm_proxy/deepseek/deepseek-v3.2)
- qwen-3-coder (litellm_proxy/qwen/qwen3-coder)

Also fix and update tests to match the current implementation.

Co-authored-by: openhands <openhands@all-hands.dev>
```python
            "temperature": 0.0,
        },
    },
    "claude-4.5-sonnet": {
```

this model is already in the dictionary


It's the unpinned version, which highlights that most of the models were added unpinned. I'll ask it to update them.

@juanmichelini

@OpenHands please look at MODELS = {...} inside resolve_model_config.py

Some of the models are pinned, like "claude-sonnet-4-5-20250929" (notice it ends in a date); others are not, like "claude-4.5-opus" (notice there is no date).

For each of the models already added, if it is unpinned, let's check whether there is a pinned version:
that is, a version with the exact same name but with a date suffix.
Check whether a pinned version of the model exists in https://github.com/BerriAI/litellm/.
If so, let's use that pinned version of the model.

Notes:

  • Only update unpinned models; leave already pinned models as they are.
  • When updating an unpinned model, if there are two pinned versions, use the most recent one.
  • Note that we only want to pin the model, so only a date suffix is added; the name should not change except for the date.
  • The pinned model must exist in https://github.com/BerriAI/litellm/; otherwise leave the model unpinned.
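
The pinning rule described above can be sketched in code. This is an illustrative helper, not the agent's actual implementation; `is_pinned` and `pick_pinned` are hypothetical names, and the date-suffix formats (YYYYMMDD and YYYY-MM-DD) are assumed from the model names quoted in this thread:

```python
import re

# A pinned model name is the unpinned name plus a trailing date suffix,
# e.g. "claude-sonnet-4-5-20250929" or "gpt-5.2-2025-12-11".
PINNED_RE = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")

def is_pinned(model_name: str) -> bool:
    return bool(PINNED_RE.search(model_name))

def pick_pinned(unpinned: str, available: list[str]) -> str:
    # Keep only candidates that are exactly the unpinned name plus a date
    # suffix, then choose the most recent; lexicographic max works because
    # the date formats sort chronologically.
    candidates = [
        m for m in available
        if m.startswith(unpinned + "-")
        and is_pinned(m)
        and PINNED_RE.sub("", m) == unpinned
    ]
    return max(candidates) if candidates else unpinned
```

Note that `pick_pinned("gpt-5.2", ...)` would not match `gpt-5.2-pro-2025-12-11`, since stripping the date leaves `gpt-5.2-pro`, a different model name.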


openhands-ai bot commented Dec 23, 2025

I'm on it! juanmichelini can track my progress at all-hands.dev

- gpt-5.2-pro -> gpt-5.2-pro-2025-12-11
- gpt-5.2 -> gpt-5.2-2025-12-11

These are the only models that have pinned versions available in litellm.
Other models (deepseek-chat, kimi-k2-thinking, gemini-3-pro-preview,
gemini-3-flash-preview, minimax-m2, deepseek-v3.2, qwen3-coder) do not
have pinned versions in litellm and are left unchanged.

Also fixed test file to use correct model ID (claude-sonnet-4-5-20250929
instead of claude-4.5-sonnet).

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai bot commented Dec 23, 2025

I've analyzed all models in the MODELS dictionary in resolve_model_config.py and checked for pinned versions in the litellm repository.

Summary of Changes

Models Pinned (2 models updated):

| Model | Before | After |
|---|---|---|
| gpt-5.2-pro | litellm_proxy/openai/gpt-5.2-pro | litellm_proxy/openai/gpt-5.2-pro-2025-12-11 |
| gpt-5.2 | litellm_proxy/openai/gpt-5.2 | litellm_proxy/openai/gpt-5.2-2025-12-11 |

Models Left Unchanged (no pinned version exists in litellm):

  • deepseek-chat - no pinned version available
  • kimi-k2-thinking - no pinned version available
  • gemini-3-pro-preview - no pinned version available
  • gemini-3-flash-preview - no pinned version available
  • minimax-m2 - no pinned version available
  • deepseek-v3.2 - no pinned version available
  • qwen3-coder - no pinned version available

Already Pinned Models (no changes needed):

  • claude-sonnet-4-5-20250929
  • claude-haiku-4-5-20251001
  • gpt-5-mini-2025-08-07
  • claude-opus-4-5-20251101 (in llm_config)

Additional Fix:

Fixed a pre-existing test issue where EXPECTED_MODELS referenced claude-4.5-sonnet but the actual model ID in MODELS is claude-sonnet-4-5-20250929.

Checklist:

  • ✅ Only updated unpinned models
  • ✅ Only added date suffix (name unchanged except for date)
  • ✅ Pinned versions verified to exist in litellm
  • ✅ Used most recent pinned version (2025-12-11)
  • ✅ All tests pass
  • ✅ Pre-commit hooks pass
  • ✅ Changes pushed to PR branch


```python
    "gpt-5.2-high-reasoning": {
        "id": "gpt-5.2-high-reasoning",
        "display_name": "GPT-5.2 High Reasoning",
        "llm_config": {"model": "litellm_proxy/openai/gpt-5.2-pro-2025-12-11"},
```

I thought "GPT 5.2 Pro" is not the same as "GPT 5.2 with reasoning='high'", is it?

Pricing is very different too: Pro is apparently $21 input / $168 output, while GPT-5.2, at any reasoning setting, is $1.75 / $14.

I'm not sure it's available at all via the AH proxy. I can access it via the API with a ChatGPT subscription, though.


More details are, to my knowledge, in Slack.

Please do correct me if I'm wrong.

@all-hands-bot

[Automatic Post]: It has been a while since there was any activity on this PR. @juanmichelini, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.



Development

Successfully merging this pull request may close these issues.

Add desired index models to list of allowed models in workflow
