Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 18 additions & 1 deletion plans/343/model-facade-overhaul-plan-step-1.md
Original file line number Diff line number Diff line change
Expand Up @@ -455,6 +455,10 @@ Implementation expectations:
- Fallback mode: chat-completion image extraction for autoregressive models
4. Parse usage from provider response if present.
5. Normalize tool calls and reasoning fields.
- Reasoning extraction must check both `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical) and `message.reasoning_content` (legacy/LiteLLM-normalized fallback), with `reasoning` taking precedence when both are present.
- This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
- Internal canonical field remains `reasoning_content`; no downstream contract change.
- See: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
6. Normalize image outputs from either `b64_json`, data URI, or URL download.

### Image routing ownership contract
Expand Down Expand Up @@ -947,7 +951,7 @@ OpenAI-compatible response parsing:

1. `choices[0].message.content` -> canonical `message.content`
2. `choices[0].message.tool_calls[*]` -> canonical `ToolCall`
3. `choices[0].message.reasoning_content` if present -> canonical `reasoning_content`
3. `choices[0].message.reasoning` **or** `choices[0].message.reasoning_content` -> canonical `reasoning_content` (`reasoning` takes precedence as the vLLM >= 0.16.0 canonical field; `reasoning_content` is the legacy/LiteLLM fallback)
4. `usage.prompt_tokens/completion_tokens` -> canonical `Usage`

### Canonical -> Anthropic messages payload
Expand Down Expand Up @@ -1347,6 +1351,10 @@ Per adapter:
5. retry behavior tests
6. adaptive throttling behavior tests (drop on 429, gradual recovery)
7. auth status mapping tests (`401 -> AUTHENTICATION`, `403 -> PERMISSION_DENIED`)
8. reasoning field migration tests:
- response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
- response with only `message.reasoning_content` still works (backward compat)
- response with both fields uses `reasoning` (precedence rule)

Tools:

Expand Down Expand Up @@ -1516,6 +1524,15 @@ Mitigation:
1. central retry module with deterministic tests
2. preserve current defaults from `LiteLLMRouterDefaultKwargs`

### Risk: silent reasoning trace loss after LiteLLM removal

Mitigation:

1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
2. Shared reasoning extraction helper in `parsing.py` checks `reasoning` first (canonical), falling back to `reasoning_content` (legacy).
3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).
4. Ref: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)

### Risk: throttle oscillation or starvation under bursty load

Mitigation:
Expand Down