From 82570a6123650ce105a9fe271b44311b31de5fbc Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Fri, 6 Mar 2026 09:49:04 -0700
Subject: [PATCH 1/2] update plan 1

---
 .../343/model-facade-overhaul-plan-step-1.md  | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/plans/343/model-facade-overhaul-plan-step-1.md b/plans/343/model-facade-overhaul-plan-step-1.md
index b932cca75..1b3ad07ab 100644
--- a/plans/343/model-facade-overhaul-plan-step-1.md
+++ b/plans/343/model-facade-overhaul-plan-step-1.md
@@ -455,6 +455,10 @@ Implementation expectations:
    - Fallback mode: chat-completion image extraction for autoregressive models
 4. Parse usage from provider response if present.
 5. Normalize tool calls and reasoning fields.
+   - Reasoning extraction must check both `message.reasoning_content` (legacy/LiteLLM-normalized) and `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical), with `reasoning_content` taking precedence when both are present.
+   - This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
+   - Internal canonical field remains `reasoning_content`; no downstream contract change.
+   - See: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
 6. Normalize image outputs from either `b64_json`, data URI, or URL download.
 
 ### Image routing ownership contract
@@ -947,7 +951,7 @@ OpenAI-compatible response parsing:
 
 1. `choices[0].message.content` -> canonical `message.content`
 2. `choices[0].message.tool_calls[*]` -> canonical `ToolCall`
-3. `choices[0].message.reasoning_content` if present -> canonical `reasoning_content`
+3. `choices[0].message.reasoning_content` **or** `choices[0].message.reasoning` -> canonical `reasoning_content` (`reasoning_content` takes precedence; `reasoning` is the vLLM >= 0.16.0 field name)
 4. `usage.prompt_tokens/completion_tokens` -> canonical `Usage`
 
 ### Canonical -> Anthropic messages payload
@@ -1347,6 +1351,10 @@ Per adapter:
 
 5. retry behavior tests
 6. adaptive throttling behavior tests (drop on 429, gradual recovery)
 7. auth status mapping tests (`401 -> AUTHENTICATION`, `403 -> PERMISSION_DENIED`)
+8. reasoning field migration tests:
+   - response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
+   - response with only `message.reasoning_content` still works (backward compat)
+   - response with both fields uses `reasoning_content` (precedence rule)
 
 Tools:
@@ -1516,6 +1524,15 @@ Mitigation:
 
 1. central retry module with deterministic tests
 2. preserve current defaults from `LiteLLMRouterDefaultKwargs`
+### Risk: silent reasoning trace loss after LiteLLM removal
+
+Mitigation:
+
+1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
+2. Shared reasoning extraction helper in `parsing.py` checks both `reasoning_content` and `reasoning` with explicit precedence.
+3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).
+4. Ref: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
+
 ### Risk: throttle oscillation or starvation under bursty load
 
 Mitigation:

From 26b03e9292bfde80684c39c863a5e34d2d8de903 Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Fri, 6 Mar 2026 11:33:30 -0700
Subject: [PATCH 2/2] flip precedence

---
 plans/343/model-facade-overhaul-plan-step-1.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/plans/343/model-facade-overhaul-plan-step-1.md b/plans/343/model-facade-overhaul-plan-step-1.md
index 1b3ad07ab..56f95cfe5 100644
--- a/plans/343/model-facade-overhaul-plan-step-1.md
+++ b/plans/343/model-facade-overhaul-plan-step-1.md
@@ -455,7 +455,7 @@ Implementation expectations:
    - Fallback mode: chat-completion image extraction for autoregressive models
 4. Parse usage from provider response if present.
 5. Normalize tool calls and reasoning fields.
-   - Reasoning extraction must check both `message.reasoning_content` (legacy/LiteLLM-normalized) and `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical), with `reasoning_content` taking precedence when both are present.
+   - Reasoning extraction must check both `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical) and `message.reasoning_content` (legacy/LiteLLM-normalized fallback), with `reasoning` taking precedence when both are present.
    - This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
    - Internal canonical field remains `reasoning_content`; no downstream contract change.
    - See: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
 6. Normalize image outputs from either `b64_json`, data URI, or URL download.
 
 ### Image routing ownership contract
@@ -951,7 +951,7 @@ OpenAI-compatible response parsing:
 
 1. `choices[0].message.content` -> canonical `message.content`
 2. `choices[0].message.tool_calls[*]` -> canonical `ToolCall`
-3. `choices[0].message.reasoning_content` **or** `choices[0].message.reasoning` -> canonical `reasoning_content` (`reasoning_content` takes precedence; `reasoning` is the vLLM >= 0.16.0 field name)
+3. `choices[0].message.reasoning` **or** `choices[0].message.reasoning_content` -> canonical `reasoning_content` (`reasoning` takes precedence as the vLLM >= 0.16.0 canonical field; `reasoning_content` is the legacy/LiteLLM fallback)
 4. `usage.prompt_tokens/completion_tokens` -> canonical `Usage`
 
 ### Canonical -> Anthropic messages payload
@@ -1354,7 +1354,7 @@ Per adapter:
 8. reasoning field migration tests:
    - response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
    - response with only `message.reasoning_content` still works (backward compat)
-   - response with both fields uses `reasoning_content` (precedence rule)
+   - response with both fields uses `reasoning` (precedence rule)
 
 Tools:
@@ -1529,7 +1529,7 @@ Mitigation:
 
 Mitigation:
 
 1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
-2. Shared reasoning extraction helper in `parsing.py` checks both `reasoning_content` and `reasoning` with explicit precedence.
+2. Shared reasoning extraction helper in `parsing.py` checks `reasoning` first (canonical), falling back to `reasoning_content` (legacy).
 3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).
 4. Ref: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
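
---

Editor's note: the patch series above specifies a shared dual-field helper in `parsing.py` but never shows its code. Below is a minimal sketch of what such a helper could look like under the patch-2 precedence rule (`reasoning` first as the vLLM >= 0.16.0 canonical field, `reasoning_content` as the legacy/LiteLLM fallback). The function name `extract_reasoning` is hypothetical; only the field names and precedence come from the patches.

```python
# Hypothetical sketch of the shared reasoning-extraction helper described
# in the plan (intended home: parsing.py). Precedence per PATCH 2/2:
# `reasoning` (vLLM >= 0.16.0 canonical) wins over `reasoning_content`
# (legacy/LiteLLM-normalized). The value is still stored downstream under
# the internal canonical name `reasoning_content` (no contract change).
from typing import Any, Mapping, Optional


def extract_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    """Return the reasoning trace from an OpenAI-compatible message dict.

    Checks `reasoning` first, then falls back to `reasoning_content`.
    Returns None when neither field carries a non-empty value.
    """
    for field in ("reasoning", "reasoning_content"):
        value = message.get(field)
        if value:  # skips missing keys, None, and empty strings
            return value
    return None
```

This covers the three migration test cases listed per adapter: only `reasoning`, only `reasoning_content`, and both present (where `reasoning` wins).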