From 82570a6123650ce105a9fe271b44311b31de5fbc Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Fri, 6 Mar 2026 09:49:04 -0700
Subject: [PATCH 1/2] update plan 1

---
 .../343/model-facade-overhaul-plan-step-1.md  | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/plans/343/model-facade-overhaul-plan-step-1.md b/plans/343/model-facade-overhaul-plan-step-1.md
index b932cca75..1b3ad07ab 100644
--- a/plans/343/model-facade-overhaul-plan-step-1.md
+++ b/plans/343/model-facade-overhaul-plan-step-1.md
@@ -455,6 +455,10 @@ Implementation expectations:
    - Fallback mode: chat-completion image extraction for autoregressive models
 4. Parse usage from provider response if present.
 5. Normalize tool calls and reasoning fields.
+   - Reasoning extraction must check both `message.reasoning_content` (legacy/LiteLLM-normalized) and `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical), with `reasoning_content` taking precedence when both are present.
+   - This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
+   - Internal canonical field remains `reasoning_content`; no downstream contract change.
+   - See: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
 6. Normalize image outputs from either `b64_json`, data URI, or URL download.
 
 ### Image routing ownership contract
@@ -947,7 +951,7 @@ OpenAI-compatible response parsing:
 
 1. `choices[0].message.content` -> canonical `message.content`
 2. `choices[0].message.tool_calls[*]` -> canonical `ToolCall`
-3. `choices[0].message.reasoning_content` if present -> canonical `reasoning_content`
+3. `choices[0].message.reasoning_content` **or** `choices[0].message.reasoning` -> canonical `reasoning_content` (`reasoning_content` takes precedence; `reasoning` is the vLLM >= 0.16.0 field name)
 4. `usage.prompt_tokens/completion_tokens` -> canonical `Usage`
 
 ### Canonical -> Anthropic messages payload
@@ -1347,6 +1351,10 @@ Per adapter:
 
 5. retry behavior tests
 6. adaptive throttling behavior tests (drop on 429, gradual recovery)
 7. auth status mapping tests (`401 -> AUTHENTICATION`, `403 -> PERMISSION_DENIED`)
+8. reasoning field migration tests:
+   - response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
+   - response with only `message.reasoning_content` still works (backward compat)
+   - response with both fields uses `reasoning_content` (precedence rule)
 
 Tools:
@@ -1516,6 +1524,15 @@ Mitigation:
 
 1. central retry module with deterministic tests
 2. preserve current defaults from `LiteLLMRouterDefaultKwargs`
+### Risk: silent reasoning trace loss after LiteLLM removal
+
+Mitigation:
+
+1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
+2. Shared reasoning extraction helper in `parsing.py` checks both `reasoning_content` and `reasoning` with explicit precedence.
+3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).
+4. Ref: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
+
 ### Risk: throttle oscillation or starvation under bursty load
 
 Mitigation:

From 26b03e9292bfde80684c39c863a5e34d2d8de903 Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Fri, 6 Mar 2026 11:33:30 -0700
Subject: [PATCH 2/2] flip precedence

---
 plans/343/model-facade-overhaul-plan-step-1.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/plans/343/model-facade-overhaul-plan-step-1.md b/plans/343/model-facade-overhaul-plan-step-1.md
index 1b3ad07ab..56f95cfe5 100644
--- a/plans/343/model-facade-overhaul-plan-step-1.md
+++ b/plans/343/model-facade-overhaul-plan-step-1.md
@@ -455,7 +455,7 @@ Implementation expectations:
    - Fallback mode: chat-completion image extraction for autoregressive models
 4. Parse usage from provider response if present.
 5. Normalize tool calls and reasoning fields.
-   - Reasoning extraction must check both `message.reasoning_content` (legacy/LiteLLM-normalized) and `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical), with `reasoning_content` taking precedence when both are present.
+   - Reasoning extraction must check both `message.reasoning` (vLLM >= 0.16.0 / OpenAI-compatible canonical) and `message.reasoning_content` (legacy/LiteLLM-normalized fallback), with `reasoning` taking precedence when both are present.
    - This dual-field check should live in a shared helper in `parsing.py` so it is reusable across adapters.
    - Internal canonical field remains `reasoning_content`; no downstream contract change.
    - See: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
 6. Normalize image outputs from either `b64_json`, data URI, or URL download.
 
 ### Image routing ownership contract
@@ -951,7 +951,7 @@ OpenAI-compatible response parsing:
 
 1. `choices[0].message.content` -> canonical `message.content`
 2. `choices[0].message.tool_calls[*]` -> canonical `ToolCall`
-3. `choices[0].message.reasoning_content` **or** `choices[0].message.reasoning` -> canonical `reasoning_content` (`reasoning_content` takes precedence; `reasoning` is the vLLM >= 0.16.0 field name)
+3. `choices[0].message.reasoning` **or** `choices[0].message.reasoning_content` -> canonical `reasoning_content` (`reasoning` takes precedence as the vLLM >= 0.16.0 canonical field; `reasoning_content` is the legacy/LiteLLM fallback)
 4. `usage.prompt_tokens/completion_tokens` -> canonical `Usage`
 
 ### Canonical -> Anthropic messages payload
@@ -1354,7 +1354,7 @@ Per adapter:
 8. reasoning field migration tests:
    - response with only `message.reasoning` (no `reasoning_content`) populates canonical `reasoning_content`
    - response with only `message.reasoning_content` still works (backward compat)
-   - response with both fields uses `reasoning_content` (precedence rule)
+   - response with both fields uses `reasoning` (precedence rule)
 
 Tools:
@@ -1529,7 +1529,7 @@ Mitigation:
 
 Mitigation:
 
 1. vLLM >= 0.16.0 uses `message.reasoning` as canonical field; `reasoning_content` is deprecated/backward-compat. LiteLLM currently normalizes this for us, masking the gap.
-2. Shared reasoning extraction helper in `parsing.py` checks both `reasoning_content` and `reasoning` with explicit precedence.
+2. Shared reasoning extraction helper in `parsing.py` checks `reasoning` first (canonical), falling back to `reasoning_content` (legacy).
 3. Adapter unit tests cover all three cases (only `reasoning`, only `reasoning_content`, both present).
 4. Ref: [GitHub issue #374](https://github.com/NVIDIA-NeMo/DataDesigner/issues/374)
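
---

Editor's note: the patch series above specifies a shared dual-field helper in `parsing.py` but never shows its code. Below is a minimal sketch of what such a helper could look like under the patch-2 precedence rule (`reasoning` first as the vLLM >= 0.16.0 canonical field, `reasoning_content` as the legacy/LiteLLM fallback). The function name `extract_reasoning` is hypothetical; only the field names and precedence come from the patches.

```python
# Hypothetical sketch of the shared reasoning-extraction helper described
# in the plan (intended home: parsing.py). Precedence per PATCH 2/2:
# `reasoning` (vLLM >= 0.16.0 canonical) wins over `reasoning_content`
# (legacy/LiteLLM-normalized). The value is still stored downstream under
# the internal canonical name `reasoning_content` (no contract change).
from typing import Any, Mapping, Optional


def extract_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    """Return the reasoning trace from an OpenAI-compatible message dict.

    Checks `reasoning` first, then falls back to `reasoning_content`.
    Returns None when neither field carries a non-empty value.
    """
    for field in ("reasoning", "reasoning_content"):
        value = message.get(field)
        if value:  # skips missing keys, None, and empty strings
            return value
    return None
```

This covers the three migration test cases listed per adapter: only `reasoning`, only `reasoning_content`, and both present (where `reasoning` wins).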