## Summary
Add an optional `reasoning_content` field to `LLMResultChunkDelta` to support structured streaming of LLM reasoning/thinking processes, separate from the final response text.
This change is part of a three-repository update (sdk, plugin, and main),
with a safe, phased rollout plan detailed below to ensure backward compatibility.
Follow-up to #23313
## Motivation
Current inefficient flow (error-prone):
- Some vendor APIs return reasoning already separated (e.g., `delta.thinking`)
- Some plugins artificially merge it into `message.content` using `<think>` tags
- The Dify Main backend re-parses those `<think>` tags via `_split_reasoning()` to separate them again
- Risk of parsing errors, encoding issues, and nested tags
Problems:
- Unnecessary serialization/deserialization overhead
- Parsing bugs (malformed tags, edge cases)
- The Dify Main backend already supports the `reasoning_content` structure (#23313), but it currently relies on fragile parsing through `_split_reasoning()` because the SDK lacks this field and plugins cannot yet send it
Proposed flow:
- Vendor API returns separated data → plugin preserves the separation
- Plugin sends it via the `reasoning_content` field (no merging)
- Main reads it directly (no parsing)
- Efficient, type-safe, maintainable, less error-prone separation
This aligns the SDK with how vendor APIs actually work.
## Proposed Changes
### Phase 1: SDK (this issue)
```python
class LLMResultChunkDelta(BaseModel):
    index: int
    message: AssistantPromptMessage
    reasoning_content: str | None = None  # ← NEW
    usage: LLMUsage | None = None
    finish_reason: str | None = None
```

Compatibility: Backward-compatible (optional field, default `None`)
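For illustration, a minimal sketch of how a plugin might populate the new field during streaming, assuming a Pydantic v2 `BaseModel` and the class definition above (the `AssistantPromptMessage(content=...)` call and the literal strings are hypothetical):

```python
# Hypothetical chunk for one streamed batch: reasoning and answer text
# travel in separate fields instead of sharing <think> tags.
delta = LLMResultChunkDelta(
    index=0,
    message=AssistantPromptMessage(content="The answer is 42."),
    reasoning_content="First, restate what the question is really asking...",
)

# Consumers that predate the field see nothing new: it defaults to None
# and can be excluded from serialized output entirely.
print(delta.model_dump(exclude_none=True))
```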
### Phase 2: Plugins (defensive)
New plugin versions will enforce SDK versions that include this PR or later (via requirements.txt).
Add a defensive `pop()` before the Dify Main backend starts sending the capability flag (Phase 3).
This prevents vendor API errors if an older plugin receives the updated parameters from the Main backend.
```python
def _chat_generate(self, model_parameters, ...):
    model_parameters = dict(model_parameters)
    model_parameters.pop("_dify_supports_reasoning_content", False)  # defensive
    # ... call vendor API
```

Affected plugins: All LLM plugins
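Passing a default to `pop()` makes the call a no-op when the flag is absent, so the same code path works both before and after Main starts sending the flag, and the internal parameter never reaches the vendor API.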
### Phase 3: Dify Main Backend (read & capability signaling)
Main reads `reasoning_content` from plugin responses:
- Use `delta.reasoning_content` directly if present (see the sketch after this list)

Main sends a capability flag via `model_parameters`:
- Plugins detect this and optimize (single-channel vs dual-channel streaming)
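A minimal sketch of the Main-side read path, assuming a chunk delta shaped like the Phase 1 model; `_split_reasoning()` is the existing parser mentioned under Motivation, and the return shape used here is an assumption:

```python
def read_chunk(delta) -> tuple[str, str | None]:
    """Return (answer_text, reasoning_text) for one streamed chunk (sketch)."""
    if delta.reasoning_content is not None:
        # New path: the plugin already separated reasoning from the answer,
        # so no tag parsing is needed.
        return delta.message.content, delta.reasoning_content
    # Legacy path: reasoning may still arrive embedded in <think> tags,
    # so fall back to the existing parser (assumed to return the same pair).
    return _split_reasoning(delta.message.content)
```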
### Phase 4: Plugins (opt-in implementation)
Provider-by-provider implementation for reasoning-capable models, along the lines of the sketch below:
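One plausible shape for a provider's opt-in, with a hypothetical `_to_chunk_delta` helper and hypothetical `vendor_delta.thinking` / `vendor_delta.content` attributes standing in for the vendor's separated fields; the flag name comes from the Phase 2 snippet:

```python
def _to_chunk_delta(self, vendor_delta, model_parameters: dict) -> LLMResultChunkDelta:
    thinking = vendor_delta.thinking or ""
    content = vendor_delta.content or ""

    if model_parameters.pop("_dify_supports_reasoning_content", False):
        # Main signaled support: preserve the vendor's separation and stream
        # reasoning through the dedicated field.
        return LLMResultChunkDelta(
            index=0,
            message=AssistantPromptMessage(content=content),
            reasoning_content=thinking or None,
        )

    # No flag (older Main): fall back to the legacy merge so the existing
    # <think>-tag parsing path keeps working.
    merged = f"<think>{thinking}</think>{content}" if thinking else content
    return LLMResultChunkDelta(
        index=0,
        message=AssistantPromptMessage(content=merged),
    )
```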
## My Plan
- SDK PR
- Plugin PR (Phase 2 defensive)
- Main PR (Phase 3)
- Plugin PRs (Phase 4 opt-in) → Gradual rollout per provider
## Backward Compatibility
No breaking changes at any phase:
- Old plugins work with new SDK (field ignored)
- New plugins work with old Main (dual-channel fallback)
- Old Main works with new plugins (defensive pop)
## Migration Safety
This phased approach ensures zero-risk migration despite independent release cycles:
no coordination required between repositories.
Why this is safe:
- No LLM node disruption: All phases use optional fields/parameters
- SDK-independent: Plugins can upgrade SDK without waiting for Main
- Main-independent: SDK can release without breaking existing deployments
- Plugin-independent: Each provider can opt in at its own pace
Version compatibility matrix:
| Scenario | SDK | Main | Plugin | Result |
|---|---|---|---|---|
| Old stack | Old | Old | Old | Works (current flow) |
| SDK updated | New | Old | Old | Works (field ignored) |
| SDK + Plugin | New | Old | New | Works (defensive pop) |
| SDK + Main | New | New | Old | Works (Main reads field if present) |
| All updated | New | New | New | Optimal (direct reasoning flow) |