Sjshi763/issue4363 fix: when the LLM reply itself contains JSON-like formatting, the message's content field may be incorrectly serialized multiple times #4546
base: master
Changes from all commits: b0a98ad, ab31cd1, c7f27a4, e13ca9c
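For context, here is a minimal, hypothetical illustration of the problem named in the title: when a reply that already looks like JSON is serialized again, the inner quotes get escaped once more and a downstream client ends up with a string instead of an object. The values are invented for demonstration only.

```python
import json

reply = '{"answer": 42}'          # an LLM reply that happens to be JSON-shaped text
payload = {"content": reply}

once = json.dumps(payload, ensure_ascii=False)
twice = json.dumps(once, ensure_ascii=False)   # serializing the already-serialized string again

print(once)   # {"content": "{\"answer\": 42}"}
print(twice)  # "{\"content\": \"{\\\"answer\\\": 42}\"}"  <- escaped a second time; no longer parseable as an object directly
```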
```diff
@@ -247,18 +247,63 @@ async def _parse_openai_completion(
-        # parse the text completion
-        if choice.message.content is not None:
-            # text completion
-            completion_text = str(choice.message.content).strip()
-            # specially, some providers may set <think> tags around reasoning content in the completion text,
-            # we use regex to remove them, and store then in reasoning_content field
-            reasoning_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
-            matches = reasoning_pattern.findall(completion_text)
-            if matches:
-                llm_response.reasoning_content = "\n".join(
-                    [match.strip() for match in matches],
-                )
-                completion_text = reasoning_pattern.sub("", completion_text).strip()
-            llm_response.result_chain = MessageChain().message(completion_text)
+        # content can be either a plain string or a multimodal list
+        content = choice.message.content
+        # handle multimodal content returned as a list of parts
+        if isinstance(content, list):
+            reasoning_parts = []
+            mc = MessageChain()
+            for part in content:
+                if not isinstance(part, dict):
+                    # fallback: append as plain text
+                    mc.message(str(part))
+                    continue
+                ptype = part.get("type")
+                if ptype == "text":
```
Comment on lines +253 to +262

Contributor

suggestion (bug_risk): Text parts in multimodal responses no longer strip `<think>` tags or populate `reasoning_content`, unlike the plain-string path. The multimodal branch should apply the same extraction.

Suggested implementation:

```python
# content can be either a plain string or a multimodal list
content = choice.message.content
# handle multimodal content returned as a list of parts
if isinstance(content, list):
    reasoning_parts = []
    mc = MessageChain()
    for part in content:
        if not isinstance(part, dict):
            # fallback: append as plain text
            mc.message(str(part))
            continue
        ptype = part.get("type")
        if ptype == "text":
            text = part.get("text", "") or ""
            if text:
                # extract <think>...</think> segments into reasoning_parts
                try:
                    import re  # local import in case module-level import is not present
                    think_matches = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
                    if think_matches:
                        reasoning_parts.extend(think_matches)
                    # strip all <think> blocks from the visible completion text
                    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
                except Exception:
                    # on any failure, fall back to using the raw text
                    pass
            mc.message(text)
        elif ptype == "image_url":
            image_field = part.get("image_url")
            url = None
            if isinstance(image_field, dict):
                url = image_field.get("url")
            else:
                url = image_field
            if url:
                # data:image/...;base64,xxx
                if isinstance(url, str) and "base64," in url:
                    base64_data = url.split("base64,", 1)[1]
                    mc.base64_image(base64_data)
```
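As a quick, standalone illustration of the `<think>` extraction this suggestion relies on (the sample text is invented):

```python
import re

text = "<think>First reason about units.</think>The answer is 42 km."

# collect the hidden reasoning and strip it from the visible text
reasoning = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
visible = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(reasoning)  # ['First reason about units.']
print(visible)    # The answer is 42 km.
```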
```diff
+                    mc.message(part.get("text", ""))
+                elif ptype == "image_url":
+                    image_field = part.get("image_url")
+                    url = None
+                    if isinstance(image_field, dict):
+                        url = image_field.get("url")
+                    else:
+                        url = image_field
+                    if url:
+                        # data:image/...;base64,xxx
+                        if isinstance(url, str) and "base64," in url:
+                            base64_data = url.split("base64,", 1)[1]
+                            mc.base64_image(base64_data)
+                        elif isinstance(url, str) and url.startswith("base64://"):
+                            mc.base64_image(url.replace("base64://", ""))
+                        else:
+                            mc.url_image(url)
+                elif ptype == "think":
+                    # collect reasoning parts for later extraction
+                    think_val = part.get("think")
+                    if think_val:
+                        reasoning_parts.append(str(think_val))
+                else:
+                    # unknown part type, append its textual representation
+                    mc.message(json.dumps(part, ensure_ascii=False))
+
+            if reasoning_parts:
+                llm_response.reasoning_content = "\n".join(
+                    [rp.strip() for rp in reasoning_parts]
+                )
+            llm_response.result_chain = mc
+        else:
+            # text completion (string)
+            completion_text = str(content).strip()
+            # specially, some providers may set <think> tags around reasoning content in the completion text,
+            # we use regex to remove them, and store then in reasoning_content field
+            reasoning_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
+            matches = reasoning_pattern.findall(completion_text)
+            if matches:
+                llm_response.reasoning_content = "\n".join(
+                    [match.strip() for match in matches],
+                )
+                completion_text = reasoning_pattern.sub("", completion_text).strip()
+            llm_response.result_chain = MessageChain().message(completion_text)

         # parse the reasoning content if any
         # the priority is higher than the <think> tag extraction
```
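For reference, a hypothetical example of the list-shaped content that the new `isinstance(content, list)` branch is meant to handle. The part types mirror those in the diff; the payload values are invented.

```python
# A made-up multimodal content list: one text part, one data-URL image part, one reasoning part.
sample_content = [
    {"type": "text", "text": "Here is the chart you asked for."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg=="}},
    {"type": "think", "think": "The user wants a bar chart."},
]

for part in sample_content:
    ptype = part.get("type")
    if ptype == "text":
        print("text part ->", part["text"])
    elif ptype == "image_url":
        url = part["image_url"]["url"]
        print("image part -> base64 payload starts with", url.split("base64,", 1)[1][:12])
    elif ptype == "think":
        print("reasoning part ->", part["think"])
```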
issue (complexity): Consider extracting the multimodal content handling, the image handling, and the `<think>` parsing into dedicated helper functions so that `_parse_openai_completion` mainly orchestrates them.

You can keep the new functionality but reduce complexity by pulling the multimodal parsing and the image/reasoning handling into small helpers. That makes `_parse_openai_completion` mostly orchestration and flattens the nesting: extract the multimodal handling into its own function, factor the image URL handling out separately to avoid nested conditionals in the main logic, and add a small helper for the `<think>`-tag case, so that `_parse_openai_completion` itself becomes much less branched. This keeps all behaviors (multimodal, text, `<think>` tags, the different image formats, unknown part types) while moving the detailed branching out of the central parsing function.
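The reviewer's example snippets are not reproduced above. Below is a minimal sketch of what such helpers could look like, assuming a `MessageChain`-like object exposing the `message`, `base64_image`, and `url_image` methods used in the diff; the helper names are hypothetical and not taken from the PR.

```python
import json
import re


def _extract_think_blocks(text: str) -> tuple[str, list[str]]:
    """Return (visible_text, reasoning_parts) with <think>...</think> blocks removed."""
    matches = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    visible = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return visible, [m.strip() for m in matches]


def _append_image_url(mc, image_field) -> None:
    """Append an image part to the message chain, handling the URL formats from the diff."""
    url = image_field.get("url") if isinstance(image_field, dict) else image_field
    if not url:
        return
    if isinstance(url, str) and "base64," in url:
        mc.base64_image(url.split("base64,", 1)[1])
    elif isinstance(url, str) and url.startswith("base64://"):
        mc.base64_image(url.replace("base64://", ""))
    else:
        mc.url_image(url)


def _parse_multimodal_parts(mc, parts) -> list[str]:
    """Fill the message chain from a list of parts; return the collected reasoning strings."""
    reasoning_parts: list[str] = []
    for part in parts:
        if not isinstance(part, dict):
            mc.message(str(part))
            continue
        ptype = part.get("type")
        if ptype == "text":
            visible, reasoning = _extract_think_blocks(part.get("text", "") or "")
            reasoning_parts.extend(reasoning)
            mc.message(visible)
        elif ptype == "image_url":
            _append_image_url(mc, part.get("image_url"))
        elif ptype == "think":
            if part.get("think"):
                reasoning_parts.append(str(part["think"]))
        else:
            # unknown part type: keep its textual representation
            mc.message(json.dumps(part, ensure_ascii=False))
    return reasoning_parts
```

With helpers along these lines, `_parse_openai_completion` would only choose between the string and list paths and assign `reasoning_content` and `result_chain`, as the review suggests.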