-
Notifications
You must be signed in to change notification settings - Fork 50
Improve <think>/<thinking> tag identification in streaming responses #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR replaces the previous batch-processing approach to detecting <think> and <thinking> tags with a streaming state machine that handles tags split across multiple chunks. This is essential for models like Amazon Nova that emit these tags in fragments rather than complete units.
Key changes:
- Implements a 4-state machine ('normal', 'buffering_open', 'thinking', 'buffering_close') to detect and strip thinking tags from streaming content
- Removes the old
parseThinkingContentfunction that required complete strings - Adds state tracking fields (
thinkingState,tagBuffer) to AgentContext for maintaining parser state across chunks
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/stream.ts | Removed old batch parseThinkingContent function and replaced conditional logic with streaming state machine that processes content character-by-character while buffering potential tag sequences |
| src/specs/fragmented-thinking.test.ts | Added new test file to verify the state machine correctly handles thinking tags split by whitespace boundaries using the fake streaming model |
| src/agents/AgentContext.ts | Added thinkingState and tagBuffer fields to maintain state machine context across streaming chunks |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add state machine to detect <thinking> and <think> tags that arrive split across multiple streaming chunks. This fixes Amazon Nova and similar models that send tokens like '<thinking', '>', content, '</', 'thinking', '>' as separate stream events. The state machine buffers content starting with '<' and waits for enough characters to determine if it's a thinking tag before routing to TEXT or THINK content types.
Verifies that <thinking> tags are correctly detected and stripped from streamed content, with thinking content routed to THINK type and regular text routed to TEXT type.
The state machine now strips <think>/<thinking> tags from content, so update the test to expect capture group 1 (content inside tags) instead of capture group 0 (entire match including tags).
Ensures partial tags aren't lost when stream ends mid-buffer
a9abb21 to
0a493ef
Compare
|
@danny-avila copilot comments resolved |
Some models like Amazon Nova send their thoughts wrapped in .. or .. tags. It is not guaranteed that these strings arrive as single streaming events, in fact more often than not they don't and we may receive '<', 'thinking', '>' tokens separately.
This patch introduces a state machine that deals with such a fragmented stream, identifies the tags and properly emits the thoughts content as
ContentTypes.THINKfor a correct frontend rendering. Fixes the output from at least Amazon Nova and quite likely from other models that emit these tags too.