-
Notifications
You must be signed in to change notification settings - Fork 50
Improve <think>/<thinking> tag identification in streaming responses #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mludvig
wants to merge
6
commits into
danny-avila:main
Choose a base branch
from
mludvig:feat/nova-thinking
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
1936b45
feat: Handle fragmented thinking tags in streamed content
mludvig e52f3c5
test: Add test for fragmented thinking tag parsing
mludvig 5c5757f
test: Update reasoning test to expect stripped tags
mludvig 921feba
fix: Flush thinking buffer at stream end in ModelEndHandler
mludvig d1f0d4d
test: Add more test cases for thinking tag handling
mludvig 0a493ef
doc: Describe state machine states
mludvig File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,277 @@ | ||
| // src/specs/fragmented-thinking.test.ts | ||
| // Tests for fragmented <thinking> tag handling in streamed content | ||
| // This tests the state machine that detects thinking tags split across chunks | ||
|
|
||
| import { HumanMessage, MessageContentText } from '@langchain/core/messages'; | ||
| import type { RunnableConfig } from '@langchain/core/runnables'; | ||
| import type * as t from '@/types'; | ||
| import { ChatModelStreamHandler, createContentAggregator } from '@/stream'; | ||
| import { GraphEvents, Providers } from '@/common'; | ||
| import { getLLMConfig } from '@/utils/llmConfig'; | ||
| import { Run } from '@/run'; | ||
|
|
||
| describe('Fragmented Thinking Tags Tests', () => { | ||
| jest.setTimeout(30000); | ||
| let run: Run<t.IState>; | ||
| let contentParts: t.MessageContentComplex[]; | ||
| let aggregateContent: t.ContentAggregator; | ||
| let runSteps: Set<string>; | ||
|
|
||
| const config: Partial<RunnableConfig> & { | ||
| version: 'v1' | 'v2'; | ||
| run_id?: string; | ||
| streamMode: string; | ||
| } = { | ||
| configurable: { | ||
| thread_id: 'fragmented-thinking-test', | ||
| }, | ||
| streamMode: 'values', | ||
| version: 'v2' as const, | ||
| callbacks: [ | ||
| { | ||
| async handleCustomEvent(event, data): Promise<void> { | ||
| if (event !== GraphEvents.ON_MESSAGE_DELTA) { | ||
| return; | ||
| } | ||
| const messageDeltaData = data as t.MessageDeltaEvent; | ||
|
|
||
| // Wait until we see the run step | ||
| const maxAttempts = 50; | ||
| let attempts = 0; | ||
| while (!runSteps.has(messageDeltaData.id) && attempts < maxAttempts) { | ||
| await new Promise((resolve) => setTimeout(resolve, 100)); | ||
| attempts++; | ||
| } | ||
|
|
||
| aggregateContent({ event, data: messageDeltaData }); | ||
| }, | ||
| }, | ||
| ], | ||
| }; | ||
|
|
||
| beforeEach(async () => { | ||
| const { contentParts: parts, aggregateContent: ac } = | ||
| createContentAggregator(); | ||
| aggregateContent = ac; | ||
| runSteps = new Set(); | ||
| contentParts = parts as t.MessageContentComplex[]; | ||
| }); | ||
|
|
||
| afterEach(() => { | ||
| runSteps.clear(); | ||
| }); | ||
|
|
||
| const onReasoningDeltaSpy = jest.fn(); | ||
|
|
||
| afterAll(() => { | ||
| onReasoningDeltaSpy.mockReset(); | ||
| }); | ||
|
|
||
| const setupCustomHandlers = (): Record< | ||
| string | GraphEvents, | ||
| t.EventHandler | ||
| > => ({ | ||
| [GraphEvents.CHAT_MODEL_STREAM]: new ChatModelStreamHandler(), | ||
| [GraphEvents.ON_RUN_STEP_COMPLETED]: { | ||
| handle: ( | ||
| event: GraphEvents.ON_RUN_STEP_COMPLETED, | ||
| data: t.StreamEventData | ||
| ): void => { | ||
| aggregateContent({ | ||
| event, | ||
| data: data as unknown as { result: t.ToolEndEvent }, | ||
| }); | ||
| }, | ||
| }, | ||
| [GraphEvents.ON_RUN_STEP]: { | ||
| handle: ( | ||
| event: GraphEvents.ON_RUN_STEP, | ||
| data: t.StreamEventData | ||
| ): void => { | ||
| const runStepData = data as t.RunStep; | ||
| runSteps.add(runStepData.id); | ||
| aggregateContent({ event, data: runStepData }); | ||
| }, | ||
| }, | ||
| [GraphEvents.ON_RUN_STEP_DELTA]: { | ||
| handle: ( | ||
| event: GraphEvents.ON_RUN_STEP_DELTA, | ||
| data: t.StreamEventData | ||
| ): void => { | ||
| aggregateContent({ event, data: data as t.RunStepDeltaEvent }); | ||
| }, | ||
| }, | ||
| [GraphEvents.ON_REASONING_DELTA]: { | ||
| handle: ( | ||
| event: GraphEvents.ON_REASONING_DELTA, | ||
| data: t.StreamEventData | ||
| ): void => { | ||
| onReasoningDeltaSpy(event, data); | ||
| aggregateContent({ event, data: data as t.ReasoningDeltaEvent }); | ||
| }, | ||
| }, | ||
| }); | ||
|
|
||
| // Helper to create a fresh run for each test | ||
| const createTestRun = async ( | ||
| customHandlers: Record<string | GraphEvents, t.EventHandler> | ||
| ): Promise<Run<t.IState>> => { | ||
| const llmConfig = getLLMConfig(Providers.BEDROCK); | ||
| return Run.create<t.IState>({ | ||
| runId: `fragmented-thinking-test-run-${Date.now()}`, | ||
| graphConfig: { | ||
| type: 'standard', | ||
| llmConfig, | ||
| instructions: 'You are a helpful assistant.', | ||
| }, | ||
| returnContent: true, | ||
| customHandlers, | ||
| }); | ||
| }; | ||
|
|
||
| // Test with <thinking> tags | ||
| test('should handle <thinking> tags in streamed content', async () => { | ||
| const customHandlers = setupCustomHandlers(); | ||
| run = await createTestRun(customHandlers); | ||
|
|
||
| const responseWithThinkingTag = | ||
| '<thinking> Let me think about this. </thinking> The answer is 42.'; | ||
| run.Graph?.overrideTestModel([responseWithThinkingTag], 2); | ||
|
|
||
| const inputs = { | ||
| messages: [new HumanMessage('What is the meaning of life?')], | ||
| }; | ||
|
|
||
| await run.processStream(inputs, config); | ||
|
|
||
| expect(contentParts).toBeDefined(); | ||
| expect(contentParts.length).toBe(2); | ||
|
|
||
| const thinkingPart = contentParts.find( | ||
| (p) => (p as t.ReasoningContentText).think !== undefined | ||
| ) as t.ReasoningContentText; | ||
| const textPart = contentParts.find( | ||
| (p) => (p as MessageContentText).text !== undefined | ||
| ) as MessageContentText; | ||
|
|
||
| expect(thinkingPart).toBeDefined(); | ||
| expect(thinkingPart.think).toContain('Let me think about this.'); | ||
| expect(thinkingPart.think).not.toContain('<thinking>'); | ||
| expect(thinkingPart.think).not.toContain('</thinking>'); | ||
|
|
||
| expect(textPart).toBeDefined(); | ||
| expect(textPart.text).toContain('The answer is 42.'); | ||
| expect(textPart.text).not.toContain('<thinking>'); | ||
|
|
||
| expect(onReasoningDeltaSpy).toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| // Test with <think> tags (shorter variant) | ||
| test('should handle <think> tags in streamed content', async () => { | ||
| onReasoningDeltaSpy.mockClear(); | ||
| const customHandlers = setupCustomHandlers(); | ||
| run = await createTestRun(customHandlers); | ||
|
|
||
| const responseWithThinkTag = | ||
| '<think> Processing the question... </think> Here is my response.'; | ||
| run.Graph?.overrideTestModel([responseWithThinkTag], 2); | ||
|
|
||
| const inputs = { | ||
| messages: [new HumanMessage('Tell me something.')], | ||
| }; | ||
|
|
||
| await run.processStream(inputs, config); | ||
|
|
||
| expect(contentParts).toBeDefined(); | ||
| expect(contentParts.length).toBe(2); | ||
|
|
||
| const thinkingPart = contentParts.find( | ||
| (p) => (p as t.ReasoningContentText).think !== undefined | ||
| ) as t.ReasoningContentText; | ||
| const textPart = contentParts.find( | ||
| (p) => (p as MessageContentText).text !== undefined | ||
| ) as MessageContentText; | ||
|
|
||
| expect(thinkingPart).toBeDefined(); | ||
| expect(thinkingPart.think).toContain('Processing the question...'); | ||
| expect(thinkingPart.think).not.toContain('<think>'); | ||
| expect(thinkingPart.think).not.toContain('</think>'); | ||
|
|
||
| expect(textPart).toBeDefined(); | ||
| expect(textPart.text).toContain('Here is my response.'); | ||
| expect(textPart.text).not.toContain('<think>'); | ||
|
|
||
| expect(onReasoningDeltaSpy).toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| // Test with plain text (no thinking tags) | ||
| test('should handle plain text without thinking tags', async () => { | ||
| onReasoningDeltaSpy.mockClear(); | ||
| const customHandlers = setupCustomHandlers(); | ||
| run = await createTestRun(customHandlers); | ||
|
|
||
| const responseWithoutTags = | ||
| 'This is a simple response without any thinking.'; | ||
| run.Graph?.overrideTestModel([responseWithoutTags], 2); | ||
|
|
||
| const inputs = { | ||
| messages: [new HumanMessage('Say something simple.')], | ||
| }; | ||
|
|
||
| await run.processStream(inputs, config); | ||
|
|
||
| expect(contentParts).toBeDefined(); | ||
| expect(contentParts.length).toBe(1); | ||
|
|
||
| const textPart = contentParts[0] as MessageContentText; | ||
| expect(textPart.text).toBe( | ||
| 'This is a simple response without any thinking.' | ||
| ); | ||
|
|
||
| // No reasoning delta should be called for plain text | ||
| expect(onReasoningDeltaSpy).not.toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| // Test with multiple thinking blocks in sequence | ||
| test('should handle multiple thinking blocks in sequence', async () => { | ||
| onReasoningDeltaSpy.mockClear(); | ||
| const customHandlers = setupCustomHandlers(); | ||
| run = await createTestRun(customHandlers); | ||
|
|
||
| const responseWithMultipleThinkingTags = | ||
| '<thinking> First thought. </thinking> Response one. <thinking> Second thought. </thinking> Response two.'; | ||
| run.Graph?.overrideTestModel([responseWithMultipleThinkingTags], 2); | ||
|
|
||
| const inputs = { | ||
| messages: [new HumanMessage('Give me a complex response.')], | ||
| }; | ||
|
|
||
| await run.processStream(inputs, config); | ||
|
|
||
| expect(contentParts).toBeDefined(); | ||
| // Should have thinking and text parts (exact count depends on aggregation) | ||
| expect(contentParts.length).toBeGreaterThanOrEqual(2); | ||
|
|
||
| const thinkingPart = contentParts.find( | ||
| (p) => (p as t.ReasoningContentText).think !== undefined | ||
| ) as t.ReasoningContentText; | ||
| const textPart = contentParts.find( | ||
| (p) => (p as MessageContentText).text !== undefined | ||
| ) as MessageContentText; | ||
|
|
||
| // Verify thinking content contains both thoughts (accumulated) | ||
| expect(thinkingPart).toBeDefined(); | ||
| expect(thinkingPart.think).toContain('First thought.'); | ||
| expect(thinkingPart.think).toContain('Second thought.'); | ||
| expect(thinkingPart.think).not.toContain('<thinking>'); | ||
| expect(thinkingPart.think).not.toContain('</thinking>'); | ||
|
|
||
| // Verify text content contains both responses | ||
| expect(textPart).toBeDefined(); | ||
| expect(textPart.text).toContain('Response one.'); | ||
| expect(textPart.text).toContain('Response two.'); | ||
| expect(textPart.text).not.toContain('<thinking>'); | ||
|
|
||
| expect(onReasoningDeltaSpy).toHaveBeenCalled(); | ||
| }); | ||
| }); | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.