
fix: reset KV cache between generations to fix second message bug #8

Merged

konard merged 3 commits into main from issue-7-8b4924ede20c on Jan 3, 2026

Conversation

@konard (Contributor) commented Jan 3, 2026

Summary

This PR fixes the critical issue where sending a second message to the SmolLM2 model would fail with:

Worker error: Forward pass failed: cannot broadcast [10, 10] to [1, 9, 10, 275]

Root Cause Analysis

The KV (Key-Value) cache in the LLaMA model was persistent across multiple generate() calls. When a new message was sent:

  1. First message: Cache was empty, generation worked fine
  2. Second message: Cache still contained KV values from the first message, but the new prompt started from position 0, causing dimension mismatches in the attention mechanism
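
The mismatch can be illustrated with a toy cache (a hypothetical sketch, not the real candle KV cache: the struct and its methods are invented for illustration). Appending new tokens is only valid when the start position equals the cached sequence length, which is exactly what breaks when a second prompt restarts at position 0:

```rust
// Toy model of the bug: appending tokens assumes the start position
// equals the number of positions already cached.
struct KvCache {
    len: usize, // number of cached key/value positions
}

impl KvCache {
    fn new() -> Self {
        Self { len: 0 }
    }

    // Append `n` tokens starting at `pos`; a position mismatch is the
    // analogue of "cannot broadcast [10, 10] to [1, 9, 10, 275]".
    fn append(&mut self, pos: usize, n: usize) -> Result<(), String> {
        if pos != self.len {
            return Err(format!(
                "cannot broadcast: mask covers {} positions, cache holds {}",
                pos + n,
                self.len + n
            ));
        }
        self.len += n;
        Ok(())
    }

    // The fix: clear the cache before each new generation.
    fn reset(&mut self) {
        self.len = 0;
    }
}

fn main() {
    let mut cache = KvCache::new();
    assert!(cache.append(0, 10).is_ok()); // first message: cache empty, works
    assert!(cache.append(0, 10).is_err()); // second message, no reset: fails
    cache.reset();
    assert!(cache.append(0, 10).is_ok()); // second message after reset: works
    println!("ok");
}
```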

Changes

Bug Fixes:

  • Reset KV cache at the beginning of each generate() call to ensure clean state for new conversations
  • Add proper handling for ChatML special tokens (<|im_end|>, <|im_start|>, <|endoftext|>)
  • Support multiple EOS token types for SmolLM2-Instruct model format
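
The stop-token and filtering logic can be sketched as follows (a minimal illustration operating on token strings; the real code works on tokenizer ids from SmolLM2's vocabulary, so the exact matching is an assumption):

```rust
// ChatML stop tokens: generation ends on either of these (illustrative;
// the actual check compares tokenizer ids, not strings).
fn is_eos(token: &str) -> bool {
    matches!(token, "<|im_end|>" | "<|endoftext|>")
}

// Stop at the first EOS token and drop remaining special tokens
// (anything shaped like <|...|>) from the visible output.
fn filter_special(tokens: &[&str]) -> String {
    tokens
        .iter()
        .take_while(|t| !is_eos(t))
        .filter(|t| !t.starts_with("<|"))
        .cloned()
        .collect::<Vec<_>>()
        .join("")
}

fn main() {
    let out = filter_special(&["<|im_start|>", "Hello", "!", "<|im_end|>", "junk"]);
    assert_eq!(out, "Hello!"); // EOS and special tokens are stripped
    println!("{out}");
}
```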

Features:

  • Implement automatic model download on page load (no button click required)
  • Update initial welcome message to reflect automatic loading

Tests:

  • Add E2E test that sends 3 consecutive messages to verify the fix
  • Update existing tests for automatic model loading behavior

Technical Details

The fix modifies wasm/src/lib.rs to:

  1. Change _config to config to make it accessible for cache recreation
  2. Create a new Cache instance at the start of each generation
  3. Add comprehensive EOS token handling for ChatML format
  4. Filter out special tokens from the generated output
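
Taken together, the four changes would reshape `generate()` roughly as below. This is a pseudocode sketch reconstructed from the PR description, not the verbatim diff; the `Cache::new` arguments mirror candle's llama module but are not verified here.

```rust
// Pseudocode sketch of the modified generate() in wasm/src/lib.rs.
pub fn generate(&mut self, prompt: &str) -> Result<String, JsError> {
    // 1 + 2. Recreate the KV cache so every call starts from position 0
    //        (this needs `config`, hence the rename from `_config`).
    self.cache = Cache::new(true, DType::F32, &self.config, &self.device)?;

    // ... encode the prompt and run the autoregressive loop ...

    // 3. Stop on any ChatML end token (<|im_end|>, <|endoftext|>).
    //    if is_eos(next_token) { break; }

    // 4. Strip special tokens from the text returned to the UI.
    //    output.retain(|tok| !is_special(tok));
    // ...
}
```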

Test Plan

  • Verify WASM module compiles without errors
  • Verify Rust formatting passes (cargo fmt --check)
  • E2E test: First message generates successfully
  • E2E test: Second message generates successfully (the original bug)
  • E2E test: Third message generates successfully
  • Verify no "cannot broadcast" errors appear in the console
  • Verify model loads automatically without button click

Fixes #7


🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #7
@konard konard self-assigned this Jan 3, 2026
Root cause: The KV cache was persistent across multiple generate() calls,
but each new message starts from position 0 with a new prompt, causing
dimension mismatches when the cache contains values from the previous
generation.

Changes:
- Reset KV cache at the beginning of each generate() call
- Add proper handling for ChatML special tokens (<|im_end|>, <|im_start|>)
- Support multiple EOS token types for SmolLM2-Instruct model
- Implement automatic model download on page load (no button needed)
- Add E2E test for multiple consecutive messages (issue #7)
- Update initial welcome message to reflect automatic loading

Fixes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@konard konard changed the title from "[WIP] Second message sending fails" to "fix: reset KV cache between generations to fix second message bug" on Jan 3, 2026
@konard konard marked this pull request as ready for review January 3, 2026 00:19
@konard (Contributor, Author) commented Jan 3, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $7.550226 USD
  • Calculated by Anthropic: $5.989189 USD
  • Difference: $-1.561038 (-20.68%)

📎 Log file uploaded as GitHub Gist (1040KB)
🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard konard merged commit d69597f into main Jan 3, 2026
11 checks passed

Successfully merging this pull request may close: Second message sending fails (#7)

1 participant