
fix: reset KV cache between generations to fix second message bug #8

Merged

konard merged 3 commits into main from issue-7-8b4924ede20c on Jan 3, 2026

Conversation

@konard (Contributor) commented Jan 3, 2026

Summary

This PR fixes the critical issue where sending a second message to the SmolLM2 model would fail with:

Worker error: Forward pass failed: cannot broadcast [10, 10] to [1, 9, 10, 275]

Root Cause Analysis

The KV (Key-Value) cache in the LLaMA model was persistent across multiple generate() calls. When a new message was sent:

  1. First message: Cache was empty, generation worked fine
  2. Second message: Cache still contained KV values from the first message, but the new prompt started from position 0, causing dimension mismatches in the attention mechanism
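
The mismatch can be illustrated with a toy cache (a hypothetical sketch, not the real candle KV cache: the struct and its methods are invented for illustration). Appending new tokens is only valid when the start position equals the cached sequence length, which is exactly what breaks when a second prompt restarts at position 0:

```rust
// Toy model of the bug: appending tokens assumes the start position
// equals the number of positions already cached.
struct KvCache {
    len: usize, // number of cached key/value positions
}

impl KvCache {
    fn new() -> Self {
        Self { len: 0 }
    }

    // Append `n` tokens starting at `pos`; a position mismatch is the
    // analogue of "cannot broadcast [10, 10] to [1, 9, 10, 275]".
    fn append(&mut self, pos: usize, n: usize) -> Result<(), String> {
        if pos != self.len {
            return Err(format!(
                "cannot broadcast: mask covers {} positions, cache holds {}",
                pos + n,
                self.len + n
            ));
        }
        self.len += n;
        Ok(())
    }

    // The fix: clear the cache before each new generation.
    fn reset(&mut self) {
        self.len = 0;
    }
}

fn main() {
    let mut cache = KvCache::new();
    assert!(cache.append(0, 10).is_ok()); // first message: cache empty, works
    assert!(cache.append(0, 10).is_err()); // second message, no reset: fails
    cache.reset();
    assert!(cache.append(0, 10).is_ok()); // second message after reset: works
    println!("ok");
}
```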

Changes

Bug Fixes:

  • Reset KV cache at the beginning of each generate() call to ensure clean state for new conversations
  • Add proper handling for ChatML special tokens (<|im_end|>, <|im_start|>, <|endoftext|>)
  • Support multiple EOS token types for SmolLM2-Instruct model format
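
The stop-token and filtering logic can be sketched as follows (a minimal illustration operating on token strings; the real code works on tokenizer ids from SmolLM2's vocabulary, so the exact matching is an assumption):

```rust
// ChatML stop tokens: generation ends on either of these (illustrative;
// the actual check compares tokenizer ids, not strings).
fn is_eos(token: &str) -> bool {
    matches!(token, "<|im_end|>" | "<|endoftext|>")
}

// Stop at the first EOS token and drop remaining special tokens
// (anything shaped like <|...|>) from the visible output.
fn filter_special(tokens: &[&str]) -> String {
    tokens
        .iter()
        .take_while(|t| !is_eos(t))
        .filter(|t| !t.starts_with("<|"))
        .cloned()
        .collect::<Vec<_>>()
        .join("")
}

fn main() {
    let out = filter_special(&["<|im_start|>", "Hello", "!", "<|im_end|>", "junk"]);
    assert_eq!(out, "Hello!"); // EOS and special tokens are stripped
    println!("{out}");
}
```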

Features:

  • Implement automatic model download on page load (no button click required)
  • Update initial welcome message to reflect automatic loading

Tests:

  • Add E2E test that sends 3 consecutive messages to verify the fix
  • Update existing tests for automatic model loading behavior

Technical Details

The fix modifies wasm/src/lib.rs to:

  1. Change _config to config to make it accessible for cache recreation
  2. Create a new Cache instance at the start of each generation
  3. Add comprehensive EOS token handling for ChatML format
  4. Filter out special tokens from the generated output
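
Taken together, the four changes would reshape `generate()` roughly as below. This is a pseudocode sketch reconstructed from the PR description, not the verbatim diff; the `Cache::new` arguments mirror candle's llama module but are not verified here.

```rust
// Pseudocode sketch of the modified generate() in wasm/src/lib.rs.
pub fn generate(&mut self, prompt: &str) -> Result<String, JsError> {
    // 1 + 2. Recreate the KV cache so every call starts from position 0
    //        (this needs `config`, hence the rename from `_config`).
    self.cache = Cache::new(true, DType::F32, &self.config, &self.device)?;

    // ... encode the prompt and run the autoregressive loop ...

    // 3. Stop on any ChatML end token (<|im_end|>, <|endoftext|>).
    //    if is_eos(next_token) { break; }

    // 4. Strip special tokens from the text returned to the UI.
    //    output.retain(|tok| !is_special(tok));
    // ...
}
```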

Test Plan

  • Verify WASM module compiles without errors
  • Verify Rust formatting passes (cargo fmt --check)
  • E2E test: First message generates successfully
  • E2E test: Second message generates successfully (the original bug)
  • E2E test: Third message generates successfully
  • Verify no "cannot broadcast" errors appear in the console
  • Verify model loads automatically without button click

Fixes #7


🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #7
@konard konard self-assigned this Jan 3, 2026
Root cause: The KV cache was persistent across multiple generate() calls,
but each new message starts from position 0 with a new prompt, causing
dimension mismatches when the cache contains values from the previous
generation.

Changes:
- Reset KV cache at the beginning of each generate() call
- Add proper handling for ChatML special tokens (<|im_end|>, <|im_start|>)
- Support multiple EOS token types for SmolLM2-Instruct model
- Implement automatic model download on page load (no button needed)
- Add E2E test for multiple consecutive messages (issue #7)
- Update initial welcome message to reflect automatic loading

Fixes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@konard konard changed the title from "[WIP] Second message sending fails" to "fix: reset KV cache between generations to fix second message bug" on Jan 3, 2026
@konard konard marked this pull request as ready for review January 3, 2026 00:19
@konard (Contributor, Author) commented Jan 3, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $7.550226 USD
  • Calculated by Anthropic: $5.989189 USD
  • Difference: $-1.561038 (-20.68%)

📎 Log file uploaded as GitHub Gist (1040KB)
🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard konard merged commit d69597f into main Jan 3, 2026
11 checks passed

Successfully merging this pull request may close: Second message sending fails (#7)

1 participant