feat(kernel): add SentenceBuffer for streaming text chunking (#1205) #1210

Open
crrow wants to merge 1 commit into main from issue-1205-sentence-buffer


@crrow commented Apr 9, 2026

Summary

Add SentenceBuffer, a pure text segmentation utility that accumulates streaming TextDelta chunks and emits a complete sentence whenever it reaches sentence-ending punctuation (。！？.!?\n).

No async, no I/O, no TTS dependency. Designed to sit between an LLM streaming output and a TTS synthesizer for sentence-by-sentence voice reply (#1206).

  • Handles Chinese/English mixed text
  • Consecutive delimiters collapsed (?! → one sentence)
  • flush() drains unterminated text at turn end
  • 9 unit tests
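
The behaviour listed above can be illustrated with a minimal sketch. This is not the code from the PR: the field name `pending`, the helper `is_delimiter`, the `push` method name, and the return types are all assumptions made for illustration, and the delimiter set (including the full-width `！` and `？`) is read off the summary rather than the diff.

```rust
/// Hypothetical sketch of a sentence buffer; not the PR's actual code.
#[derive(Debug, Default)]
pub struct SentenceBuffer {
    /// Text received so far that has not been emitted yet.
    pending: String,
}

/// Assumed delimiter set, taken from the PR summary.
fn is_delimiter(c: char) -> bool {
    matches!(c, '。' | '！' | '？' | '.' | '!' | '?' | '\n')
}

impl SentenceBuffer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append one streamed delta and return every sentence it completes.
    pub fn push(&mut self, delta: &str) -> Vec<String> {
        let mut out = Vec::new();
        let mut after_delimiter = false;
        for c in delta.chars() {
            let delimiter = is_delimiter(c);
            // A non-delimiter right after a delimiter run starts a new sentence,
            // so a run such as "?!" stays attached to the sentence before it.
            if after_delimiter && !delimiter {
                self.emit(&mut out);
            }
            self.pending.push(c);
            after_delimiter = delimiter;
        }
        // Emit eagerly when the delta itself ends on a delimiter.
        if after_delimiter {
            self.emit(&mut out);
        }
        out
    }

    /// Drain whatever unterminated text is left, e.g. at the end of a turn.
    pub fn flush(&mut self) -> Option<String> {
        let rest = std::mem::take(&mut self.pending);
        let rest = rest.trim();
        if rest.is_empty() {
            None
        } else {
            Some(rest.to_owned())
        }
    }

    /// Move the pending text into `out`, skipping fragments with no real content.
    fn emit(&mut self, out: &mut Vec<String>) {
        let raw = std::mem::take(&mut self.pending);
        let sentence = raw.trim();
        if sentence.chars().any(|c| !is_delimiter(c) && !c.is_whitespace()) {
            out.push(sentence.to_owned());
        }
    }
}
```

With this sketch, pushing `"Hello wor"` returns nothing and a following `"ld. How"` returns `"Hello world."`, while `" How"` stays buffered until more text or `flush()` arrives. Two design choices here are assumptions rather than facts about the PR: emitted sentences are trimmed, and emission is eager, so a delimiter that arrives alone in a later delta is dropped instead of being attached to the previous sentence.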

Type of change

Type: New feature
Label: enhancement

Component

core

Closes

Closes #1205

Test plan

  • cargo test -p rara-kernel --lib sentence_buffer — 9/9 pass
  • Pre-commit hooks (check, fmt, clippy, doc) all pass

Pure text segmentation utility that accumulates TextDelta chunks and
emits complete sentences split on sentence-ending punctuation
(。！？.!?\n). No async, no I/O; designed to sit between an LLM
streaming output and a TTS synthesizer.

Handles Chinese/English mixed text, consecutive delimiters, incremental
deltas, and trailing unterminated text via flush().

9 unit tests covering the edge cases listed above.

Closes #1205
@crrow added the enhancement (New feature or request) and core (Core system changes) labels Apr 9, 2026