feat(channels): voice transcription quality — annotation + optional LLM correction (#1215) #1217

Open
crrow wants to merge 1 commit into main from issue-1215-voice-quality

crrow commented Apr 9, 2026

Summary

Adds two layers of post-processing for STT (voice → text) output in the Telegram and Web channel adapters.

  • Layer A (always on): prepends [Voice transcription — may contain errors, interpret by context] so the downstream LLM interprets voice input with appropriate error tolerance.
  • Layer B (opt-in via stt.correction.enabled: true): runs a fast LLM pass (e.g. glm-4-flash) to fix obvious speech-recognition mistakes before delivery. Correction failure is non-fatal — falls back to the raw transcription.
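The two layers can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `annotate`, `post_process`, and the `correct` closure (a stand-in for the fast LLM call) are hypothetical names, and both the newline separator after the hint and the order "correct first, then annotate" are assumptions.

```rust
/// Hint text quoted from the PR summary (Layer A).
const VOICE_HINT: &str = "[Voice transcription — may contain errors, interpret by context]";

/// Layer A: always prepend the hint.
/// Assumption: the hint and the transcript are separated by a newline.
fn annotate(transcript: &str) -> String {
    format!("{}\n{}", VOICE_HINT, transcript)
}

/// Layer B followed by Layer A.
/// `correct` is a hypothetical stand-in for the LLM correction pass.
fn post_process<F>(raw: &str, correction_enabled: bool, correct: F) -> String
where
    F: FnOnce(&str) -> Result<String, String>,
{
    let text = if correction_enabled {
        match correct(raw) {
            Ok(fixed) => fixed,
            // Correction failure is non-fatal: keep the raw transcription.
            Err(_) => raw.to_owned(),
        }
    } else {
        raw.to_owned()
    };
    annotate(&text)
}
```

With correction disabled, or when the closure returns `Err`, the result is the hint followed by the unmodified transcript.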

The driver registry is read from KernelHandle::driver_registry() at message-handling time, avoiding extra plumbing through the polling loops.
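To illustrate the design choice, here is a simplified sketch of reading the registry when a message is handled. Only the method name `driver_registry()` comes from this PR; the `DriverRegistry` layout, the `VoiceAdapter` type, the `correction_driver` helper, and the return types are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the kernel's driver registry
/// (maps a provider name to a driver description).
struct DriverRegistry {
    drivers: HashMap<String, String>,
}

/// Hypothetical stand-in for the kernel handle.
/// Only the method name `driver_registry` is taken from the PR.
struct KernelHandle {
    registry: Arc<DriverRegistry>,
}

impl KernelHandle {
    fn driver_registry(&self) -> Arc<DriverRegistry> {
        Arc::clone(&self.registry)
    }
}

/// The adapter stores only the handle, so the polling loop
/// does not need an extra registry parameter.
struct VoiceAdapter {
    kernel: KernelHandle,
}

impl VoiceAdapter {
    /// Resolves the correction driver at message-handling time.
    fn correction_driver(&self, provider: &str) -> Option<String> {
        self.kernel.driver_registry().drivers.get(provider).cloned()
    }
}
```

Because the lookup happens per message, a missing provider simply yields `None`, and the adapter can skip correction for that message.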

Configuration

stt:
  base_url: "http://localhost:8080"
  correction:
    enabled: true
    model: "glm-4-flash"
    provider: "glm"
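For orientation, the YAML above could map onto Rust types along these lines. `SttCorrectionConfig` is named in the commit message, but the field types, the `SttConfig` wrapper, the `Default` behaviour, and the `target` helper are assumptions; the actual definitions may differ (for example, they may derive a deserializer).

```rust
/// Mirrors the `stt.correction` block. Field types are assumed.
#[derive(Debug, Clone, Default, PartialEq)]
struct SttCorrectionConfig {
    /// Opt-in: the derived default is `false`.
    enabled: bool,
    model: Option<String>,
    provider: Option<String>,
}

/// Mirrors the top-level `stt` block. Hypothetical wrapper type.
#[derive(Debug, Clone, PartialEq)]
struct SttConfig {
    base_url: String,
    correction: SttCorrectionConfig,
}

impl SttCorrectionConfig {
    /// Returns `(provider, model)` only when correction is enabled
    /// and both values are set. Hypothetical helper.
    fn target(&self) -> Option<(&str, &str)> {
        if !self.enabled {
            return None;
        }
        Some((self.provider.as_deref()?, self.model.as_deref()?))
    }
}
```

Omitting the `correction` block would then correspond to `SttCorrectionConfig::default()`, which leaves Layer B off.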

Type of change

Type: New feature
Label: enhancement

Component

backend

Closes

Closes #1215

Test plan

  • cargo check --all --all-targets passes
  • cargo test -p rara-channels -p rara-stt passes (93 + 6 + 1 + 2 tests)
  • cargo +nightly fmt --all -- --check passes
  • cargo clippy --workspace --all-targets --all-features --no-deps -- -D warnings passes
  • RUSTDOCFLAGS="-D warnings" cargo +nightly doc --workspace --no-deps --document-private-items passes

…LM correction (#1215)

Add two layers of post-processing for STT output in voice channels:

- Layer A (always on): prepend a hint so the downstream LLM treats voice
  input as speech-recognised text that may contain errors.
- Layer B (opt-in via stt.correction.enabled): run a fast LLM pass to fix
  obvious transcription mistakes before delivery. Failure is non-fatal —
  the adapter falls back to the raw transcription.

Wires SttCorrectionConfig + the kernel driver registry into the Telegram
and Web channel adapters. The driver registry is read from KernelHandle
at message-handling time to avoid extra plumbing through polling loops.

Closes #1215
crrow added the enhancement (New feature or request) and backend (Backend/API changes) labels Apr 9, 2026