Hi team,
I evaluated the ASR model across multiple datasets and observed that spoken punctuation (e.g., "next line", "next paragraph", "comma") is often generated inconsistently or incompletely in the output.
Observed Behavior
The model frequently produces malformed or partial punctuation tokens, such as:
{next} line}
{next line
next line}
xt line}
{next} para}graph}
These outputs indicate that the punctuation tokens are not being generated or closed properly, leading to broken or unusable text.
Expected Behavior
Spoken punctuation tokens should be generated consistently and completely, for example:
{next line}
{next paragraph}
What I Tried
1. Evaluated across multiple datasets → issue persists
2. Tested with different tokenizers and transformers versions → no improvement
3. Current versions: tokenizers=0.22.2; transformers=5.3.0
Additional Context / Hypothesis
This may require:
- Investigation into how the model handles spoken punctuation tokens internally, and/or
- A post-processing layer to normalize/fix malformed tokens.
Access to the language model text or ARPA file would also help debug and potentially mitigate this issue (I have raised a separate request for this).
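To illustrate the post-processing idea, here is a minimal sketch of a normalizer. It assumes a small, known vocabulary of spoken-punctuation phrases and that braces are only ever used for these tokens; it drops stray braces and re-wraps each canonical phrase. Truncated fragments like "xt line}" would need fuzzy matching and are not handled here. Function and variable names are hypothetical, not part of any existing pipeline.

```python
import re

# Assumed spoken-punctuation vocabulary; extend as needed.
CANONICAL = ["next line", "next paragraph", "comma"]

def normalize_spoken_punct(text: str) -> str:
    """Best-effort repair of malformed spoken-punctuation tokens.

    Strips all braces (assumed to appear only around these tokens),
    then re-wraps each canonical phrase in a single {...} pair.
    """
    flat = text.replace("{", "").replace("}", "")
    for phrase in CANONICAL:
        # Word boundaries avoid matching inside longer words.
        flat = re.sub(r"\b" + re.escape(phrase) + r"\b",
                      "{" + phrase + "}", flat)
    return flat

# Examples of malformed outputs being repaired:
# "{next} line}"        -> "{next line}"
# "{next} para}graph}"  -> "{next paragraph}"
```

This obviously treats the symptom rather than the cause, but it could serve as a stopgap while the token-generation issue is investigated.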
Request
- Could you please confirm if this is a known issue?
- Is it possible to share the LM text / ARPA to help debug and improve handling of these cases?
Thanks!