@nikvaessen
Collaborator

@nikvaessen nikvaessen commented Feb 2, 2025

The WER formula is undefined for an empty reference and hypothesis pair:
it divides by the number of words in the reference, which is zero.
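
To make the division by zero concrete, here is a minimal sketch of a textbook WER computation. `naive_wer` is a hypothetical helper written for this illustration and is not jiwer's implementation; it only assumes the usual definition of WER as word-level edit distance divided by the number of reference words.

```python
def naive_wer(reference: str, hypothesis: str) -> float:
    """Hypothetical textbook WER: word edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()

    # Dynamic-programming Levenshtein distance over words, one row at a time.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    # The denominator is the reference length, which is 0 for an empty reference.
    return previous[-1] / len(ref_words)
```

With this sketch, `naive_wer("a b c", "a x c")` evaluates one substitution over three reference words, while `naive_wer("", "")` raises `ZeroDivisionError`.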

As of version 4.0, jiwer defines the behaviour as follows:

import jiwer

# when ref and hyp are both empty, there is no error as
# an ASR system correctly predicted silence/non-speech.
assert jiwer.wer('', '') == 0
assert jiwer.mer('', '') == 0
assert jiwer.wip('', '') == 1
assert jiwer.wil('', '') == 0

assert jiwer.cer('', '') == 0

When the reference is empty but the hypothesis is non-empty, every word or character counts as an insertion, and the error rate equals the number of insertions:

import jiwer

assert jiwer.wer('', 'silence') == 1
assert jiwer.wer('', 'peaceful silence') == 2
assert jiwer.process_words('', 'now defined behaviour').insertions == 3

assert jiwer.cer('', 'a') == 1
assert jiwer.cer('', 'abcde') == 5
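
One way to read the values above is as a special case in the error-rate formula. The following is a minimal sketch, assuming the rate is computed from alignment counts; `error_rate` and its parameters are hypothetical names for this illustration, and jiwer's actual code may be structured differently.

```python
def error_rate(hits: int, substitutions: int, deletions: int, insertions: int) -> float:
    """Hypothetical error rate that stays defined for an empty reference."""
    # Every reference unit is either a hit, a substitution, or a deletion.
    reference_length = hits + substitutions + deletions
    errors = substitutions + deletions + insertions

    if reference_length == 0:
        # Empty reference: only insertions are possible, so report their count.
        # This is 0 when the hypothesis is empty as well.
        return float(errors)

    return errors / reference_length
```

For example, `error_rate(0, 0, 0, 0)` gives 0 and `error_rate(0, 0, 0, 2)` gives 2, matching the `wer('', '')` and `wer('', 'peaceful silence')` assertions.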

This resolves #98.

@nikvaessen nikvaessen merged commit 9aada1e into jitsi:master Feb 2, 2025
7 checks passed
@nikvaessen nikvaessen deleted the empty_ref branch February 2, 2025 15:54
@KarelVesely84

Hello Nik,
what would you think about adding a "NoTransform" placeholder class?

It would be used when a user wishes to do all the transforms themselves. An example is here:
KarelVesely84@1c5f29d#diff-aae25c8b196777e4abeb01845ca517591f65580977ee7d1f639b5bbf16e26d9d
(it would also need to be registered in the __all__ variable in the header of transforms.py)

Or is there already something similar in the code?

Thank you & best regards
Karel
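
For illustration only, a placeholder of this kind might look like the sketch below. It is a standalone assumption, not the code from the linked commit: it does not subclass any jiwer base class, and whether jiwer would accept it as a transform is untested here.

```python
from typing import List, Union


class NoTransform:
    """Hypothetical no-op transform: returns its input unchanged."""

    def __call__(self, sentences: Union[str, List[str]]) -> Union[str, List[str]]:
        # The caller has already applied all desired transforms,
        # so nothing is modified here.
        return sentences
```

Calling `NoTransform()("already normalised text")` returns the same string that was passed in.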

@nikvaessen
Collaborator Author

See #116


Successfully merging this pull request may close these issues.

utterance ref cannot be empty ?