Description
Problem related to the evaluation threshold for wake-word detection. The code in howl/howl/model/inference.py contains:

    if max_prob < self.threshold:
        max_label = self.negative_label

With the threshold set to 0.0, only samples whose prediction probability (the probability of being predicted as a positive sample of the wake word?) is < 0.0 should be classified as negative. Since a probability is always >= 0, theoretically every sample should be classified as positive, i.e., fn = tn = 0. However, in the hey_fire_fox experiment with the threshold at 0.0, tn=2428 on the dev negative set and fn=2 on the dev positive set. The model still distinguishes negative samples. What causes this? Could it be related to OOV (out-of-vocabulary) classification, or to rounding errors?
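To make the question concrete, here is a minimal sketch of the thresholding step under one assumption: that `max_prob` is the maximum over all class probabilities, including the negative class, rather than the probability of the wake word itself. The function `apply_threshold` and the label layout are hypothetical and are not taken from howl. Under that assumption, a threshold of 0.0 never overrides the label, but a frame whose most likely class is already the negative one still comes out negative.

```python
from typing import Sequence


def apply_threshold(probs: Sequence[float], threshold: float, negative_label: int) -> int:
    """Hypothetical stand-in for the thresholding step; not howl's actual code."""
    # Pick the most likely class (argmax over all classes, negative included).
    max_label = max(range(len(probs)), key=lambda i: probs[i])
    max_prob = probs[max_label]
    # Same condition as in the snippet quoted from inference.py.
    if max_prob < threshold:
        max_label = negative_label
    return max_label


NEGATIVE = 3  # assumed index of the negative class in this toy example
positive_like = [0.7, 0.1, 0.1, 0.1]    # argmax is a wake-word class
negative_like = [0.05, 0.05, 0.1, 0.8]  # argmax is the negative class

print(apply_threshold(positive_like, 0.0, NEGATIVE))
print(apply_threshold(negative_like, 0.0, NEGATIVE))
```

If this assumption matches what the code does, the threshold only demotes low-confidence predictions to negative and can never promote a negative prediction to positive, which would be consistent with tn > 0 at threshold 0.0. Whether that is the actual cause here still needs to be confirmed against the real inference code.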
line | eval_dataset | threshold | tp | tn | fp | fn
-- | -- | -- | -- | -- | -- | --
1 | Dev positive | 0.0 | 74 | 0 | 0 | 2
2 | Dev negative | 0.0 | 0 | 2428 | 103 | 0
3 | Dev noisy positive | 0.0 | 69 | 0 | 0 | 7
4 | Dev noisy negative | 0.0 | 0 | 2468 | 63 | 0
5 | Test positive | 0.0 | 47 | 0 | 0 | 7
6 | Test negative | 0.0 | 0 | 2399 | 105 | 0
7 | Test noisy positive | 0.0 | 45 | 0 | 0 | 9
8 | Test noisy negative | 0.0 | 0 | 2442 | 62 | 0
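For reference, the counts above can be turned into per-set rates with a short script. This is only a sketch over the numbers in the table; the helper names `recall` and `false_positive_rate` are illustrative and not part of howl.

```python
from typing import Optional

# (eval_dataset, tp, tn, fp, fn) copied from the table above, threshold = 0.0
rows = [
    ("Dev positive", 74, 0, 0, 2),
    ("Dev negative", 0, 2428, 103, 0),
    ("Dev noisy positive", 69, 0, 0, 7),
    ("Dev noisy negative", 0, 2468, 63, 0),
    ("Test positive", 47, 0, 0, 7),
    ("Test negative", 0, 2399, 105, 0),
    ("Test noisy positive", 45, 0, 0, 9),
    ("Test noisy negative", 0, 2442, 62, 0),
]


def recall(tp: int, fn: int) -> Optional[float]:
    """Fraction of positive samples detected; None if the set has no positives."""
    total = tp + fn
    return tp / total if total else None


def false_positive_rate(fp: int, tn: int) -> Optional[float]:
    """Fraction of negative samples flagged as positive; None if no negatives."""
    total = fp + tn
    return fp / total if total else None


for name, tp, tn, fp, fn in rows:
    r = recall(tp, fn)
    fpr = false_positive_rate(fp, tn)
    r_text = "n/a" if r is None else f"{r:.4f}"
    fpr_text = "n/a" if fpr is None else f"{fpr:.4f}"
    print(f"{name:<22} recall={r_text}  fpr={fpr_text}")
```

If the threshold were the only thing separating positives from negatives, a threshold of 0.0 would give a recall of 1.0 on the positive sets and a false positive rate of 1.0 on the negative sets, which is not what these counts show.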