evaluation threshold of wake-up word detection #129

@Ruiqin-Huang

A question about the evaluation threshold for wake-up word detection. The code in howl/howl/model/inference.py applies the threshold like this:

if max_prob < self.threshold:
    max_label = self.negative_label

With the threshold set to 0.0, only samples whose prediction probability (the probability of being predicted as a positive sample of the wake-up word?) is below 0.0 should be classified as negative. Since a probability is always >= 0, theoretically every sample should be classified as positive, i.e., tn=fn=0. However, in the hey_fire_fox experiment with the threshold at 0.0, the dev sets give tn=2428 and fn=2, so the model still separates out negative samples. What causes this? Could it be related to OOV (Out-of-Vocabulary) classification, or to rounding errors?
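
A minimal sketch of how this check might interact with the class prediction, assuming `max_label` is the argmax over per-class probabilities and `max_prob` is the probability of that label. The surrounding code is not quoted here, so this is an assumption, and `predict`, `NEGATIVE_LABEL`, and the example probabilities are hypothetical. Under that assumption a threshold of 0.0 never triggers the override, but a sample whose argmax is already the negative label is still reported as negative, which would be consistent with tn > 0:

```python
# Hypothetical index of the negative (OOV) class; not taken from howl itself.
NEGATIVE_LABEL = 2


def predict(probs, threshold, negative_label=NEGATIVE_LABEL):
    """Return the predicted label for one list of per-class probabilities."""
    # Assumption: the label is chosen by argmax before the threshold check.
    max_label = max(range(len(probs)), key=lambda i: probs[i])
    max_prob = probs[max_label]
    # Same check as quoted in the issue: it only overrides low-confidence winners.
    if max_prob < threshold:
        max_label = negative_label
    return max_label


# Made-up probabilities for illustration only.
negative_like = [0.05, 0.10, 0.85]  # the negative class already wins the argmax
positive_like = [0.70, 0.20, 0.10]  # a wake-word class wins the argmax

print(predict(negative_like, threshold=0.0))
print(predict(positive_like, threshold=0.0))
print(predict(positive_like, threshold=0.9))
```

If this reading is right, the threshold only decides when a non-negative winner is demoted to negative; it does not force every sample to be positive when set to 0.0.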

line | eval_dataset | threshold | tp | tn | fp | fn
-- | -- | -- | -- | -- | -- | --
1 | Dev positive | 0.0 | 74 | 0 | 0 | 2
2 | Dev negative | 0.0 | 0 | 2428 | 103 | 0
3 | Dev noisy positive | 0.0 | 69 | 0 | 0 | 7
4 | Dev noisy negative | 0.0 | 0 | 2468 | 63 | 0
5 | Test positive | 0.0 | 47 | 0 | 0 | 7
6 | Test negative | 0.0 | 0 | 2399 | 105 | 0
7 | Test noisy positive | 0.0 | 45 | 0 | 0 | 9
8 | Test noisy negative | 0.0 | 0 | 2442 | 62 | 0
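
To make the counts above easier to compare, here is a small sketch that derives recall (for the positive sets) and false-positive rate (for the negative sets) from the table. The counts are copied from the table; the helper names `recall` and `false_positive_rate` are my own and not part of howl:

```python
# (tp, tn, fp, fn) per evaluation set at threshold 0.0, copied from the table.
results = {
    "Dev positive": (74, 0, 0, 2),
    "Dev negative": (0, 2428, 103, 0),
    "Dev noisy positive": (69, 0, 0, 7),
    "Dev noisy negative": (0, 2468, 63, 0),
    "Test positive": (47, 0, 0, 7),
    "Test negative": (0, 2399, 105, 0),
    "Test noisy positive": (45, 0, 0, 9),
    "Test noisy negative": (0, 2442, 62, 0),
}


def recall(tp, fn):
    """Fraction of positive samples detected; None if the set has no positives."""
    total = tp + fn
    return tp / total if total else None


def false_positive_rate(fp, tn):
    """Fraction of negative samples flagged; None if the set has no negatives."""
    total = fp + tn
    return fp / total if total else None


for name, (tp, tn, fp, fn) in results.items():
    r = recall(tp, fn)
    fpr = false_positive_rate(fp, tn)
    if r is not None:
        print(f"{name}: recall = {r:.4f}")
    if fpr is not None:
        print(f"{name}: false-positive rate = {fpr:.4f}")
```

If threshold 0.0 really classified everything as positive, every recall would be 1.0 and every false-positive rate would be 1.0, which is not what the counts show.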
