Skip to content

Ground truth materials #49

@MikeSmithEU

Description

@MikeSmithEU

2 freely available possible datasets have already been identified, more are welcome:

  1. Mozilla Common Voice https://voice.mozilla.org/en
    CC-0 license
  2. Openslr resources http://openslr.org/resources.php
    Each resource has own license ranging from "unrestricted" to "CC-BY-NC-ND 3.0"
    Remark: Some of the Openslr data is likely to have been used for training various STT systems, as such it may not always be the most fair indicator

Open questions:

  • Which ground truth materials might we use for evaluating vendors' solutions? Will we build our own dataset? Or both?
  • How will we include these resources into our product?

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementBacklog idea or requirement for future discussions/development.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions