Access to LM ARPA Files and Training Text Data

Could we also get access to the LM ARPA files and the underlying text data?
Currently, the shared language model is only available in binary format, which limits visibility into its structure.

Having access to the ARPA files and text data would help us with:
- Understanding the tokenizer structure
- Analyzing spoken punctuation handling
- Extending or adapting the LM
- Performing LM interpolation with other datasets
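
To illustrate the interpolation use case, here is a minimal sketch of what plain-text ARPA access would make possible. It assumes the standard ARPA text layout (a `\1-grams:` section with `log10prob word [backoff]` per line) and only mixes unigram probabilities linearly. The helper names and sample entries are hypothetical, and backoff weights and higher-order n-grams are ignored, so a real toolkit would be needed for an actual merge.

```python
import math


def read_arpa_unigrams(lines):
    """Return {word: log10 probability} from the \\1-grams: section of ARPA text."""
    unigrams = {}
    in_section = False
    for raw in lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_section = True
            continue
        if not in_section or not line:
            continue
        if line.startswith("\\"):
            # Next section marker (e.g. \2-grams: or \end\) ends the unigrams.
            break
        fields = line.split()
        # fields[0] is the log10 probability, fields[1] the word;
        # an optional backoff weight in fields[2] is ignored here.
        unigrams[fields[1]] = float(fields[0])
    return unigrams


def interpolate_unigrams(lm_a, lm_b, weight_a=0.5):
    """Linearly interpolate two unigram tables; words missing from one get p = 0."""
    if not 0.0 < weight_a < 1.0:
        raise ValueError("weight_a must be strictly between 0 and 1")
    mixed = {}
    for word in set(lm_a) | set(lm_b):
        p_a = 10 ** lm_a[word] if word in lm_a else 0.0
        p_b = 10 ** lm_b[word] if word in lm_b else 0.0
        mixed[word] = math.log10(weight_a * p_a + (1.0 - weight_a) * p_b)
    return mixed


# Hypothetical sample data, not taken from the shared model.
arpa_a = ["\\data\\", "ngram 1=2", "", "\\1-grams:",
          "-0.30103\thello\t-0.1", "-0.30103\tworld", "", "\\end\\"]
arpa_b = ["\\data\\", "ngram 1=2", "", "\\1-grams:",
          "-1.0\thello", "-0.045757\tcomma", "", "\\end\\"]

mixed = interpolate_unigrams(read_arpa_unigrams(arpa_a),
                             read_arpa_unigrams(arpa_b),
                             weight_a=0.5)
```

None of this is possible with the binary file alone, since the n-gram entries and their probabilities are not readable there.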

Thanks
