@hyunjoolee

I found that punctuation and dash characters attached to a word are not stripped before the dictionary lookup in get_arpabet() in text/__init__.py.

For example, tokens such as "recommendations.", "fbi," and "policy-making" cannot be found in the cmu_dict.
I think these failed lookups will reduce model performance.

So I am suggesting the attached code changes.
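
To illustrate the idea (this is a minimal sketch, not the attached change itself): strip leading and trailing punctuation, split on dashes, look up each piece, then put the punctuation back. The helper name `lookup_with_punctuation`, the toy dictionary `TOY_CMUDICT`, its entries, and the `{...}` wrapping of the phonemes are all assumptions made for this example; the real get_arpabet() signature and dictionary object may differ.

```python
import re

# Hypothetical stand-in for the CMU dictionary: uppercase word -> ARPAbet string.
# The entries are illustrative only.
TOY_CMUDICT = {
    "RECOMMENDATIONS": "R EH2 K AH0 M AH0 N D EY1 SH AH0 N Z",
    "POLICY": "P AA1 L AH0 S IY0",
    "MAKING": "M EY1 K IH0 NG",
}

# Splits a token into leading punctuation, core text, trailing punctuation.
_EDGE_PUNCT = re.compile(r"(\W*)(.*?)(\W*)", re.DOTALL)


def lookup_with_punctuation(word, lookup):
    """Look up a token while ignoring edge punctuation and splitting on dashes.

    Pieces found in `lookup` are replaced by "{phonemes}" (assumed format);
    pieces that are not found are left unchanged.
    """
    lead, core, trail = _EDGE_PUNCT.fullmatch(word).groups()
    converted = []
    for part in core.split("-"):
        phones = lookup.get(part.upper())
        converted.append("{" + phones + "}" if phones else part)
    return lead + "-".join(converted) + trail


if __name__ == "__main__":
    for token in ["recommendations.", "fbi,", "policy-making"]:
        print(token, "->", lookup_with_punctuation(token, TOY_CMUDICT))
```

With this toy dictionary, "fbi," is returned unchanged because "FBI" has no entry, while the other two tokens are converted with their punctuation and dash preserved.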
