word splitting regex fails with underdot characters

Hi,

I'm using hypher.js with transliterated Sanskrit, and it doesn't handle characters such as ṇ, ṣ, ḍ, ṭ, etc. correctly. The problem seems to be the long regex used to split a string into words (line 107 of [hypher.js](https://github.com/bramstein/hypher/blob/master/lib/hypher.js)). I guess your character class doesn't include the Unicode range for underdot characters (these letters sit in Latin Extended Additional, U+1E00 to U+1EFF). I've replaced it with a simpler expression:
`var words = str.split(/([\s\n\r\t.,:;'"!?-])/g);`
which matches word boundary characters instead of word characters. It works for me, but it isn't comprehensive: more boundary characters would have to be added to make it work for other languages.
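To illustrate, here is a minimal sketch of the workaround next to one possible way of avoiding a hand-maintained boundary list. The function names `splitOnBoundaries` and `splitOnNonLetters` are hypothetical and not part of hypher.js, and the second function is my own assumption, not something the library does: it relies on Unicode property escapes (`\p{L}`, `\p{M}`), which need an ES2018-capable engine.

```javascript
// Workaround from this issue: split on an explicit list of boundary characters.
// The capturing group makes split() keep the separators in the result.
function splitOnBoundaries(str) {
  return str.split(/([\s\n\r\t.,:;'"!?-])/g);
}

// Hypothetical alternative (an assumption, not hypher.js code): treat any run of
// characters that are neither letters (\p{L}) nor combining marks (\p{M}) as a
// boundary. Requires the `u` flag and ES2018 Unicode property escapes.
function splitOnNonLetters(str) {
  return str.split(/([^\p{L}\p{M}]+)/u);
}

const sample = "kṛṣṇa, viṣṇu";

console.log(splitOnBoundaries(sample));
// → [ 'kṛṣṇa', ',', '', ' ', 'viṣṇu' ]

console.log(splitOnNonLetters(sample));
// → [ 'kṛṣṇa', ', ', 'viṣṇu' ]
```

Note that the first version yields an empty string between two adjacent boundary characters (the comma and the space), whereas the second groups them into one separator. Including `\p{M}` keeps decomposed forms (a base letter followed by U+0323 COMBINING DOT BELOW) inside the word.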


word splitting regex fails with underdot characters #40
