Hi!
I came across your project and its goals, and I'm curious why language support is so strictly limited (en/ru only). I see that you use spaCy to tokenize text, and that choice seems a bit off to me. I agree that spaCy tokenization can be beneficial for some use cases, but it seems out of place in a project that aims to simplify NER and language-understanding pipelines, since it heavily limits who can use your solution. I am not sure whether spaCy is a crucial step in your pipeline, but I want to point out that it supports only 24 languages out of the box.
Are there any plans to expand the set of supported languages or to remove this limitation entirely?