phonetized_ner_srv

Tiny Flask app for phonetization, NE tagging and text distance calculation.

Prerequisites

Python 3 and PyPI packages flask, mordl, textdistance, toxine, transliterate.

Starting the Server

First, place the storages of the trained MorDL UposTagger, FeatsTagger and NeTagger in the srv/models directory. Change the emb_path parameter in the ds_config.json file of every storage so that the path is correct. Note that the root for relative paths there is ner_srv. Thus, if your embeddings are also placed in the srv/models directory, just add 'model/' to the beginning of each emb_path value.
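If there are several storages, the emb_path change can be scripted. The following is only a sketch under stated assumptions: each storage is a subdirectory of the models directory holding its own ds_config.json, and the internal layout of that file is unknown, so the helper looks for any emb_path key at any depth. The function names are hypothetical, and the 'model/' prefix is the one mentioned above.

```python
import json
from pathlib import Path


def prefix_emb_paths(node, prefix):
    """Prepend `prefix` to every relative 'emb_path' string found in parsed JSON.

    The layout of ds_config.json is an assumption, so the search is recursive.
    Absolute paths and values that already carry the prefix are left alone.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "emb_path" and isinstance(value, str):
                already_done = value.startswith(prefix)
                if not already_done and not Path(value).is_absolute():
                    node[key] = prefix + value
            else:
                prefix_emb_paths(value, prefix)
    elif isinstance(node, list):
        for item in node:
            prefix_emb_paths(item, prefix)
    return node


def patch_storages(models_dir, prefix="model/"):
    """Rewrite ds_config.json in every storage directory under `models_dir`.

    Assumes one subdirectory per storage (hypothetical layout).
    Returns the list of files that were rewritten.
    """
    patched = []
    for config in sorted(Path(models_dir).glob("*/ds_config.json")):
        data = json.loads(config.read_text(encoding="utf-8"))
        prefix_emb_paths(data, prefix)
        config.write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        patched.append(config)
    return patched
```

Running `patch_storages("models")` from the srv directory would rewrite the files in place, so it is worth keeping a copy of the original configs first.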

Second, go back to the srv directory and, if necessary, change the port in the main.py script.

After that, ensure that you're still in the srv directory and run

sh ./run.sh prod

Or, if you need debug mode, just run

sh ./run.sh

Usage

All services return data in JSON format.

http://<address>:<port>/api/tokenize/<text>

Returns Parsed CoNLL-U for the tokenized text (untagged).

http://<address>:<port>/api/tag/<text>

Returns Parsed CoNLL-U with the text tokenized and the UPOS, FEATS and MISC:NE fields filled.
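Because the text is part of the URL path, it has to be percent-encoded before the request is sent. Below is a minimal client sketch for the two services above, using only the Python standard library. The base address `http://localhost:5000` is a placeholder (the real port is whatever is set in main.py), and the helper names `service_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder address: substitute the host and port your server listens on.
BASE_URL = "http://localhost:5000"


def service_url(base_url, service, text):
    """Build the URL for api/tokenize or api/tag with the text percent-encoded."""
    return f"{base_url.rstrip('/')}/api/{service}/{quote(text, safe='')}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON body of the response."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `fetch_json(service_url(BASE_URL, "tag", text))` should return the tagged result for `text`. The exact JSON layout of the Parsed CoNLL-U data is not described here, so inspect the response before relying on particular keys.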

http://<address>:<port>/api/phonetize/<text>?level=3&syllables=false

Returns a phonetized version of the text. Only texts in Russian are processed correctly.

level: the level of simplification. Allowed values:

0 makes no changes except removing excess spaces;
1 removes all spaces;
2 the most standard version of phonetization;
3 refined phonetization;
4 rough phonetization;
5 even rougher phonetization.

Default level is 3.

syllables: if true, returns an array of syllables instead of the phonetized text. Default is false.
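A request URL for this service can be assembled as in the sketch below. The function name `phonetize_url` and the base address used in the usage note are assumptions. The parameter names, defaults and allowed level range are the ones listed above.

```python
from urllib.parse import quote, urlencode


def phonetize_url(base_url, text, level=3, syllables=False):
    """Build an api/phonetize URL with the documented query parameters."""
    if level not in range(6):
        raise ValueError("level must be an integer from 0 to 5")
    query = urlencode({
        "level": level,
        # The service expects the literal strings "true" / "false".
        "syllables": "true" if syllables else "false",
    })
    path = quote(text, safe="")  # the text is part of the path, so encode it
    return f"{base_url.rstrip('/')}/api/phonetize/{path}?{query}"
```

For example, `phonetize_url("http://localhost:5000", text, level=2, syllables=True)` yields a URL that asks for the standard phonetization split into syllables.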

http://<address>:<port>/api/text-distance/<text1>/<text2>?ner1=&ner2=&level=3&algorithm=damerau_levenshtein&normalize=true&qval=1

Returns the text distance between text1 and text2. Only texts in Russian are processed correctly.

ner1: if specified, text1 is first tokenized and tagged, and then replaced by the FORM fields of the tokens whose MISC:NE field equals ner1.

ner2: if specified, text2 is first tokenized and tagged, and then replaced by the FORM fields of the tokens whose MISC:NE field equals ner2.

level: before the distance is calculated, both text1 and text2 are phonetized with this level (see the api/phonetize service).

algorithm: the method used to calculate the distance. Allowed values are: hamming, levenshtein, damerau_levenshtein (default), jaro, jaro_winkler, gotoh, smith_waterman.

normalize: use normalized distance (default is true).

qval: use 1 (default). In textdistance, a qval of 1 means that sequences are compared character by character.
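The parameters above can be combined into a request URL as in the following sketch. The function name `text_distance_url` is hypothetical. The list of algorithms and the defaults are taken from the description above, and ner1 and ner2 are sent only when given, since valid NE labels depend on the NeTagger model in use.

```python
from urllib.parse import quote, urlencode

# Algorithm names listed in the parameter description above.
ALGORITHMS = {
    "hamming", "levenshtein", "damerau_levenshtein",
    "jaro", "jaro_winkler", "gotoh", "smith_waterman",
}


def text_distance_url(base_url, text1, text2, ner1=None, ner2=None, level=3,
                      algorithm="damerau_levenshtein", normalize=True, qval=1):
    """Build an api/text-distance URL; ner1 and ner2 are added only if given."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    params = {
        "level": level,
        "algorithm": algorithm,
        "normalize": "true" if normalize else "false",
        "qval": qval,
    }
    if ner1:
        params["ner1"] = ner1
    if ner2:
        params["ner2"] = ner2
    # Both texts are path segments, so each one is percent-encoded separately.
    path = f"{quote(text1, safe='')}/{quote(text2, safe='')}"
    return f"{base_url.rstrip('/')}/api/text-distance/{path}?{urlencode(params)}"
```

Validating the algorithm name on the client side is a design choice of this sketch: it catches typos before a request is made, at the cost of having to update the set if the server's list changes.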

License

phonetized_ner_srv is released under the Apache License. See the LICENSE file for more details.
