forked from erickrf/nlpnet
POS Tagging
attardi edited this page Dec 28, 2014
Training a POS tagger requires the following data:
- an annotated corpus in TSV format, with one token per line and two fields per token: the word form and its POS tag. Sentences are separated by an empty line.
- word embeddings created using nlpnet
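
To make the corpus layout concrete, here is a minimal sketch of a reader for a file in that shape. It is not nlpnet's own loader: `read_tsv_corpus` is a hypothetical helper, the tab separator is assumed from "TSV", and the tag names in the sample are placeholders rather than a tagset nlpnet requires.

```python
import io


def read_tsv_corpus(lines):
    """Group (form, tag) pairs into sentences; an empty line ends a sentence."""
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: exactly two tab-separated fields, form then POS tag.
        form, tag = line.split("\t")
        current.append((form, tag))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Illustrative two-sentence corpus (placeholder tags).
SAMPLE = "The\tDET\ndog\tNOUN\nbarks\tVERB\n\nIt\tPRON\nruns\tVERB\n"
sentences = read_tsv_corpus(io.StringIO(SAMPLE))
print(len(sentences), "sentences;", sum(len(s) for s in sentences), "tokens")
```

A check like this can be handy before training, since a line with a missing tag raises an error here instead of surfacing later.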
To train the tagger, use the following command:

    nlpnet-train.py pos [-h] [-w WINDOW] [-f NUM_FEATURES]
                        [--load_features] [--load_network] [-e ITERATIONS]
                        [-l LEARNING_RATE] [--lf LEARNING_RATE_FEATURES]
                        [--lt LEARNING_RATE_TRANSITIONS] [-a ACCURACY]
                        [-n HIDDEN] [-v] --gold GOLD --data DATA
                        [--variant VARIANT] [--caps [CAPS]]
                        [--suffix [SUFFIX]] [--suffix_size SUFFIX_SIZE]
                        [--prefix [PREFIX]] [--prefix_size PREFIX_SIZE]

    optional arguments:
      -h, --help            show this help message and exit
      -w WINDOW, --window WINDOW
                            Size of the word window (default 5)
      -f NUM_FEATURES, --num_features NUM_FEATURES
                            Number of features per word (default 50)
      --load_features       Load previously saved word type features (overrides -f
                            and must also load a dictionary file)
      --load_network        Load previously saved network
      -e ITERATIONS, --epochs ITERATIONS
                            Number of training epochs (default 100)
      -l LEARNING_RATE, --learning_rate LEARNING_RATE
                            Learning rate for network weights (default 0.001)
      --lf LEARNING_RATE_FEATURES
                            Learning rate for features (default 0.01)
      --lt LEARNING_RATE_TRANSITIONS
                            Learning rate for transitions (default 0.01)
      -a ACCURACY, --accuracy ACCURACY
                            Desired accuracy per tag.
      -n HIDDEN, --hidden HIDDEN
                            Number of hidden neurons (default 200)
      -v, --verbose         Verbose mode
      --gold GOLD           File with annotated data for training.
      --data DATA           Directory to save new models and load partially
                            trained ones
      --variant VARIANT     If "polyglot" use Polyglot case conventions; if
                            "senna" use SENNA conventions.
      --caps [CAPS]         Include capitalization features. Optionally, supply
                            the number of features (default 5)
      --suffix [SUFFIX]     Include suffix features. Optionally, supply the number
                            of features (default 5)
      --suffix_size SUFFIX_SIZE
                            Use suffixes up to this size (in characters, default
                            5). Only used if --suffix is supplied
      --prefix [PREFIX]     Include prefix features. Optionally, supply the number
                            of features (default 2)
      --prefix_size PREFIX_SIZE
                            Use prefixes up to this size (in characters, default
                            5). Only used if --prefix is supplied
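
As a rough illustration of how these options combine, the sketch below assembles a training command from the flags documented above and prints it without running anything. `build_train_command` is a hypothetical helper, and the paths `train.tsv` and `models/` and the chosen values are placeholders, not recommended settings.

```python
import shlex


def build_train_command(gold, data, epochs=None, window=None,
                        caps=False, suffix=False):
    """Return an argument list for `nlpnet-train.py pos` (illustrative only)."""
    # --gold and --data are the two arguments the usage line shows as required.
    cmd = ["nlpnet-train.py", "pos", "--gold", gold, "--data", data]
    if window is not None:
        cmd += ["-w", str(window)]
    if epochs is not None:
        cmd += ["-e", str(epochs)]
    if caps:
        # Passing the flag without a value keeps its default feature count.
        cmd.append("--caps")
    if suffix:
        cmd.append("--suffix")
    return cmd


# Hypothetical corpus file and model directory.
command = build_train_command("train.tsv", "models/", epochs=20,
                              caps=True, suffix=True)
print(shlex.join(command))
```

Building the arguments as a list keeps each value separate, so the same list could later be handed to `subprocess.run` without shell quoting issues.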