NLP project on dependency parsers
FINAL REPORT Template: https://www.overleaf.com/12707057qsjgmmyjnswh
Requirements for the report: https://github.com/tdeoskar/NLP1-2017/blob/master/project-reqs.md
Milestone:
Dependency-data
Reading in and writing out text in the CoNLL-U format. Done
Replace all words in the training file that occur only once with an unknown-word token (e.g. <unk>). Done
Build the w2i, t2i, and l2i dicts and their inverses i2w, i2t, i2l. Done
Remove lines from the .conllu file that have non-integer indices (multiword-token and empty-node lines). Done
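The data steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: read_conllu and build_vocab are hypothetical helper names, the column layout follows the standard CoNLL-U specification (ID, FORM, UPOS, HEAD, DEPREL), and <unk> is an assumed choice for the unknown-word token.

```python
from collections import Counter

UNK = "<unk>"  # assumed unknown-word token


def read_conllu(path):
    """Yield sentences as lists of (idx, word, upos, head, label) tuples,
    skipping comment lines and any line whose ID column is not a plain
    integer (multiword-token ranges like "1-2" and empty nodes like "1.1")."""
    sent = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:           # blank line ends a sentence
                if sent:
                    yield sent
                    sent = []
                continue
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            if not cols[0].isdigit():  # drop non-integer indices
                continue
            sent.append((int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if sent:
        yield sent


def build_vocab(sents):
    """Map every word seen more than once to an index; singletons fall
    back to the UNK entry. Returns (w2i, i2w)."""
    counts = Counter(w for s in sents for _, w, *_ in s)
    words = [UNK] + sorted(w for w, c in counts.items() if c > 1)
    w2i = {w: i for i, w in enumerate(words)}
    i2w = {i: w for w, i in w2i.items()}
    return w2i, i2w
```

The t2i/l2i dicts for tags and labels can be built the same way, usually without a frequency cutoff since the tag and label sets are closed.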
MST
In progress
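Since the MST decoder is still in progress, here is a minimal sketch of the Chu-Liu/Edmonds procedure it will need, assuming a dense score matrix scores[head][dep] with node 0 as ROOT. The function names and the recursive contract-and-expand structure are illustrative, not the project's actual implementation; note it handles one cycle per recursion level (nested cycles are resolved by the recursion itself).

```python
NEG = float("-inf")


def find_cycle(head, root, n):
    """Return the set of nodes on a cycle in the head graph, or None."""
    color = [0] * n  # 0 = unvisited, 1 = on current path, 2 = finished
    for start in range(n):
        if start == root or color[start] != 0:
            continue
        path, v = [], start
        while v != root and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = head[v]
        if v != root and color[v] == 1:  # walked back onto the current path
            cyc, u = {v}, head[v]
            while u != v:
                cyc.add(u)
                u = head[u]
            return cyc
        for u in path:
            color[u] = 2
    return None


def chu_liu_edmonds(scores, root=0):
    """Return head[d] for every node d, maximising the arborescence score."""
    n = len(scores)
    head = [root] * n
    for d in range(n):          # greedy: best incoming arc per node
        if d != root:
            head[d] = max((h for h in range(n) if h != d),
                          key=lambda h: scores[h][d])
    cyc = find_cycle(head, root, n)
    if cyc is None:
        return head
    # contract the cycle into one node c and recurse
    non_cyc = [v for v in range(n) if v not in cyc]
    old2new = {v: i for i, v in enumerate(non_cyc)}
    c, m = len(non_cyc), len(non_cyc) + 1
    new_scores = [[NEG] * m for _ in range(m)]
    enter_arc, out_arc = {}, {}
    for h in range(n):
        for d in range(n):
            if h == d or (h in cyc and d in cyc):
                continue
            if d in cyc:        # arc entering the cycle: pay to break it at d
                s = scores[h][d] - scores[head[d]][d]
                hn = old2new[h]
                if s > new_scores[hn][c]:
                    new_scores[hn][c] = s
                    enter_arc[hn] = (h, d)
            elif h in cyc:      # arc leaving the cycle: best cycle-internal head
                dn = old2new[d]
                if scores[h][d] > new_scores[c][dn]:
                    new_scores[c][dn] = scores[h][d]
                    out_arc[dn] = h
            else:
                new_scores[old2new[h]][old2new[d]] = scores[h][d]
    new_root = old2new[root]
    new_head = chu_liu_edmonds(new_scores, new_root)
    # expand the contracted solution back to the original nodes
    result = [root] * n
    for dn in range(m):
        if dn == new_root:
            continue
        hn = new_head[dn]
        if dn == c:             # the arc into c breaks the cycle at d
            h, d = enter_arc[hn]
            for v in cyc:
                result[v] = head[v]
            result[d] = h
        else:
            d = non_cyc[dn]
            result[d] = out_arc[dn] if hn == c else non_cyc[hn]
    return result
```

Running this on the LSTM's arc scores yields one head per token, which can then be written back out in CoNLL-U form.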
LSTM
Embedding layer for words Done
Support for optional pretrained word embeddings Done
Embedding layer for POS tags Done
Support for optional pretrained tag embeddings Done
Concatenate the word and POS-tag embeddings per token Done
LSTM layer Done
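The embedding pipeline above can be illustrated with a small NumPy sketch of the shapes involved. This only shows the lookup-and-concatenate step; the actual model uses PyTorch nn.Embedding layers (optionally initialised from pretrained vectors) and an nn.LSTM, and all sizes below are made-up toy values.

```python
import numpy as np

VOCAB, NTAGS, DW, DT = 100, 17, 8, 4  # toy sizes, assumptions only

rng = np.random.default_rng(0)
word_emb = rng.normal(size=(VOCAB, DW))  # word embedding table
tag_emb = rng.normal(size=(NTAGS, DT))   # POS-tag embedding table


def embed_sentence(word_ids, tag_ids):
    """Look up the word and tag vectors and concatenate them per token,
    giving the (seq_len, DW + DT) input sequence that the LSTM consumes."""
    return np.concatenate([word_emb[word_ids], tag_emb[tag_ids]], axis=1)
```

In the real model the same concatenation happens on torch tensors, so the LSTM's input_size must equal the word dimension plus the tag dimension.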
TO DO:
- Implement MLP2 in PyTorch (label classification)
- Change the cross-entropy functions
- Translate the MST output back to CoNLL-U format
- Gold-label handling for the label classifier
- Handle multiple cycles in the MST algorithm
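For the MST-to-CoNLL translation item, a minimal sketch of writing predicted heads and labels back out as CoNLL-U lines could look like this (to_conllu is a hypothetical helper; unused columns are left as "_"):

```python
def to_conllu(sentence, heads, labels):
    """Render one parsed sentence as CoNLL-U lines.

    sentence: list of (form, upos) tuples for tokens 1..n
    heads:    predicted head index per token (0 = ROOT)
    labels:   predicted dependency label per token
    """
    lines = []
    for i, ((form, upos), head, label) in enumerate(
            zip(sentence, heads, labels), start=1):
        # columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(i), form, "_", upos, "_", "_", str(head), label, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n"
```

Sentences written this way (separated by blank lines) can be fed directly to the standard CoNLL evaluation scripts.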