The Georgetown University Multilayer (GUM) corpus is a collection of texts gathered and refined through classroom annotation using collaborative software. Collected as an integral component of the linguistics curriculum at Georgetown University, the GUM corpus focuses on contemporary English data and is tailored to the requirements of corpus and computational linguistics research (Zeldes, 2017).
The objective of this project is to accurately classify the entity type of each word and to compare the performance of several models: a feedforward neural network, an LSTM, a bidirectional LSTM, and BERT.
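To make the task concrete, the following is a minimal sketch of per-word entity classification framed as sequence labeling, with one simple way to compare systems on token-level accuracy. The example sentence, the label names, the `"O"` tag for words outside any entity, and the two sets of predictions are all hypothetical illustrations; they are not taken from GUM or from the models evaluated in this project.

```python
def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


# Hypothetical sentence with one gold entity label per word.
tokens = ["Georgetown", "University", "is", "in", "Washington"]
gold = ["organization", "organization", "O", "O", "place"]

# Hypothetical predictions standing in for the outputs of two systems.
predictions = {
    "all_O_baseline": ["O", "O", "O", "O", "O"],
    "hypothetical_model": ["organization", "organization", "O", "O", "O"],
}

# Score every system against the same gold labels.
scores = {name: token_accuracy(gold, pred) for name, pred in predictions.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Token-level accuracy is only one possible metric. Because most words usually lie outside any entity, per-class precision, recall, and F1 tend to be more informative when comparing models.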