Jinho D. Choi edited this page Dec 31, 2016 · 1 revision

Dependency Parsing

Your task is to implement a dependency parser that reaches state-of-the-art accuracy and speed. You may work in groups of at most 2. Submit your work before class on Oct. 7th.

  • Download the standard dataset.
  • Implement a transition-based dependency parser based on Nivre's arc-eager algorithm within our architecture.
  • Improve your parser in various ways.
  • Evaluate the accuracy of your parser (UAS and LAS) for both Malt and Stanford dependency formats.
  • Evaluate the speed of your parser (tokens/sec. and sentences/sec.).
  • Write a report (4-8 pages) in the ACL format. Your report must include an abstract, introduction, related work, approach, experiments, and conclusion.
  • Commit all your work to your GitHub repository.
  • Create a wiki page, Dependency Parsing, with instructions on how to run your parser. You must provide a pre-trained model that is ready to run.
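
As a rough illustration of the arc-eager transition system named above, the following Python sketch replays the transitions a static oracle would choose from gold heads. It is independent of the course architecture: the function name `oracle_parse` and the unlabeled transitions are assumptions made for brevity, and a real parser would replace the oracle with a trained classifier and attach a dependency label to each arc.

```python
def oracle_parse(gold_heads):
    """Replay unlabeled arc-eager transitions chosen by a static oracle.

    gold_heads[i] is the gold head ID of token i (1-based);
    gold_heads[0] is a placeholder for the artificial root.
    Returns (transitions, heads), where heads[i] is the head assigned to token i.
    """
    n = len(gold_heads) - 1
    stack = [0]                       # the artificial root starts on the stack
    buffer = list(range(1, n + 1))    # remaining input tokens
    heads = [None] * (n + 1)
    transitions = []

    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and heads[s] is None and gold_heads[s] == b:
            # LEFT-ARC: the buffer front becomes the head of the stack top.
            heads[s] = b
            stack.pop()
            transitions.append("LEFT-ARC")
        elif gold_heads[b] == s:
            # RIGHT-ARC: the stack top becomes the head of the buffer front.
            heads[b] = s
            stack.append(buffer.pop(0))
            transitions.append("RIGHT-ARC")
        elif heads[s] is not None and any(
            gold_heads[k] == b or gold_heads[b] == k for k in range(s)
        ):
            # REDUCE: the stack top already has a head, and the buffer front
            # still needs an arc to or from a token to the left of the stack top.
            stack.pop()
            transitions.append("REDUCE")
        else:
            # SHIFT: move the buffer front onto the stack.
            stack.append(buffer.pop(0))
            transitions.append("SHIFT")
    return transitions, heads


# Malt heads of "Ms. Haag plays Elianti ." (see Data format).
transitions, heads = oracle_parse([None, 2, 3, 0, 3, 3])
print(transitions)
print(heads[1:])
```

The oracle only recovers projective trees exactly; how to handle non-projective sentences in the training data is left open here.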

Data format

1	Ms.	ms.	NNP	_	2	NMOD	2	nn
2	Haag	haag	NNP	_	3	SUB	3	nsubj
3	plays	play	VBZ	_	0	ROOT	0	root
4	Elianti	elianti	NNP	_	3	OBJ	3	dobj
5	.	.	.	_	3	P	3	punct

Each column (tab-separated, indexed from 0) represents:

  • 0: ID.
  • 1: word-form.
  • 2: lemma (predicted).
  • 3: POS tag (predicted).
  • 4: extra features (blank, shown as _).
  • 5: head ID from Malt (gold).
  • 6: dependency label from Malt (gold).
  • 7: head ID from Stanford (gold).
  • 8: dependency label from Stanford (gold).
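
The column layout above could be read with a short sketch like the one below. The `Token` type and `read_sentences` function are hypothetical names, not part of the course architecture, and the sketch assumes that sentences are separated by blank lines, which this page does not state explicitly.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    lemma: str            # predicted
    pos: str              # predicted
    feats: str            # blank ("_") in this dataset
    malt_head: int        # gold
    malt_label: str       # gold
    stanford_head: int    # gold
    stanford_label: str   # gold


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence (assumed blank-line separated)."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        c = line.split("\t")
        sentence.append(
            Token(int(c[0]), c[1], c[2], c[3], c[4], int(c[5]), c[6], int(c[7]), c[8])
        )
    if sentence:
        yield sentence


SAMPLE = (
    "1\tMs.\tms.\tNNP\t_\t2\tNMOD\t2\tnn\n"
    "2\tHaag\thaag\tNNP\t_\t3\tSUB\t3\tnsubj\n"
    "3\tplays\tplay\tVBZ\t_\t0\tROOT\t0\troot\n"
    "4\tElianti\telianti\tNNP\t_\t3\tOBJ\t3\tdobj\n"
    "5\t.\t.\t.\t_\t3\tP\t3\tpunct\n"
)

for sent in read_sentences(SAMPLE.splitlines()):
    print([(t.form, t.malt_head, t.malt_label) for t in sent])
```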

Notes

  • Your dependency parser must extend NLPComponent.
  • No 3rd-party library including any implementation of a dependency parser can be used for this homework.
  • Your final model must not be tuned for the evaluation set.
  • Your report must clearly state all of your approaches and findings including:
      • Machine learning algorithm.
      • Parsing algorithm.
      • Search strategy.
      • Feature set.
  • On your wiki page, indicate which code is used for your dependency parser.
  • You must report the following for at least your baseline and final models:
      • UAS and LAS for the Malt dependency format on the development and evaluation sets.
      • UAS and LAS for the Stanford dependency format on the development and evaluation sets.
  • Use SpeedTest for measuring the speed of your parser.
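
For reference, UAS is the percentage of tokens whose predicted head is correct, and LAS is the percentage whose predicted head and label are both correct. A minimal Python sketch follows. The function name `attachment_scores` is hypothetical, and the sketch counts every token, including punctuation, which is an assumption since this page does not say whether punctuation is scored.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold and predicted are equal-length lists of (head_id, label) pairs,
    one pair per token, concatenated over all evaluated sentences.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to evaluate")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_arcs = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * correct_heads / total, 100.0 * correct_arcs / total


# Gold Malt annotation of "Ms. Haag plays Elianti ." against a made-up
# prediction with one wrong label (token 4) and one wrong head (token 5).
gold = [(2, "NMOD"), (3, "SUB"), (0, "ROOT"), (3, "OBJ"), (3, "P")]
pred = [(2, "NMOD"), (3, "SUB"), (0, "ROOT"), (3, "SUB"), (4, "P")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 80.0, LAS = 60.0
```

The same function can be run once with the Malt columns and once with the Stanford columns to produce the two pairs of scores requested above.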

CS571: Natural Language Processing

Emory University
