This repository contains code for evaluating how well vector semantics aligns with lexical semantics, using the SimLex-999 dataset and the Brown corpus from NLTK. Several vector semantic methods, including TF-IDF and Word2Vec, are evaluated, and their results are compared against manually determined lexical semantics.
Lexical semantics and vector semantics are two approaches to representing the meaning of words in natural language processing. This project evaluates how well vector semantic methods capture word meaning compared to manually determined lexical semantics.
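One common way to quantify this kind of alignment is to compute the cosine similarity of each word pair's vectors and then rank-correlate those similarities with the human scores. The sketch below assumes Spearman correlation as the metric; this README does not state which metric the notebook uses, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return math.nan
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given a list of model similarities and a list of human scores in the same pair order, `spearman(model_scores, human_scores)` would return a value between -1 and 1, with higher values indicating closer agreement.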
- SimLex-999: Gold-standard dataset for lexical semantics evaluation.
- Brown corpus: Corpus of roughly one million words, available through NLTK, used for training the vector semantic methods.
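As a rough illustration of how the SimLex-999 scores might be read, the sketch below parses a tab-separated file into `(word1, word2, score)` tuples. It assumes the file has a header row with columns named `word1`, `word2`, and `SimLex999`; check these against your downloaded copy. The function name is hypothetical, and the sample rows are made-up placeholders, not real SimLex-999 entries.

```python
import csv
import io


def load_simlex_pairs(handle):
    """Read (word1, word2, score) tuples from a tab-separated file object.

    Assumes a header row with 'word1', 'word2' and 'SimLex999' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["word1"], row["word2"], float(row["SimLex999"]))
        for row in reader
    ]


# Placeholder data standing in for the real file (values are invented).
sample = io.StringIO(
    "word1\tword2\tSimLex999\n"
    "alpha\tbeta\t7.5\n"
    "gamma\tdelta\t1.25\n"
)
pairs = load_simlex_pairs(sample)
```

With the real dataset, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the download. The Brown corpus sentences are typically available as `nltk.corpus.brown.sents()` after running `nltk.download("brown")`.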
To run the evaluation code:
- Download the SimLex-999 file and set its path in the notebook (`.ipynb`) file.
- Run the provided Jupyter notebook, specifying the desired settings for both the TF-IDF and Word2Vec methods.
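To give a sense of what the TF-IDF side of the evaluation involves, here is a minimal sketch that represents each word as a vector of TF-IDF weights, one per document. It is not the notebook's implementation: the weighting (raw term count times `log(N / df)`), the function name, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_word_vectors(documents):
    """Map each word to a list of TF-IDF weights, one entry per document.

    documents: list of token lists. TF is the raw count of the word in the
    document; IDF is log(N / df), where df is the number of documents that
    contain the word. Other weighting schemes are equally valid.
    """
    n_docs = len(documents)
    counts = [Counter(doc) for doc in documents]
    # Each Counter has unique keys, so this counts documents per word.
    doc_freq = Counter(word for c in counts for word in c)
    vectors = {}
    for word, df in doc_freq.items():
        idf = math.log(n_docs / df)
        vectors[word] = [c[word] * idf for c in counts]
    return vectors


# Toy corpus for illustration only.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
vectors = tfidf_word_vectors(docs)
```

In this scheme a word that appears in every document, such as `the` in the toy corpus, receives an all-zero vector, which is one reason settings such as the choice of documents and weighting can affect the results.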