This repo contains some exercises on creating word2vec models with the gensim library.
File containing some sample analogies to be tested by a trained model.
Solves analogies of the form man woman king (expected output: queen) using a trained model, which must be a KeyedVectors model. The model is supplied as the first parameter. An optional second parameter is a file containing analogies: each analogy is shown in turn, and the program pauses and waits for user input before moving on. After the file is processed, the program switches to analogies typed by the user. If no file is supplied, it goes straight to this mode.
python analogies.py [keyed-vectors-model] [analogies-text-file]
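To make the analogy arithmetic concrete, here is a minimal sketch in plain Python. It is not the code of analogies.py: the toy vectors, the function names and the assumption that each analogy line holds three whitespace-separated words are all invented for illustration. It computes king - man + woman and picks the remaining word closest to the result by cosine similarity.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# A real model would supply vectors with many more dimensions.
TOY_VECTORS = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.3, 0.3, -0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def parse_analogy(line):
    """Split a line such as 'man woman king' into its three words.

    The three-words-per-line layout is an assumption about the file format.
    """
    words = line.split()
    if len(words) != 3:
        raise ValueError(f"expected three words, got {len(words)}: {line!r}")
    return tuple(words)


def solve_analogy(vectors, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # Target point: c - a + b, computed component by component.
    target = [vc - va + vb for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded from the candidates.
    candidates = (word for word in vectors if word not in {a, b, c})
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(solve_analogy(TOY_VECTORS, *parse_analogy("man woman king")))  # queen
```

With a real KeyedVectors model, gensim offers a comparable lookup through `most_similar(positive=["woman", "king"], negative=["man"])`, which returns a ranked list of (word, similarity) pairs instead of a single word.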
Sample containing the 5 Game of Thrones books. Useful to train models and run experiments.
Creates a model from a plain text file, which is supplied as a parameter. The output is a KeyedVectors model (so it cannot be trained further), saved to a file whose name is the input file name with the extension .wv.
python model.py [sample-textfile]
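The training step could look roughly like the sketch below. It is an assumption about how such a script might be written, not the contents of model.py: the tokenizer, the function names and every hyperparameter value are illustrative, and the calls follow the gensim 4 API (`vector_size` rather than the older `size`).

```python
import re


def read_sentences(text_path):
    """Yield one list of lowercase tokens per non-empty line of the file."""
    with open(text_path, encoding="utf-8") as handle:
        for line in handle:
            tokens = re.findall(r"\w+", line.lower())
            if tokens:
                yield tokens


def train_keyed_vectors(text_path, output_path):
    """Train word2vec on a text file and save only the word vectors."""
    # Imported here so read_sentences stays usable without gensim installed.
    from gensim.models import Word2Vec

    # Word2Vec iterates over the corpus several times, so materialize it.
    sentences = list(read_sentences(text_path))
    # Hyperparameters are illustrative guesses, not the repo's settings.
    model = Word2Vec(
        sentences=sentences,
        vector_size=100,
        window=5,
        min_count=5,
        workers=4,
    )
    # Saving model.wv keeps the vectors only, so training cannot be resumed.
    model.wv.save(output_path)
    return model.wv
```

A call such as `train_keyed_vectors("sample.txt", "sample.wv")` (both file names are hypothetical) would write the vectors to disk, and `gensim.models.KeyedVectors.load("sample.wv")` would read them back for querying.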
Creates a model from a Wikipedia corpus of articles; the dump file has to be supplied as a parameter. The resulting vectors are saved in a file called wiki.wv. The dumps can be downloaded as files named [LANG]wiki-[DATE]-pages-articles-multistream.xml.bz2. Expect the run to take a long time.
python wiki.py [wikipedia-dump-file.xml.bz2]
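Because a run is slow, it can help to check that the file name matches the expected dump pattern before starting. The sketch below is a hypothetical helper, not part of wiki.py; it assumes that [DATE] is either eight digits (YYYYMMDD) or the word latest, and that [LANG] consists of lowercase letters and underscores.

```python
import re
from pathlib import Path

# Pattern for [LANG]wiki-[DATE]-pages-articles-multistream.xml.bz2.
# The shapes allowed for LANG and DATE are assumptions, see the text above.
DUMP_NAME = re.compile(
    r"^(?P<lang>[a-z_]+)wiki-(?P<date>\d{8}|latest)"
    r"-pages-articles-multistream\.xml\.bz2$"
)


def parse_dump_name(path):
    """Return (language, date) taken from a Wikipedia dump file name."""
    name = Path(path).name  # ignore any leading directories
    match = DUMP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a multistream articles dump: {name!r}")
    return match["lang"], match["date"]


print(parse_dump_name("dumps/enwiki-20240101-pages-articles-multistream.xml.bz2"))
# ('en', '20240101')
```

The example path is made up; only the naming scheme comes from the description above.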