liviaalmeida/word2vec

word2vec

This repo contains a few exercises on building word2vec models with the gensim library.

Files

analogies

A file with sample analogies for testing a trained model.

analogies.py

Solves analogies of the form man woman king (expected output: queen) using a trained model, which must be a KeyedVectors model. The model is supplied as the first parameter. An optional second parameter is a file of analogies: after each analogy from the file is shown, the program pauses and waits for user input. Once the file has been processed, the program switches to analogies typed by the user; if no file is supplied, it goes straight to this mode.

python analogies.py [keyed-vectors-model] [analogies-text-file]
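The analogy arithmetic can be sketched in plain Python. This is a minimal illustration of the vector-offset (3CosAdd) method, not necessarily what analogies.py does internally: the answer to "man is to woman as king is to ?" is the word whose vector is most cosine-similar to woman - man + king. With a gensim KeyedVectors model the same query is usually written `kv.most_similar(positive=["woman", "king"], negative=["man"])`. The `solve_analogy` helper and the `toy` vectors below are hypothetical and only stand in for a trained model.

```python
import math


def normalize(vec):
    """Scale vec to unit length (assumes it is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(u, v):
    """Dot product; for unit-length vectors this equals cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def solve_analogy(vectors, a, b, c, topn=1):
    """Answer 'a is to b as c is to ?' with the 3CosAdd rule.

    `vectors` is a hypothetical word -> list-of-floats mapping standing in
    for a trained KeyedVectors model. Returns up to `topn` (word, similarity)
    pairs, excluding the three query words as gensim's most_similar does.
    """
    unit = {word: normalize(v) for word, v in vectors.items()}
    # Target direction: b - a + c, built from unit-length vectors.
    target = normalize([vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])])
    scored = [(word, dot(target, v)) for word, v in unit.items() if word not in {a, b, c}]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy, made-up 3-dimensional vectors (illustration only, not trained data).
toy = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.3, 0.0],
}
print(solve_analogy(toy, "man", "woman", "king")[0][0])  # prints queen
```

Excluding the query words matters in practice: with real embeddings, the nearest vector to woman - man + king is often king itself.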

gameofthrones.txt

A sample text file containing the five Game of Thrones books. Useful for training models and running experiments.

model.py

Trains a model on a plain text file, which is supplied as a parameter. The output is a KeyedVectors model (so it cannot be trained further), saved to a file named after the input file with the extension .wv.

python model.py [sample-textfile]
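A training script of this kind might look roughly like the sketch below. It is an assumption, not the repository's model.py: the `TextFileSentences` class, the naive `\w+` tokenizer, the `output_path` naming rule and the hyperparameters are all hypothetical, and the `Word2Vec` keyword names follow gensim 4.x. The key points are that the corpus must be a restartable iterable (Word2Vec makes one pass to build the vocabulary and one per epoch) and that saving only `model.wv` yields the non-trainable KeyedVectors file. gensim is imported lazily so the helpers work without it.

```python
import re
import sys
from pathlib import Path


class TextFileSentences:
    """Restartable iterable of token lists, one per non-empty line.

    Word2Vec iterates over the corpus several times, so this must not be
    a one-shot generator: each __iter__ call reopens the file.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Naive lowercase word tokenizer (an assumption for this sketch).
                tokens = re.findall(r"\w+", line.lower())
                if tokens:
                    yield tokens


def output_path(text_file):
    """Hypothetical naming rule: 'gameofthrones.txt' -> 'gameofthrones.wv'."""
    return str(Path(text_file).with_suffix(".wv"))


def train(text_file):
    # Imported lazily so the helpers above work without gensim installed.
    from gensim.models import Word2Vec

    model = Word2Vec(sentences=TextFileSentences(text_file), vector_size=100,
                     window=5, min_count=5, workers=4)
    # Saving only the KeyedVectors drops the training state, so the result
    # can be queried but not trained further.
    model.wv.save(output_path(text_file))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python model.py [sample-textfile]")
    else:
        train(sys.argv[1])
```

For files that are already tokenized with whitespace, gensim also ships `gensim.models.word2vec.LineSentence`, which can replace the custom iterable.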

wiki.py

Trains a model on a Wikipedia corpus of articles; the dump file has to be supplied as a parameter. The resulting vectors are saved in a file called wiki.wv. The dumps can be downloaded from here and are named [LANG]wiki-[DATE]-pages-articles-multistream.xml.bz2. Training on a full dump takes a long time.

python wiki.py [wikipedia-dump-file.xml.bz2]
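One common way to stream a dump into Word2Vec is gensim's `WikiCorpus`, sketched below under assumptions: whether wiki.py uses this class is not stated, and `WikiSentences`, `train_wiki` and the hyperparameters are hypothetical (gensim 4.x keyword names). `WikiCorpus.get_texts()` returns a one-shot generator, so a small wrapper restarts it on every pass, and `dictionary={}` skips building a gensim `Dictionary`, which Word2Vec does not need. gensim is imported lazily.

```python
import sys


class WikiSentences:
    """Restartable wrapper around a corpus object's get_texts() generator.

    Word2Vec needs several passes over the data; each __iter__ call starts
    a fresh pass over the dump instead of reusing an exhausted generator.
    """

    def __init__(self, corpus):
        self.corpus = corpus

    def __iter__(self):
        yield from self.corpus.get_texts()


def train_wiki(dump_path, out_path="wiki.wv"):
    # Imported lazily so WikiSentences can be used without gensim installed.
    from gensim.corpora import WikiCorpus
    from gensim.models import Word2Vec

    # dictionary={} prevents WikiCorpus from building a Dictionary up front,
    # which would otherwise cost an extra full pass over the dump.
    wiki = WikiCorpus(dump_path, dictionary={})
    model = Word2Vec(sentences=WikiSentences(wiki), vector_size=100,
                     window=5, min_count=5, workers=4)
    model.wv.save(out_path)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python wiki.py [wikipedia-dump-file.xml.bz2]")
    else:
        train_wiki(sys.argv[1])
```

Each pass re-parses the compressed XML, which is a large part of why training on a full dump is slow.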
