ookimi edited this page Jul 1, 2017 · 3 revisions

Welcome to the vsmlib wiki!

Current and planned functionality

Reading in pre-trained vectors

Supported formats

  • word2vec .bin format
  • GloVe plain-text vectors
  • SVD
  • sparse vectors
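As an illustration of the simplest of these formats, GloVe plain-text files store one word per line followed by its space-separated vector components. The sketch below parses that layout using only the standard library. It is not vsmlib's loader: the function name `read_glove_text` and the sample data are hypothetical, chosen only for this example.

```python
import io


def read_glove_text(lines):
    """Parse GloVe-style plain text: each line is a word followed by its floats.

    Hypothetical helper for illustration; not part of vsmlib.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Invented sample data standing in for a real vectors file.
sample = io.StringIO("apple 0.1 0.2 0.3\nmelon 0.2 0.1 0.4\n\n")
vectors = read_glove_text(sample)
```

The same function accepts an open file handle in place of the `io.StringIO` sample, since it only iterates over lines.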

TBD Downloadable vectors:

  • araneum+wiki SVD vectors + ref to paper to cite for them
  • Russian SVD vectors
  • 4 types of syntactic vectors

Training your own vectors

TBD Pre-processing text for training vectors

  • put in the old scripts for Russian + ref to paper to cite
  • sentence boundaries options

TBD Training vectors

  • running classic word2vec from vsmlib (?)
  • svd vectors?
  • 4 types of syntactic vectors + ref to paper to cite

Doing stuff with vectors

  • get similarity between two words
  • outputting the top n most similar neighbors with similarity ranks (check whether this part can be done in C to speed it up)
  • 6 analogy methods + refs to papers
  • using the BATS and Google test sets + BATS ref
  • correlation with human judgements (built-in WordSim-353 and SimLex-999, + refs to papers)
  • finding the odd word out (given "cucumber", "melon", "apple", "seagull", it should tell which word is the least similar to the others)
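Several of the operations listed above reduce to cosine similarity between word vectors. The following is a minimal sketch of three of them (pairwise similarity, nearest neighbors, and the odd word out), assuming vectors are held in a plain dict of lists. The function names and the toy 2-dimensional vectors are invented for illustration and do not reflect vsmlib's actual API; the odd word is taken here to be the one with the lowest mean similarity to the rest, which is one possible definition.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, n=3):
    """Return the n nearest neighbors of `word` as (neighbor, similarity) pairs."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


def odd_one_out(vectors, words):
    """Return the word with the lowest mean similarity to the other words."""
    def mean_similarity(w):
        others = [o for o in words if o != w]
        return sum(cosine(vectors[w], vectors[o]) for o in others) / len(others)
    return min(words, key=mean_similarity)


# Toy vectors invented for this example; real embeddings have hundreds of dimensions.
toy = {
    "cucumber": [0.9, 0.1],
    "melon": [0.8, 0.2],
    "apple": [0.85, 0.15],
    "seagull": [0.1, 0.9],
}

neighbors = most_similar(toy, "cucumber", n=2)
odd = odd_one_out(toy, ["cucumber", "melon", "apple", "seagull"])
```

With these toy vectors, `odd` is "seagull" and the closest neighbor of "cucumber" is "apple". A pure-Python loop like this is slow on a full vocabulary, which is presumably why the list above considers moving the neighbor search to C.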

Distributing vsmlib

  • TBD pip