Open
Description
- allow passing a list of integers instead of tokens to the word2vec function
- see how to remove the embedding of </s>
- abandon file-based approach
- speed up for Xptrs, like quanteda objects, to avoid copying data?
- other speed improvements
- progress bar
- functionalities for downstream processing
- plotting or functionalities in https://github.com/bnosac/textplot
- downstream topic modelling like https://github.com/bnosac/ETM or as a replacement of SVD's for semi-supervised stuff
- embeddings on sentencepiece/tokenisers.bpe tokenised data
- pretrained models
- further input to torch models
- deeper integration of the similarities like https://github.com/bnosac/doc2vec or https://koheiw.github.io/LSX
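
To make the first two items concrete, here is a minimal sketch of what integer-encoded input and dropping the `</s>` embedding could look like. It is written in plain Python for illustration only; `encode_tokens` and `drop_token` are hypothetical helper names, not part of the word2vec package, and the frequency-ordered vocabulary is an assumption rather than how the package builds its dictionary.

```python
from collections import Counter


def encode_tokens(docs, min_count=1):
    """Map tokens to integer ids (hypothetical helper).

    Ids are assigned by descending frequency, ties broken alphabetically.
    Tokens occurring fewer than min_count times are dropped.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [tok for tok, n in ordered if n >= min_count]
    index = {tok: i for i, tok in enumerate(vocab)}
    encoded = [[index[tok] for tok in doc if tok in index] for doc in docs]
    return vocab, encoded


def drop_token(embeddings, token="</s>"):
    """Return a copy of a token -> vector mapping without the given token."""
    return {tok: vec for tok, vec in embeddings.items() if tok != token}


docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
vocab, encoded = encode_tokens(docs)
print(vocab)
print(encoded)
```

With input like this, a trainer would only ever see integer sequences plus a vocabulary lookup, which is one way the file-based step could be avoided.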