list of improvements #20

- [ ] allow passing a list of integers instead of tokens to the `word2vec` function
- [ ] investigate how to remove the embedding of the `</s>` token
- [ ] abandon the file-based approach
- [ ] speed up for `XPtr` inputs, such as quanteda objects, to avoid copying data?
- [ ] other speed improvements
- [ ] progress bar
- [ ] functionality for downstream processing
  - plotting, or the functionality in https://github.com/bnosac/textplot
  - downstream topic modelling like https://github.com/bnosac/ETM, or as a replacement for SVDs in semi-supervised approaches
  - embeddings on data tokenised with sentencepiece / tokenizers.bpe
  - pretrained models
  - use as input to torch models
  - deeper integration of similarity functionality, like https://github.com/bnosac/doc2vec or https://koheiw.github.io/LSX
