Issue:
We currently depend on pretrained vocabularies, like GloVe embeddings, that:
- are oddly biased (although once you backprop into the embeddings, their initial bias matters much less),
- must stay consistent with the tokenizer we use,
- don't necessarily cover the same words as our actual text.
Proposed solution project:
Use https://github.com/tensorflow/transform to build text preprocessing pipelines, e.g. to select tokens that occur sufficiently frequently, and to create either random or smarter word embeddings for them.
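The core idea could be sketched as follows. This is a minimal illustration in plain Python/NumPy rather than tensorflow/transform itself; the names `min_count`, `embed_dim`, and `build_vocab_and_embeddings` are hypothetical, chosen just to show the frequency cutoff and random initialization:

```python
from collections import Counter
import numpy as np

def build_vocab_and_embeddings(corpus, min_count=2, embed_dim=8, seed=0):
    """Keep tokens seen at least `min_count` times; give each a random vector."""
    counts = Counter(tok for doc in corpus for tok in doc.split())
    vocab = sorted(tok for tok, c in counts.items() if c >= min_count)
    rng = np.random.default_rng(seed)
    # One random embedding row per kept token, plus an OOV row at index 0.
    embeddings = rng.normal(0.0, 0.1, size=(len(vocab) + 1, embed_dim))
    index = {tok: i + 1 for i, tok in enumerate(vocab)}  # 0 reserved for OOV
    return index, embeddings

corpus = ["the cat sat", "the dog sat", "a cat ran"]
index, emb = build_vocab_and_embeddings(corpus, min_count=2)
# "the", "cat", "sat" occur twice and are kept; rare tokens map to OOV.
```

In a real pipeline, the counting step would be done with `tft.compute_and_apply_vocabulary` (or similar) inside a tensorflow/transform preprocessing function, so the same vocabulary is baked into both training and serving.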