Train bytes decoder with on-the-fly packing #25

@AmitMY

Description

The bytes decoder is trained on sequences padded up to a maximum length N.
So for a batch with B samples and L words each, we create a batch of B×L words and decode all of them.
This number can be very large: a batch size of 128 with a maximum of 512 words, for example, creates a decoder batch of 65,536 word sequences.

Using pack_sequence, plus some engineering, we could probably pack these words into fewer byte sequences by taking short words and joining them into a single training sample. For example, the word "a" is about 2 tokens while "hello" is about 6, so we could fit three "a"s in the same training sample as one "hello".
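A minimal sketch of the packing idea, independent of `pack_sequence` itself: greedily join short byte sequences into shared samples under a length budget. The function name, the `max_len` budget, and the separator token are all hypothetical, not part of any existing implementation.

```python
def pack_words(words, max_len, sep=0):
    """Greedy first-fit packing of byte sequences into samples of at
    most max_len tokens, joined by a separator token.

    `words` is a list of byte strings; `max_len` and the separator
    token id `sep` are hypothetical parameters for this sketch.
    """
    # Sort longest-first so large words claim space before small ones.
    seqs = sorted((list(w) for w in words), key=len, reverse=True)
    bins = []  # each bin is a flat list of token ids (one training sample)
    for seq in seqs:
        for b in bins:
            # +1 accounts for the separator between joined words
            if len(b) + 1 + len(seq) <= max_len:
                b.extend([sep] + seq)
                break
        else:
            # no existing sample has room; start a new one
            bins.append(list(seq))
    return bins
```

With a budget of 9 tokens, `pack_words([b"hello", b"a", b"a", b"a"], max_len=9)` fits two of the "a"s alongside "hello" and only opens a second sample for the third, so the decoder sees 2 sequences instead of 4.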
