Implement on-the-fly sequence packing for bytes decoder training #40
The bytes decoder flattens (B, L, T) inputs to (B×L, T) for training, padding each word to max length T. With B=128, L=512, T=32, this creates 65,536 sequences of 32 tokens each (2.1M tokens total), despite most words being 2-5 tokens long.
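The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `padding_stats` is hypothetical, and the mean word length of 4 tokens is an assumption picked from the 2-5 range quoted above, not a measured value.

```python
def padding_stats(batch_size, words_per_seq, max_word_len, mean_word_len):
    """Return (num_sequences, padded_tokens, real_tokens, utilization).

    Hypothetical helper: models the (B, L, T) -> (B*L, T) flattening,
    where every word is padded to max_word_len tokens.
    """
    num_sequences = batch_size * words_per_seq      # B * L
    padded_tokens = num_sequences * max_word_len    # B * L * T
    real_tokens = num_sequences * mean_word_len     # non-padding tokens
    return num_sequences, padded_tokens, real_tokens, real_tokens / padded_tokens


# B=128, L=512, T=32 as in the description; mean_word_len=4 is an assumption.
num_sequences, padded_tokens, real_tokens, utilization = padding_stats(
    128, 512, 32, mean_word_len=4
)
print(num_sequences)         # 65536
print(padded_tokens)         # 2097152 (about 2.1M)
print(f"{utilization:.1%}")  # 12.5%
```

Under that assumed mean length, only about one in eight decoder positions would carry a real token, which is the waste that packing targets.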
Changes
Core Implementation (welt/model.py)
- _pack_sequences_for_decoding(): Greedily packs sequences until reaching max_packed_length = T × 2, tracking indices for unpacking
- _unpack_logits(): Reconstructs the original (B, L, T, vocab_size) shape from packed decoder outputs
- parallel_causal_decode(): Routes through the packing pipeline transparently
Edge Cases
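The greedy packing described for _pack_sequences_for_decoding() can be sketched roughly as below. This is a minimal pure-Python illustration, not the code in welt/model.py: the function name `pack_greedy`, the `(seq_index, offset, length)` index layout, and the handling of the two edge cases shown (zero-length words are skipped, over-long words raise an error) are all assumptions.

```python
def pack_greedy(lengths, max_packed_length):
    """Greedily group sequences into packs of at most max_packed_length tokens.

    Returns a list of packs. Each pack is a list of
    (seq_index, offset, length) tuples, which is the bookkeeping
    needed to unpack decoder outputs later.
    """
    packs = []
    current, used = [], 0
    for seq_index, length in enumerate(lengths):
        if length == 0:
            continue  # assumption: empty words contribute nothing to a pack
        if length > max_packed_length:
            # assumption: a word that cannot fit in any pack is an error
            raise ValueError(f"sequence {seq_index} exceeds max_packed_length")
        if used + length > max_packed_length:
            # current pack is full: close it and start a new one
            packs.append(current)
            current, used = [], 0
        current.append((seq_index, used, length))
        used += length
    if current:
        packs.append(current)
    return packs


T = 32
lengths = [30, 30, 5, 0, 32, 32, 3]  # unpadded word lengths, each <= T
packs = pack_greedy(lengths, max_packed_length=T * 2)
for pack in packs:
    print(pack)
```

Here six non-empty words end up in three packed rows instead of six padded ones. A real decoder would also have to keep words that share a row from attending to each other; how the PR handles that is not shown in this sketch.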
Performance
Typical English text (simulated):
Short words (maximum benefit):
Long words (minimal benefit):
Testing
examples/demo_packing_efficiency.py illustrates efficiency gains across word length distributions.
No changes are required to training code: packing is automatic and preserves all model behavior.
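One way to check the claim that packing preserves behavior is a round-trip test: pack the words, apply a per-position function in place of the decoder, unpack, and compare against the unpacked baseline. The sketch below is a hypothetical, pure-Python stand-in; `pack`, `unpack`, `PAD`, and the doubling "decoder" are illustrative assumptions and do not reflect the PR's actual tests.

```python
PAD = 0  # assumption: padding id used when restoring fixed-width rows


def pack(words, max_packed_length):
    """Concatenate words into rows of at most max_packed_length tokens."""
    rows, index = [[]], []
    for word in words:
        assert len(word) <= max_packed_length, "word does not fit in a row"
        if len(rows[-1]) + len(word) > max_packed_length:
            rows.append([])
        # remember where this word lives: (row, offset, length)
        index.append((len(rows) - 1, len(rows[-1]), len(word)))
        rows[-1].extend(word)
    return rows, index


def unpack(rows, index, max_word_len, pad=PAD):
    """Restore one padded row of max_word_len tokens per original word."""
    out = []
    for row_id, offset, length in index:
        tokens = rows[row_id][offset:offset + length]
        out.append(tokens + [pad] * (max_word_len - length))
    return out


def fake_decoder(tokens):
    """Stand-in for the decoder: an independent per-position transform."""
    return [2 * t for t in tokens]


T = 4
words = [[5, 6, 7], [8], [9, 10], [11, 12, 13, 14]]

# Baseline: decode each word separately, then pad to T.
baseline = [fake_decoder(w) + [PAD] * (T - len(w)) for w in words]

# Packed path: pack, decode whole rows, unpack back to one row per word.
rows, index = pack(words, max_packed_length=T * 2)
restored = unpack([fake_decoder(r) for r in rows], index, max_word_len=T)

print(len(rows), "packed rows for", len(words), "words")
print(restored == baseline)
```

Because the stand-in decoder treats every position independently, this only exercises the index bookkeeping; it says nothing about attention masking across packed words.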