Problem
The current data loader in prepare.py strictly packs documents to MAX_SEQ_LEN (2048). Training on the full context length from step 0 is often computationally wasteful, since attention cost grows quadratically with sequence length, and it slows down early convergence.
Proposal
Modify the data loader to support sequence-length warmup (a form of curriculum learning). For example, start training with a sequence length of 256 for the first 10% of the training budget, then progressively double it until reaching 2048. This requires updating the make_dataloader logic to resize the sequence packing on the fly.
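One possible shape for the schedule is sketched below. This is a hypothetical helper, not existing code in prepare.py; the interpretation (hold 256 for the first 10% of the budget, then double at each subsequent 10% milestone until 2048) and the parameter names are assumptions:

```python
def seq_len_at(progress: float, start: int = 256, max_len: int = 2048,
               stage_frac: float = 0.10) -> int:
    """Packing length for a given training progress in [0, 1].

    Hypothetical sketch: holds `start` for the first `stage_frac` of the
    training budget, then doubles at each subsequent `stage_frac`
    milestone until `max_len` is reached.
    """
    stage = int(progress / stage_frac)  # 0 during the first 10%, then 1, 2, ...
    return min(start << stage, max_len)
```

make_dataloader could call this once per stage boundary and re-pack the token stream at the new length. Note that if the global batch size is held constant in sequences rather than tokens, shorter stages use proportionally fewer tokens per step; keeping the token budget per step fixed (larger batches at shorter lengths) may be preferable.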