
[ENHANCEMENT] Add language modelling setup #38

Merged
fabian-sp merged 6 commits into main from f-add-gpt on Jul 23, 2025



@fabian-sp (Owner) commented Jun 27, 2025

Adds:

  • Llama-style model and Shakespeare dataset
  • Custom DataLoader that shifts a sequence of tokens by one position to produce inputs and targets (labels)
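A minimal sketch of the shifting idea, assuming the corpus has already been tokenized into a flat list of integer ids. The class name `ShiftedTokenDataset` and the `block_size` parameter are hypothetical and not taken from this PR, and plain Python lists stand in for tensors so the example runs without PyTorch. Each sample is a window of `block_size` tokens; its targets are the same window shifted one position further along the stream, so position k is trained to predict token k+1.

```python
from typing import List, Tuple


class ShiftedTokenDataset:
    """Hypothetical sketch (not the PR's code): map-style dataset that cuts a
    token stream into windows and returns (inputs, targets), where targets
    are the inputs shifted by one position, i.e. next-token labels."""

    def __init__(self, tokens: List[int], block_size: int) -> None:
        if len(tokens) <= block_size:
            raise ValueError("token stream must be longer than block_size")
        self.tokens = tokens
        self.block_size = block_size

    def __len__(self) -> int:
        # Number of start positions for which a full shifted target exists.
        return len(self.tokens) - self.block_size

    def __getitem__(self, i: int) -> Tuple[List[int], List[int]]:
        # Take block_size + 1 tokens, then split into inputs and targets.
        chunk = self.tokens[i : i + self.block_size + 1]
        inputs = chunk[:-1]   # t_i ... t_{i+B-1}
        targets = chunk[1:]   # t_{i+1} ... t_{i+B}
        return inputs, targets


ds = ShiftedTokenDataset(list(range(10)), block_size=4)
print(len(ds))  # 6
print(ds[0])    # ([0, 1, 2, 3], [1, 2, 3, 4])
```

Because the class implements `__len__` and `__getitem__`, the same shape of object can be handed to a PyTorch `DataLoader` once the lists are replaced by tensors.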

Comments:

  • Deactivated torch.compile for now
  • For the DataLoader, we add a collate function so that batches keep the previously used format (a dict of data, targets, ind)
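The collate step can be sketched as follows, assuming each dataset item is a tuple `(inputs, targets, idx)` where `idx` is the sample's position in the dataset. The function name `collate_to_dict` is hypothetical; only the dict keys `data`, `targets` and `ind` come from the description above. Plain lists are used so the sketch runs without PyTorch; in a PyTorch setup one would typically turn each list into a tensor and pass the function to `torch.utils.data.DataLoader` via its `collate_fn` argument.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed sample layout: (input token ids, target token ids, sample index).
Sample = Tuple[List[int], List[int], int]


def collate_to_dict(samples: Sequence[Sample]) -> Dict[str, list]:
    """Hypothetical collate function: group a list of samples into the
    dict batch format {"data": ..., "targets": ..., "ind": ...}."""
    data = [list(s[0]) for s in samples]
    targets = [list(s[1]) for s in samples]
    ind = [s[2] for s in samples]
    # Windows in one batch must share a length so they can later be stacked.
    if len({len(x) for x in data}) > 1:
        raise ValueError("all samples in a batch must have the same length")
    return {"data": data, "targets": targets, "ind": ind}


batch = collate_to_dict([([0, 1], [1, 2], 0), ([2, 3], [3, 4], 2)])
print(batch)  # {'data': [[0, 1], [2, 3]], 'targets': [[1, 2], [3, 4]], 'ind': [0, 2]}
```

Keeping the sample index in the batch preserves the `ind` field that the previous batch format carried, so downstream code that reads `batch["ind"]` does not need to change.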

@fabian-sp (Owner, Author) commented Jul 22, 2025

TODO:

  • Add test config and output for Shakespeare dataset
  • Add Shakespeare raw data files
  • Add SPP for linear models

@fabian-sp merged commit eebfd3a into main on Jul 23, 2025
1 check passed
@fabian-sp deleted the f-add-gpt branch on July 23, 2025 13:04


1 participant