@JudeWells (Collaborator)

This PR strips back the repo to the core functionality required for the preprint version of the model.

Only sequence-only text memmap datasets are supported for training.

Inference is supported through the following entry-point scripts:
- scripts/generate_sequences.py
- scripts/score_sequences.py

Three training configs are provided: one for reproducing the preprint version of the model, one for training on the example data, and a recommended config for training a model on the ProFam Atlas dataset.

@alex-hh (Owner) left a comment

Let's merge whenever you're happy and go from there (I copied the current master branch to a new dev branch for now).

@JudeWells JudeWells merged commit ca473e3 into master Dec 18, 2025
2 checks passed
@alex-hh alex-hh deleted the simplify branch December 18, 2025 20:30
alex-hh pushed a commit that referenced this pull request Dec 18, 2025

4 participants