feat(io): Enable input from FASTA files #16
Add the ability for RF3 to load input from FASTA files. The file format is inspired by Boltz's FASTA input format, but slightly more flexible. (You should be able to input a protein FASTA as-is and have it work, albeit without MSA.) It's written with an eye to being extensible to additional sequence file input formats, as desire dictates. The FASTA input is basically just syntactic sugar around the JSON input format, with a reduced feature set.
…oject build: add pyproject.toml to make pip-installable
I'm guessing this is also stale -- sorry for not reviewing earlier... I think a super helpful thing along these lines would be automatically running MSA generation. What do you think?
Assuming that this feature is something you'd be interested in including, I would certainly be willing to resolve conflicts & update for the current state of the code. I agree that automatically running MSA generation would be helpful. That's something I've looked into (using the colabfold mmseqs integration code), though I unfortunately haven't had time to figure out how (e.g. where) to best integrate it into the atomworks/foundry codebase.
I think it makes sense as an addition as is. @nscorley have you thought about this / what the best place to do MSA generation is? Naively, I would make a transform in the rf3/src/rf3/data directory and add it to the pipeline as an inference transform, but maybe Nate has other ideas.
I'll note there's now an |
Yeah that sounds reasonable to me, make a transform that iterates through the inference AtomArray and generates MSAs on the fly? maybe behind an inference flag eg |
To maximize the ease of use of RF3, it would be nice to be able to directly load in FASTA files (such that you can directly predict from a FASTA of your system). This PR puts together a potential system to add loading of systems from FASTA files, basically making it syntactic sugar around the JSON input format (though less powerful).
The file format is inspired by Boltz's FASTA input format, but slightly more flexible. Most fields are optional, and it should be robust to "extra" information in the label line. (You should be able to input most arbitrary polymeric FASTA files as-is and have them work, albeit without MSA ... which is also easy enough to add.)
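To make the "syntactic sugar around the JSON input format" idea concrete, here is a minimal sketch of a lenient FASTA reader, not the code in this PR. It assumes a Boltz-style label line of the form `>CHAIN_ID|ENTITY_TYPE|MSA_PATH`, with every field after the chain ID optional, and it treats any label it does not recognize as free-form text. The function names (`parse_fasta`, `fasta_to_spec`) and the output keys (`name`, `components`, `chain_id`, `type`, `seq`, `msa_path`) are hypothetical placeholders and may not match the actual RF3 JSON schema.

```python
from typing import Iterator

# Entity types this sketch recognizes in the second label field (assumption).
KNOWN_TYPES = {"protein", "dna", "rna"}


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (label, sequence) pairs, joining wrapped sequence lines."""
    label, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def fasta_to_spec(text: str, name: str) -> dict:
    """Convert FASTA text into a JSON-like dict (hypothetical schema)."""
    components = []
    for index, (label, seq) in enumerate(parse_fasta(text)):
        parts = [p.strip() for p in label.split("|")]
        if len(parts) >= 2 and parts[1].lower() in KNOWN_TYPES:
            # Structured label: CHAIN_ID|ENTITY_TYPE[|MSA_PATH]
            entry = {"chain_id": parts[0], "type": parts[1].lower(), "seq": seq}
            if len(parts) >= 3 and parts[2]:
                entry["msa_path"] = parts[2]
        else:
            # Free-form label (e.g. a UniProt header): default to protein and
            # assign a chain ID by position. This only covers 26 records and
            # does not check for clashes with explicitly given chain IDs.
            entry = {"chain_id": chr(ord("A") + index), "type": "protein", "seq": seq}
        components.append(entry)
    return {"name": name, "components": components}


example = """>A|protein|msas/a.a3m
MKT
AYI
>sp|P12345|X_HUMAN some protein
GGS
"""
spec = fasta_to_spec(example, "demo")
```

The fallback branch is what lets an unmodified protein FASTA run as-is: the label is ignored, so the chain is predicted without an MSA.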
While limited to FASTA input currently, it's written with an eye to be flexible for additional sequence file input formats, as desire dictates.
This is intended as a "draft" PR, for comment & feedback.