@roccomoretti
Member

To make RF3 easier to use, it would be nice to be able to load FASTA files directly, so that you can predict straight from a FASTA of your system. This PR puts together a potential system for loading inputs from FASTA files, basically making it syntactic sugar around the JSON input format (though less powerful).

The file format is inspired by Boltz's FASTA input format, but slightly more flexible. Most fields are optional, and it should be robust to "extra" information in the label line. (You should be able to input most arbitrary polymeric FASTA files as-is and have them work, albeit without MSA ... which is also easy enough to add.)
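As a rough illustration of what "syntactic sugar around the JSON input" could mean, here is a minimal sketch that turns FASTA text into JSON-ready records. It assumes a Boltz-style `>CHAIN_ID|ENTITY_TYPE|MSA_PATH` header with every field optional; the function name `parse_fasta`, the output keys (`chain_id`, `type`, `msa_path`, `seq`), and the fallback rules are hypothetical and are not taken from this PR or from RF3's actual JSON schema.

```python
import json
import string

# Entity types recognized in the second header field (an assumption for this sketch).
KNOWN_TYPES = {"protein", "dna", "rna"}


def parse_fasta(text):
    """Parse FASTA text into a list of JSON-ready dicts (hypothetical schema)."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = [f.strip() for f in line[1:].split("|")]
            structured = len(fields) >= 2 and fields[1].lower() in KNOWN_TYPES
            if structured:
                # Header follows the CHAIN_ID|ENTITY_TYPE|MSA_PATH convention.
                chain_id = fields[0]
                entity_type = fields[1].lower()
                msa_path = fields[2] if len(fields) > 2 and fields[2] else None
            else:
                # Arbitrary label line (e.g. a database header): ignore its
                # contents, assume a protein, and assign a chain ID by position.
                # Note: this simple fallback could collide with explicit IDs.
                chain_id = string.ascii_uppercase[len(records) % 26]
                entity_type = "protein"
                msa_path = None
            records.append(
                {"chain_id": chain_id, "type": entity_type,
                 "msa_path": msa_path, "seq": ""}
            )
        elif records:
            # Sequence lines may be wrapped; append to the current record.
            records[-1]["seq"] += line
    return records


example = """>A|protein|msas/a.a3m
MKT
AYI
>sp|P12345|X_HUMAN some description
GGS
"""
print(json.dumps(parse_fasta(example), indent=2))
```

The second record shows the "extra information" case: a header that does not match the convention is still accepted, just without an MSA path.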

While currently limited to FASTA input, it's written with an eye toward supporting additional sequence file input formats, as desire dictates.

This is intended as a "draft" PR, for comment & feedback.

Add the ability for RF3 to load in from FASTA files.
The file format is inspired by Boltz's FASTA input format, but slightly more flexible.
(You should be able to input a protein FASTA as-is and have it work, albeit without MSA.)

It's written with an eye toward supporting additional sequence file input formats, as desire dictates.

The FASTA input is basically just syntactic sugar around the JSON input format, with a reduced feature set.
LorenzoTarricone pushed a commit to LorenzoTarricone/foundry that referenced this pull request Dec 29, 2025
…oject

build: add pyproject.toml to make pip-installable
@r-krishna
Collaborator

I'm guessing this is also stale -- sorry for not reviewing earlier... I think a super helpful thing along these lines would be automatically running MSA generation. What do you think?

@roccomoretti
Member Author

Assuming that this feature is something you'd be interested in including, I would certainly be willing to resolve conflicts & update for the current state of the code.

I agree that automatically running MSA generation would be helpful. That's something I've looked into (using the colabfold mmseqs integration code), though I unfortunately haven't had time to figure out how (e.g. where) to best integrate it into the atomworks/foundry codebase.

@r-krishna
Collaborator

I think it makes sense as an addition as is. @nscorley have you thought about this / what the best place to do MSA generation is? Naively, I would make a transform in the rf3/src/rf3/data directory and add it to the pipeline as an inference transform, but maybe Nate has other ideas.

@roccomoretti
Member Author

I'll note there's now an atomworks/src/atomworks/ml/preprocessing/msa/generating.py file, which would be the place to have the actual MSA generation, with a transform perhaps being the way to hook it into RF3 specifically.

@r-krishna
Collaborator

Yeah, that sounds reasonable to me: make a transform that iterates through the inference AtomArray and generates MSAs on the fly? Maybe behind an inference flag, e.g. rf3 fold ... --generate_msas?
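A rough sketch of that idea, in plain Python rather than the real atomworks/RF3 transform API: `Chain` and `GenerateMSAs` are hypothetical stand-ins, the `generate` callable is an assumed interface (sequence in, MSA file path out) that the real generation code would have to provide, and `enabled` stands for whatever the proposed flag would set.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Chain:
    """Hypothetical stand-in for one polymer chain of the inference input."""
    chain_id: str
    sequence: str
    msa_path: Optional[str] = None


@dataclass
class GenerateMSAs:
    """Hypothetical inference transform that fills in missing MSAs on the fly."""
    generate: Callable[[str], str]  # assumed interface: sequence -> MSA file path
    enabled: bool = False           # would be driven by a flag such as --generate_msas
    cache: dict = field(default_factory=dict)

    def __call__(self, chains):
        if not self.enabled:
            # Flag not set: leave the input untouched.
            return chains
        for chain in chains:
            if chain.msa_path is not None:
                # Respect MSAs the user already supplied.
                continue
            if chain.sequence not in self.cache:
                # Generate once per unique sequence (e.g. for homo-oligomers).
                self.cache[chain.sequence] = self.generate(chain.sequence)
            chain.msa_path = self.cache[chain.sequence]
        return chains
```

Caching by sequence means identical chains trigger a single generation call, which matters when each call is a remote or expensive search.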
