feat(io): Enable input from FASTA files #16

roccomoretti · 2025-10-01T18:11:06Z

To make RF3 easier to use, it would be nice to load FASTA files directly, so that you can predict a structure straight from a FASTA of your system. This PR proposes a way to load systems from FASTA files, essentially as syntactic sugar around the JSON input format (though less powerful).

The file format is inspired by Boltz's FASTA input format, but slightly more flexible. Most fields are optional, and it should be robust to "extra" information in the label line. (You should be able to use most polymeric FASTA files as-is and have them work, albeit without an MSA, which is also easy enough to add.)
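To illustrate how a Boltz-inspired header can stay optional and tolerate extra label-line content, here is a minimal sketch. It assumes the header layout `>CHAIN_ID|ENTITY_TYPE|MSA_PATH` and only trusts that layout when the second field is a recognized entity type; otherwise it falls back to an auto-assigned chain ID and `protein`. The function names and the output keys (`name`, `components`, `chain_id`, `type`, `seq`, `msa_path`) are hypothetical and do not reflect RF3's actual JSON schema or this PR's code.

```python
import string

# Assumed (hypothetical) header layout, every field optional:
#   >CHAIN_ID|ENTITY_TYPE|MSA_PATH
KNOWN_TYPES = {"protein", "dna", "rna", "smiles", "ccd"}


def read_fasta(text):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and legacy ';' comment lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header):
    """Return (chain_id, entity_type, msa_path), with None where not given."""
    fields = [f.strip() for f in header.split("|")]
    # Only treat the header as structured if field 2 is a known entity type.
    if len(fields) >= 2 and fields[1].lower() in KNOWN_TYPES:
        chain_id = fields[0] or None
        msa_path = fields[2] if len(fields) > 2 and fields[2] else None
        return chain_id, fields[1].lower(), msa_path
    return None, None, None  # arbitrary label line: ignore its contents


def fasta_to_input(text, name="fasta_input"):
    """Build a JSON-like input dict (hypothetical schema) from FASTA text."""
    components, used = [], set()
    spare_ids = iter(string.ascii_uppercase)  # sketch only: at most 26 auto IDs
    for header, seq in read_fasta(text):
        chain_id, entity_type, msa_path = parse_header(header)
        if chain_id is None or chain_id in used:
            # Missing or duplicate chain ID: take the next unused letter.
            chain_id = next(c for c in spare_ids if c not in used)
        used.add(chain_id)
        # Unlabeled records default to protein; a real loader might sniff the alphabet.
        component = {"chain_id": chain_id, "type": entity_type or "protein", "seq": seq}
        if msa_path:
            component["msa_path"] = msa_path
        components.append(component)
    return {"name": name, "components": components}
```

Keying on the second field is what lets a UniProt-style header such as `>sp|P69905|HBA_HUMAN ...` pass through untouched: `P69905` is not an entity type, so the record is treated as an unlabeled protein chain.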

While currently limited to FASTA input, it's written so that additional sequence file formats can be added as needed.
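One way to keep the loader open to other sequence formats is a registry keyed by file suffix, where every loader reduces its file to the same JSON-like dict that the JSON input path already consumes. This is a minimal sketch under that assumption; `register_loader`, `load_input`, the suffix list, and the dict shape are hypothetical, and the FASTA loader here is deliberately stripped down.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: lowercase file suffix -> loader(text) -> input dict.
Loader = Callable[[str], dict]
_LOADERS: Dict[str, Loader] = {}


def register_loader(*suffixes: str):
    """Decorator that registers a loader for one or more file suffixes."""
    def decorator(fn: Loader) -> Loader:
        for suffix in suffixes:
            _LOADERS[suffix.lower()] = fn
        return fn
    return decorator


@register_loader(".fasta", ".fa")
def load_fasta(text: str) -> dict:
    # Deliberately minimal: one component per record, headers ignored.
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            current = []
            records.append(current)
        elif line and current is not None:
            current.append(line)
    return {"components": [{"seq": "".join(r)} for r in records]}


def load_input(path: str) -> dict:
    """Dispatch on suffix; JSON stays the native format, others are sugar."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return json.loads(Path(path).read_text())
    try:
        loader = _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported input format: {suffix!r}") from None
    return loader(Path(path).read_text())
```

Adding a new format then means writing one function and decorating it, without touching the dispatch code.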

This is intended as a "draft" PR, for comment & feedback.

Add the ability for RF3 to load input from FASTA files. The file format is inspired by Boltz's FASTA input format, but slightly more flexible. (You should be able to use a protein FASTA as-is and have it work, albeit without an MSA.) It's written so that additional sequence file formats can be added as needed. The FASTA input is essentially syntactic sugar around the JSON input format, with a reduced feature set.


r-krishna · 2026-01-08T00:32:41Z

I'm guessing this is also stale -- sorry for not reviewing earlier... I think a super helpful thing along these lines would be automatically running MSA generation. What do you think?

roccomoretti · 2026-01-08T15:38:29Z

Assuming that this feature is something you'd be interested in including, I would certainly be willing to resolve conflicts & update for the current state of the code.

I agree that automatically running MSA generation would be helpful. That's something I've looked into (using the ColabFold MMseqs2 integration code), though I unfortunately haven't had time to figure out how (e.g. where) best to integrate it into the atomworks/foundry codebase.

r-krishna · 2026-01-09T07:47:10Z

I think it makes sense as an addition as is. @nscorley, have you thought about this, i.e. where the best place to do MSA generation is? Naively, I would make a transform in the rf3/src/rf3/data directory and add it to the pipeline as an inference transform, but maybe Nate has other ideas.

roccomoretti · 2026-01-12T23:07:02Z

I'll note there's now an atomworks/src/atomworks/ml/preprocessing/msa/generating.py file, which would be the place to have the actual MSA generation, with a transform perhaps being the way to hook it into RF3 specifically.

r-krishna · 2026-01-12T23:13:13Z

Yeah, that sounds reasonable to me: make a transform that iterates through the inference AtomArray and generates MSAs on the fly? Maybe behind an inference flag, e.g. rf3 fold ... --generate_msas?
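A minimal sketch of the transform discussed above, gated by a `--generate_msas` flag as suggested. To stay self-contained it walks a plain dict of components rather than an AtomArray, and it takes the generator as a plain "sequence in, a3m text out" callable; `GenerateMissingMsas`, `build_inference_transforms`, `parse_flags`, the dict keys, and the generator signature are all hypothetical and are not the API of atomworks' msa/generating.py.

```python
import argparse
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical generator signature: protein sequence -> a3m text.
MsaGenerator = Callable[[str], str]


class GenerateMissingMsas:
    """Fill in MSAs for protein chains that lack one, calling the (expensive)
    generator once per unique sequence. Mutates and returns the example."""

    def __init__(self, generator: MsaGenerator):
        self.generator = generator
        self._cache: Dict[str, str] = {}

    def __call__(self, example: dict) -> dict:
        for component in example["components"]:
            if component.get("type", "protein") != "protein" or "msa" in component:
                continue  # proteins only; never overwrite a user-supplied MSA
            seq = component["seq"]
            if seq not in self._cache:
                self._cache[seq] = self.generator(seq)
            component["msa"] = self._cache[seq]
        return example


def build_inference_transforms(generate_msas: bool, generator: MsaGenerator) -> List[Callable[[dict], dict]]:
    transforms: List[Callable[[dict], dict]] = []
    if generate_msas:
        transforms.append(GenerateMissingMsas(generator))
    # The remaining inference transforms would be appended here.
    return transforms


def parse_flags(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="fold-sketch")
    parser.add_argument("--generate_msas", action="store_true",
                        help="generate MSAs for protein chains that lack one")
    return parser.parse_args(argv)
```

Caching by sequence means a homo-oligomer triggers a single search, and leaving the flag off keeps the pipeline identical to today's behavior.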

roccomoretti added 5 commits October 1, 2025 11:45

docs: add documentation and example for FASTA file loading

21c6f96

tests: Add tests for FASTA loading

4f8599e

(fix): remove spuriously added print statement.

8fa2c2e

(fix): remove spurious null character from end of a3m file

2a708f5

LorenzoTarricone pushed a commit to LorenzoTarricone/foundry that referenced this pull request Dec 29, 2025

Merge pull request RosettaCommons#16 from baker-laboratory/build/pypr…

8f0d108

…oject build: add pyproject.toml to make pip-installable
