[New Model] PoET-2 for DMS Zero-Shot Benchmarks #88
Closed
timt51 wants to merge 1 commit into OATML-Markslab:main
Force-pushed from 23d3a2d to 430955f.
timt51 (Contributor, Author) commented:
Closing in favor of #89, which includes code for both DMS and clinical unsupervised benchmarks.
This PR adds a new baseline model, PoET-2, for the following benchmarks: DMS substitutions and DMS indels.
To download the model weights and MSAs used by the model to make predictions, run the following:
This will save the model weights in the directory `~/.cache/ProteinGym/baselines/PoET-2` and the MSAs in the directory `~/.cache/ProteinGym/baselines/PoET`. Note that the MSAs are the same as those used for PoET(-1).
To make predictions, run `scripts/scoring_DMS_zero_shot/scoring_PoET_2_substitutions.sh` for the substitutions benchmark or `scripts/scoring_DMS_zero_shot/scoring_PoET_2_indels.sh` for the indels benchmark.
Predictions will be saved to the directories `${DMS_output_score_folder_subs}PoET-2` and `${DMS_output_score_folder_indels}PoET-2` for the substitutions and indels benchmarks, respectively.
These scripts will download ~12GB of predicted structures from AlphaFoldDB and save them in the directory `${PROTEINGYM_CACHE}/baselines/PoET-2/DMS_AF2_structures_cache`. The download may take a while. If you would like to perform the download step first without performing the model inference step, you can run the scripts with the environment variable `SAMPLE_PROMPTS_ONLY=1`. When this environment variable is set, GPUs will not be utilized, but an NVIDIA GPU must still be present on the system for the imports to succeed.
Lastly, note that the scripts will try to utilize all GPUs that are available to them.
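The two-step workflow described above (download first, then inference) could be driven from Python roughly as follows. This is only a sketch: `build_env` and `run_two_phase` are hypothetical helpers that are not part of this PR, the working directory the scripts expect is not stated here, and restricting GPUs through `CUDA_VISIBLE_DEVICES` relies on the standard CUDA environment variable rather than on anything the scripts document.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence

# Script paths as given in the PR description (relative to the repository root).
SCRIPTS = {
    "substitutions": "scripts/scoring_DMS_zero_shot/scoring_PoET_2_substitutions.sh",
    "indels": "scripts/scoring_DMS_zero_shot/scoring_PoET_2_indels.sh",
}


def build_env(
    download_only: bool,
    gpu_ids: Optional[Sequence[int]] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the environment for one run of a PoET-2 scoring script."""
    env = dict(os.environ if base_env is None else base_env)
    if download_only:
        # Documented in the PR: run the download step without model inference.
        env["SAMPLE_PROMPTS_ONLY"] = "1"
    else:
        env.pop("SAMPLE_PROMPTS_ONLY", None)
    if gpu_ids is not None:
        # Assumption: standard CUDA variable, not mentioned in the PR.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_two_phase(
    benchmark: str,
    repo_root: str,
    gpu_ids: Optional[Sequence[int]] = None,
    cwd: Optional[str] = None,
) -> None:
    """Hypothetical helper: run the download step, then the inference step."""
    script = os.path.abspath(os.path.join(repo_root, SCRIPTS[benchmark]))
    for download_only in (True, False):
        # cwd is an assumption: adjust it to whatever directory the script expects.
        subprocess.run(
            ["bash", script],
            cwd=cwd,
            env=build_env(download_only, gpu_ids),
            check=True,
        )
```

For example, `run_two_phase("indels", repo_root=".", gpu_ids=[0])` would first run the indels script with `SAMPLE_PROMPTS_ONLY=1` and then rerun it for inference with only GPU 0 visible. Note that an NVIDIA GPU still has to be present for the first step, as described above.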
Using the provided scripts for computing model performance on the DMS benchmarks, I obtained an average Spearman correlation of `0.500` for the DMS substitutions benchmark and `0.573` for the DMS indels benchmark.
Precomputed PoET-2 predictions can be downloaded at the following links:
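As a note on the Spearman figures quoted above: the metric is the Pearson correlation of the rank vectors of predicted and measured scores, computed per assay and then averaged. The sketch below is a dependency-free illustration of that idea, with hypothetical function names. The numbers in this PR come from the repository's own performance scripts, which may aggregate assays differently (for example by grouping them before averaging), so this code should not be expected to reproduce them exactly.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def mean_spearman(
    assays: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Unweighted mean of per-assay Spearman correlations (an assumption)."""
    return mean(spearman(pred, measured) for pred, measured in assays.values())
```

Here `assays` maps an assay name to a pair of sequences: the model's predicted scores and the experimentally measured DMS scores for the same variants, in the same order.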