A model for protein sequence prediction given backbone coordinates
All Python dependencies are specified in environment.yml.
mamba env create -f environment.yaml

CUDA_VISIBLE_DEVICES=0 python sample.py \
--pdb_path <path_to_pdb> \
--ckpt_path ./weights/mpnn_dataset/pdb_weights.pt \
--num_predictions 8 \
--output_root_dir ./scratch/

- --pdb_path: Path to a single input PDB file
- --pdb_dir: Path to a directory containing multiple PDB files to process
- --json_path: Path to a JSON file containing a list of PDB file paths
- --ckpt_path: Path to the model checkpoint file (default: ./weights/afdb_dataset/afdb_weights.pt)
- --device: Device to run on, e.g., 'cuda:0' or 'cpu' (default: cuda:0)

- --output_root_dir: Root directory for output files (default: ./)
  - Creates subdirectories: backbones/ (PDB files) and seqs/ (FASTA files)
- --exclude_colon: Exclude the ":" separator between chains in output sequences

- --num_predictions: Number of predictions per input structure (default: 8)
- --temp: Temperature parameter for sampling (default: 0.1)
- --noise_std: Standard deviation of Gaussian noise to add to coordinates (default: 0.0)
- --half_half: Run half of the predictions with noise_std=0 and half with noise_std=0.2

- --chain_condition: Condition on a specific chain (e.g., 'A', 'B')
- --res_condition: Space-separated list of residue indices to condition on (e.g., --res_condition 1 2 3 10 15)
- --omit_AA: Space-separated list of amino acid one-letter codes to exclude from sampling (e.g., --omit_AA C M)

- --tied_weights: Use tied weights during prediction
- --cfg: Run classifier-free guidance
- --sample_purity: Enable purity sampling
- --partial_flows: Run partial flow matching
- --t: Forward diffusion time for partial flows (default: 0.5)

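
The --json_path option takes a JSON file listing PDB file paths. The helper below is a minimal sketch for building such a file from a directory. It assumes the expected layout is a flat JSON array of path strings, which this README does not spell out, and the function name write_pdb_list is hypothetical (it is not part of this repository).

```python
import json
from pathlib import Path


def write_pdb_list(pdb_dir, json_path):
    """Collect the *.pdb files in pdb_dir and write their paths as a JSON array."""
    # Assumption: --json_path expects a flat JSON array of path strings.
    paths = sorted(str(p) for p in Path(pdb_dir).glob("*.pdb"))
    with open(json_path, "w") as handle:
        json.dump(paths, handle, indent=2)
    return paths
```

For example, write_pdb_list("./examples", "./examples/files.json") would produce a file that could then be passed to --json_path, if the assumed layout matches what sample.py reads.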
Process a single PDB file:
python sample.py \
--pdb_path ./examples/6zht.pdb \
--ckpt_path ./weights/mpnn_dataset/pdb_weights.pt \
--num_predictions 8 \
--output_root_dir ./output/

Process a directory of PDB files:
python sample.py \
--pdb_dir ./examples/ \
--ckpt_path ./weights/mpnn_dataset/pdb_weights.pt \
--num_predictions 8 \
--output_root_dir ./output/

Keep chain A fixed and redesign all other residues using a higher sampling temperature (0.3), excluding cysteines, adding backbone noise with a standard deviation of 0.2, and generating 8 sequence samples:
python sample.py \
--pdb_path ./examples/6zht.pdb \
--chain_condition A \
--temp 0.3 \
--omit_AA C \
--noise_std 0.2 \
--num_predictions 8 \
--output_root_dir ./output/

Process multiple PDB files using a JSON list (ProteinMPNN style):
python sample.py \
--json_path ./examples/files.json \
--ckpt_path ./weights/mpnn_dataset/pdb_weights.pt \
--num_predictions 8 \
--output_root_dir ./output/

Generate an interactive HTML summary of your predictions:
python generate_summary.py output/

This creates output/summary.html with:
- Interactive structure viewer using py2Dmol
- Multiple sequence generation visualization
- PSSM heatmap and sequence logo plots
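
To inspect the sampled sequences outside the HTML summary, per-position amino acid frequencies can be computed directly from a FASTA file in seqs/. The sketch below is an illustration only, not the code used by generate_summary.py: it builds a plain frequency matrix (a common starting point for a PSSM or sequence logo), assumes all sequences for a structure have the same length, and strips the ":" chain separator. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(path):
    """Return the sequences in a FASTA file as a list of strings."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def position_frequencies(sequences):
    """Per-position amino acid frequencies for equal-length sequences."""
    # Drop the ":" chain separator so positions index residues only.
    cleaned = [seq.replace(":", "") for seq in sequences]
    if len({len(seq) for seq in cleaned}) > 1:
        raise ValueError("all sequences must have the same length")
    matrix = []
    for column in zip(*cleaned):
        counts = Counter(column)
        matrix.append({aa: counts[aa] / len(column) for aa in AMINO_ACIDS})
    return matrix
```

Calling position_frequencies(read_fasta(path)) on one of the FASTA files under seqs/ returns one dictionary per residue position, mapping each of the 20 standard amino acids to its observed frequency among the samples.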

This repository is a modified version of OpenFold and incorporates components from MultiFlow, ProteinMPNN, Protenix, and py2Dmol.

For any questions or concerns, feel free to submit an issue.

