-
Notifications
You must be signed in to change notification settings - Fork 80
Description
Hello,
I have tried to run ./ai2bmd ./chig.pdb to make sure your model gets setup properly and I manage to simulate the MD trajectory given the reference protein downloaded from
https://raw.githubusercontent.com/microsoft/AI2BMD/main/examples/chig.pdb
Now I want to run simulations on predicted monomer structures, for example I choose a rather short one (57 residues) from the AF2DB
https://alphafold.ebi.ac.uk/entry/H2QDC4
Following your readme and observations on how your example pdb files look like, I drafted the following script to prepare AF2 predictions for use in ai2bmd.
import os
from pymol import cmd
import sys
def prepare_pdb(input_pdb, output_prefix, n_residues, n_atoms, h_add=1):
# TODO: below are modifications applied to the PDB file according to the repo's README
# prepare protein with PyMol
cmd.load(input_pdb, "molecule")
if h_add:
output_prefix += "withH__"
cmd.h_add("molecule") # add hydrogens
cmd.save(f"{output_prefix}h_add.pdb")
else:
output_prefix += "withoutH__"
cmd.wizard("mutagenesis")
cmd.get_wizard().set_n_cap("acet") # add n-terminal cap
selection = "/%s//%s/%s" % ("molecule", "A", 1) # selection of N-term
cmd.get_wizard().do_select(selection)
cmd.get_wizard().apply()
cmd.save(f"{output_prefix}n_cap.pdb")
cmd.get_wizard().set_c_cap("nmet") # add c-terminal cap
selection = "/%s//%s/%s" % ("molecule", "A", n_residues) # selection of C-term
cmd.get_wizard().do_select(selection)
cmd.get_wizard().apply()
cmd.save(f"{output_prefix}pymol_prep.pdb")
# adjust atom names with Amber
os.system(f"pdb4amber -i {output_prefix}pymol_prep.pdb -o {output_prefix}amber_prep.pdb")
# ensure no TER separator
# (and residue numbering is from 1 without gaps since we start from an AF2 predicted structure..)
pdb_lines = []
with open(f"{output_prefix}amber_prep.pdb", 'r') as f:
for line in f:
if not line.startswith("TER"):
pdb_lines.append(line.strip())
else:
print(f"cleaning up TER line\n{line}")
with open(f"{output_prefix}noTER_prep.pdb", 'w') as f:
for line in pdb_lines:
f.write(f"{line}\n")
# TODO: below are additional modifications to the PDB file to be most similar to the example files..
# make sure all records are ATOM and not HETATM as in example files
pdb_lines = ["ATOM "+l[6:] if l.startswith("HETATM") else l for l in pdb_lines]
with open(f"{output_prefix}noHETATM_prep.pdb", 'w') as f:
for line in pdb_lines:
f.write(f"{line}\n")
# in the example files there is no chain identifier
pdb_lines = [l.replace(" A ", " ") if l.startswith("ATOM") else l for l in pdb_lines]
with open(f"{output_prefix}noCHAIN_prep.pdb", 'w') as f:
for line in pdb_lines:
f.write(f"{line}\n")
# in chig.pdb the header format is different (8 columns instead of 9 in our pdb file at this stage)
# replace with chig's header (or delete as in other example pdb files?)
new_header = "CRYST1 25.939 27.327 23.669 90.00 90.00 90.00 1"
print(f"\nreplacing header\n{pdb_lines[0]}\nwith header from reference CHIG file\n{new_header}")
pdb_lines[0] = new_header
with open(f"{output_prefix}chig_header.pdb", 'w') as f:
for line in pdb_lines:
f.write(f"{line}\n")
# in the example files there is "TER" between last "ATOM" and "END"
pdb_lines = (pdb_lines[:-1] +
[f"TER {n_atoms+1} NME {n_residues+2} "] +
[pdb_lines[-1]])
with open(f"{output_prefix}modified_TER.pdb", 'w') as f:
for line in pdb_lines:
f.write(f"{line}\n")
if __name__ == '__main__':
path_to_data = "/path_to_input_pdb_files"
# example structure for trying out https://alphafold.ebi.ac.uk/entry/H2QDC4
input_pdb = f"{path_to_data}/AF-H2QDC4-F1-model_v4.pdb"
output_prefix = f"{path_to_data}/AF-H2QDC4__"
n_residues = 57 # on purpose we take a short protein to try setting up AI2BMD
h_add = int(sys.argv[1])
if h_add:
n_atoms = 842
else:
n_atoms = 418
prepare_pdb(input_pdb, output_prefix, n_residues, n_atoms, h_add=h_add)
# python input_prep_cleaned.py 1 > input_prep_withH.txt
# ./ai2bmd --prot-file AF-H2QDC4__withH__modified_TER.pdb > ai2bmd_withH.txt
# FIXME: H atoms are causing errors except for last residue C-cap (<NME 59>)
# FATAL: Atom .R<ACE 1>.A<HH31 7> does not have a type.
# FATAL: Atom .R<ACE 1>.A<HH32 8> does not have a type.
# FATAL: Atom .R<ACE 1>.A<HH33 9> does not have a type.
# FATAL: Atom .R<MET 2>.A<2HB 18> does not have a type.
# FATAL: Atom .R<MET 2>.A<3HB 19> does not have a type.
# FATAL: Atom .R<MET 2>.A<2HG 20> does not have a type.
# FATAL: Atom .R<MET 2>.A<3HG 21> does not have a type.
# FATAL: Atom .R<MET 2>.A<1HE 22> does not have a type.
# FATAL: Atom .R<MET 2>.A<2HE 23> does not have a type.
# FATAL: Atom .R<MET 2>.A<3HE 24> does not have a type.
# ...
# FATAL: Atom .R<ILE 57>.A<H01 20> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H02 21> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H03 22> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H04 23> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H05 24> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H06 25> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H07 26> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H08 27> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H09 28> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H10 29> does not have a type.
# FATAL: Atom .R<ILE 57>.A<H13 30> does not have a type.
# FATAL: Atom .R<PRO 58>.A<2HB 15> does not have a type.
# FATAL: Atom .R<PRO 58>.A<3HB 16> does not have a type.
# FATAL: Atom .R<PRO 58>.A<2HG 17> does not have a type.
# FATAL: Atom .R<PRO 58>.A<3HG 18> does not have a type.
# FATAL: Atom .R<PRO 58>.A<2HD 19> does not have a type.
# FATAL: Atom .R<PRO 58>.A<3HD 20> does not have a type.
# /opt/conda/envs/ambertools/bin/teLeap: Fatal Error!
# Failed to generate parameters
# Exiting LEaP: Errors = 1; Warnings = 2; Notes = 0.
# python input_prep_cleaned.py 0 > input_prep_withoutH.txt
# ./ai2bmd --prot-file AF-H2QDC4__withoutH__modified_TER.pdb > ai2bmd_withoutH.txt
If I run the preparation with applying h_add then ./ai2bmd crashes almost instantly and no additional files are created along the input pdb file (AF-H2QDC4__withH__modified_TER.pdb). There seems to be an issue on how H are formatted in all residues besides the last C-cap residue. (see prepared pdb files in the next comment)
If I run preparation but skip h_add then ./ai2bmd is still running and has already generated additional files:
- AF-H2QDC4__withoutH__modified_TER.pdb
- AF-H2QDC4__withoutH__modified_TER1.inpcrd
- AF-H2QDC4__withoutH__modified_TER.inpcrd
- AF-H2QDC4__withoutH__modified_TER.top
- AF-H2QDC4__withoutH__modified_TER1.top
I am waiting to see whether the simulation will complete with input preparation without adding hydrogens.
Is it fine to run without? Or can anyone share insights on what I should fix in the preparation to successfully run simulation while adding hydrogen atoms?
Thanks!