Home
Welcome to the Machine_Learning_Benchmark wiki!
Objective:
- We aim to construct an SVR-based formalism to predict the non-bonded interaction energy in clusters and condensed phases.
- We develop this method by combining SVR with the many-body expansion (MBE) of interaction energies.
- We have tested the scheme of exact SVR predictions for water dimer and trimer energies in the case of rigid water molecules.
- We have employed this formalism to compute the interaction energies of decamers of rigid water molecules and checked the accuracy of the predictions against QM estimates. Refer to our paper: "Machine Learning Prediction of Interaction Energy in Rigid Water Clusters" in PCCP.
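For orientation, the MBE truncated at the three-body term can be written in its standard textbook form as below. The symbols (E for total energies, ΔE for the n-body terms, N for the number of molecules) are notation chosen here and may differ from the notation in the paper. For rigid molecules the one-body energies are constants, so they cancel out of the interaction energy.

```latex
\begin{align}
E_{\mathrm{int}} &= E(1,\dots,N) - \sum_{i=1}^{N} E(i)
  \approx \sum_{i<j} \Delta E^{(2)}(i,j) + \sum_{i<j<k} \Delta E^{(3)}(i,j,k), \\
\Delta E^{(2)}(i,j) &= E(i,j) - E(i) - E(j), \\
\Delta E^{(3)}(i,j,k) &= E(i,j,k) - E(i) - E(j) - E(k)
  - \Delta E^{(2)}(i,j) - \Delta E^{(2)}(i,k) - \Delta E^{(2)}(j,k).
\end{align}
```

In this picture, the two SVR models stand in for the two-body and three-body terms, and the sums run over all dimers and trimers of the cluster.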
Starting point:
The configuration space of the water decamer cluster is sampled from two separate classical NVT simulations, at 100 K and 300 K, run with the GROMACS software. In this work we examine rigid clusters only.
From the combined NVT simulation trajectory, 452 equally spaced snapshots of the water decamer are taken.
The resulting gro file is the starting point for the codes "Q-chem_inputs_script_dimer_number-ordered.py" and "Q-chem_inputs_script_trimer_with-dimer_number-ordered.py".
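As an illustration of how such a gro file could be read, here is a minimal sketch of a multi-frame reader based on the fixed-width GROMACS .gro layout, followed by taking every n-th frame. The names `read_gro_frames`, `fake_frame` and `stride` are hypothetical and are not taken from the repository scripts, and the coordinates in the demo frames are made up.

```python
GRO_ATOM = "{:5d}{:<5s}{:>5s}{:5d}{:8.3f}{:8.3f}{:8.3f}"  # fixed-width .gro atom line


def read_gro_frames(text):
    """Yield (title, atoms) per frame; atoms are (resid, resname, atomname, (x, y, z)) in nm."""
    lines = text.rstrip().splitlines()
    i = 0
    while i < len(lines):
        title = lines[i].strip()
        n_atoms = int(lines[i + 1])
        atoms = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            xyz = tuple(float(line[a : a + 8]) for a in (20, 28, 36))
            atoms.append((int(line[0:5]), line[5:10].strip(), line[10:15].strip(), xyz))
        yield title, atoms
        i += n_atoms + 3  # title line + atom-count line + atom lines + box line


def fake_frame(title, shift):
    """Build a one-water frame for demonstration only (made-up coordinates)."""
    names_x = [("OW", 0.000), ("HW1", 0.096), ("HW2", -0.024)]
    body = [
        GRO_ATOM.format(1, "SOL", name, k + 1, x + shift, 0.0, 0.0)
        for k, (name, x) in enumerate(names_x)
    ]
    return "\n".join([title, f"{len(body):5d}", *body, "   1.86206   1.86206   1.86206"])


sample = "\n".join(fake_frame(f"frame {t}", 0.5 * t) for t in range(4)) + "\n"
frames = list(read_gro_frames(sample))
stride = 2  # hypothetical; the stride that yields the 452 snapshots is not stated here
snapshots = frames[::stride]
print(len(frames), len(snapshots))  # 4 2
```

A real decamer frame would contain 30 atom lines (10 waters with 3 atoms each, assuming a three-site water model) instead of the 3 used in this demo.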
Input descriptors generation:
To use SVR and the MBE to predict the interaction energies of rigid clusters, one needs to train models for the two-body and three-body interaction terms. The input descriptor must be a suitable function of the atomic positions or of the distances between atoms. We generate all possible dimer and trimer configurations from the water decamer structures using the codes "Q-chem_inputs_script_dimer_number-ordered.py" and "Q-chem_inputs_script_trimer_with-dimer_number-ordered.py". Each dimer and trimer configuration is set up as a Q-Chem job that computes the BSSE-corrected interaction energy.
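The enumeration of all dimers and trimers of a decamer can be sketched with `itertools.combinations`. This only illustrates the counting and the number-ordered labelling; it is not the code in the scripts named above, and `dimers_in` is a hypothetical helper. The totals assume that every combination is kept for each of the 452 snapshots.

```python
from itertools import combinations

N_MOLECULES = 10   # water molecules per decamer
N_SNAPSHOTS = 452  # snapshots taken from the combined trajectory

molecules = range(1, N_MOLECULES + 1)
dimers = list(combinations(molecules, 2))   # (i, j) with i < j
trimers = list(combinations(molecules, 3))  # (i, j, k) with i < j < k


def dimers_in(trimer):
    """Return the three constituent dimers of a trimer, in number order."""
    return list(combinations(trimer, 2))


print(len(dimers), len(trimers))  # 45 120
print(N_SNAPSHOTS * len(dimers), N_SNAPSHOTS * len(trimers))  # 20340 54240
print(dimers_in((2, 5, 9)))  # [(2, 5), (2, 9), (5, 9)]
```

The constituent dimers matter because the three-body term of a trimer is obtained by subtracting its three two-body terms from the trimer interaction energy.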
The codes that generate the two-body and three-body data sets from the Q-Chem outputs are "Generate-MLinputs_csv_twobody_3sorted_9Inv_dist.py" and "Generate-MLinputs_csv_threebody_3sorted_27Inv_dist.py". There are three kinds of interatomic distances: O-O, O-H and H-H. The reciprocals of each kind of distance are sorted separately. For a dimer there are 9 reciprocal distances: 1 O-O, 4 O-H and 4 H-H. This input descriptor set is shown schematically in Fig. 1 of our paper. Similarly, for a trimer there are 27 reciprocal distances in total: 3 O-O, 12 O-H and 12 H-H. Each group of reciprocal distances is then sorted separately to form the descriptor. The outputs are the two-body and three-body energies.
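A minimal sketch of this descriptor, under two assumptions that are not stated above: only intermolecular atom pairs are used (which is what the counts 9 and 27 imply), and each group is sorted in descending order (the actual sort direction in the scripts may differ). The function name and the coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def sorted_inverse_distances(molecules):
    """molecules: list of (O, H1, H2) coordinate triples, one per water.

    Returns the O-O, O-H and H-H reciprocal distances, each group sorted
    separately (descending, by assumption) and concatenated in that order.
    """
    oo, oh, hh = [], [], []
    for a, b in combinations(molecules, 2):  # intermolecular pairs only
        o_a, *h_a = a
        o_b, *h_b = b
        oo.append(1.0 / dist(o_a, o_b))
        oh += [1.0 / dist(o_a, h) for h in h_b] + [1.0 / dist(o_b, h) for h in h_a]
        hh += [1.0 / dist(p, q) for p in h_a for q in h_b]
    return sorted(oo, reverse=True) + sorted(oh, reverse=True) + sorted(hh, reverse=True)


def shifted(water, dx, dy, dz):
    """Translate a water molecule; used only to build made-up test geometries."""
    return tuple((x + dx, y + dy, z + dz) for x, y, z in water)


w1 = ((0.0, 0.0, 0.0), (0.0957, 0.0, 0.0), (-0.0240, 0.0927, 0.0))  # illustrative, nm
w2 = shifted(w1, 0.28, 0.0, 0.0)
w3 = shifted(w1, 0.0, 0.0, 0.29)

two_body = sorted_inverse_distances([w1, w2])
three_body = sorted_inverse_distances([w1, w2, w3])
print(len(two_body), len(three_body))  # 9 27
```

Sorting within each group makes the descriptor independent of how the molecules, and the two hydrogens within a molecule, are numbered.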
Training and testing the data-sets:
The codes "EE" and "FF" carry out the cross-validation of the two-body and three-body data sets, and the SVR parameters are optimized using these two codes. The optimized parameters are later used to predict the interaction energies of unknown decamer structures. The codes that predict the two-body and three-body energies of unknown decamer structures are "GG" and "HH".
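A cross-validated SVR parameter search of this kind could look like the sketch below, which uses scikit-learn. Everything specific in it is an assumption for illustration: the RBF kernel, the feature scaling, the parameter grid, the 5-fold split and the synthetic data are not taken from the codes named above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 0.5, size=(200, 9))  # stand-in for 9 sorted reciprocal distances
y = X @ rng.normal(size=9) + 0.01 * rng.normal(size=200)  # synthetic "two-body energy"

pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))  # kernel choice is an assumption
grid = {
    "svr__C": [1, 10, 100],
    "svr__gamma": ["scale", 0.1],
    "svr__epsilon": [0.001, 0.01],
}
search = GridSearchCV(
    pipe,
    grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_)

# Predict on descriptors of unseen structures (here: new synthetic rows).
X_new = rng.uniform(0.1, 0.5, size=(5, 9))
y_new = search.predict(X_new)
```

The same pattern would apply to the three-body data set with 27 features per sample. The predicted two-body and three-body energies are then summed over all dimers and trimers of a decamer to give its interaction energy.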