Description
When running a metadynamics simulation with multiple walkers, each walker will create its own HILLS file. To combine them with sum_hills, I would call it as follows:
plumed sum_hills --hills PATHTOMYHILLSFILE1,PATHTOMYHILLSFILE2,PATHTOMYHILLSFILE3
However, sum_hills will then integrate the hills files one after the other (as per the documentation). This is fine when we are only interested in the final result, but when we want to look at convergence with the --stride or --nohistory options, the resulting "blocks" do not correspond to the time evolution of the simulation (unless the walkers are run strictly sequentially).
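For context, such a convergence check would typically be run along these lines (the file names and stride value are only illustrative), with the expectation that each successive output block covers the next chunk of simulation time:
plumed sum_hills --hills HILLS.0,HILLS.1,HILLS.2 --stride 1000 --outfile fes_
With concatenated walker files, block N instead corresponds to "the next N hills in file order", which mixes early and late simulation times.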
A workaround is to first parse the HILLS files, use the clock field that is already included in them, sort all entries by it, write the sorted data to a new file, and pass that file to sum_hills.
Sample code is provided below in case anyone else comes across this issue:
import io
from glob import glob

import pandas as pd

hills_df = list()
header = str()
for i_f, hills_file in enumerate(glob('./HILLS/HILLS.*')):
    # keep the header lines of the first file so the merged file is still a valid HILLS file,
    # and read the column names from the "#! FIELDS" line
    with open(hills_file, 'r') as f:
        for line in f:
            if i_f == 0 and line.startswith('#'):
                header += line
            if line.startswith("#! FIELDS"):
                line = line.strip()
                columns = line.split("#! FIELDS")[1].split()
    # read the hills of this walker; the first field (time) becomes the index
    df = pd.read_table(
        hills_file,
        names=columns,
        comment='#',
        sep=r'\s+',
        index_col=0
    )
    hills_df.append(df)

# sort all hills chronologically by the clock field; reset_index(drop=True) replaces
# the per-walker time values (read in as the index) with a sequential index, which
# to_csv then writes out as the first column
outfile = './hills_sorted.csv'
csv_buffer = io.StringIO()
sorted_hills_df = pd.concat(hills_df).sort_values(by='clock').reset_index(drop=True)
sorted_hills_df.to_csv(csv_buffer, sep='\t', header=False)
with open(outfile, 'w') as f:
    f.write(header)
    f.write(csv_buffer.getvalue())

However, it would be much preferable if sum_hills could do this step on its own, i.e. sort the hills by clock before integrating them, perhaps via an option such as --sort_clock.
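For completeness, the sorted file produced above can then be passed to sum_hills as a single, chronologically ordered HILLS file (again, the stride value is only illustrative):
plumed sum_hills --hills ./hills_sorted.csv --stride 1000 --outfile fes_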
Note that this has already been brought up here https://groups.google.com/g/plumed-users/c/Lw-Yc6H94eQ