Description
Over the last few days I have been experimenting with the AMD HIP port of OpenMM, using FAH test WUs from the 17102 test project on my Radeon VIIs. I compared the ns/day results from the FAH benchmark with the ns/day values for the same systems run locally on OpenMM master (7.5) with the HIP platform and on OpenMM 7.4.2, which corresponds to the branch used in FAH core22.
It is expected that results with the HIP platform differ from those with the OpenCL platform. However, I also see a performance difference of about 10% between the ns/day values reported by FAH and the ns/day values from a local run of the same system in OpenMM.
For example RUN10:
- FAH benchmark results 15ns/day
- openMM HIP 13.2ns/day
- openMM OpenCL 12.2ns/day
Or, for example, RUN13:
- FAH benchmark results 51ns/day
- openMM HIP 47.1ns/day
- openMM OpenCL 42.9ns/day
So the local OpenMM OpenCL results appear to be 10%-20% lower than those on FAH, which I don't understand. I would expect the opposite, since the FAH runs also include writing checkpoints.
It would be good to understand these differences and to get similar results from OpenCL runs in FAH and OpenCL runs in a local OpenMM. As long as significant differences remain, meaningful benchmarks are only possible after a new approach has been integrated into a new FAH core, which is a big effort. Being able to run benchmarks in advance, directly in OpenMM, would help to analyse the performance effects of different changes.
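To make the size of the gap explicit, here is a small sketch that computes how far each local result falls below the FAH value, using only the ns/day numbers quoted above. The dictionary layout and the helper name `percent_below_fah` are just for illustration.

```python
# ns/day values quoted above for the two example runs
results = {
    "RUN10": {"FAH": 15.0, "HIP": 13.2, "OpenCL": 12.2},
    "RUN13": {"FAH": 51.0, "HIP": 47.1, "OpenCL": 42.9},
}


def percent_below_fah(run, platform):
    """Return how many percent the local result is below the FAH result."""
    fah = results[run]["FAH"]
    return 100.0 * (fah - results[run][platform]) / fah


for run in results:
    for platform in ("HIP", "OpenCL"):
        gap = percent_below_fah(run, platform)
        print(f"{run} {platform}: {gap:.1f}% below FAH")
```

For these four numbers the gaps come out at roughly 8% to 12% for HIP and 16% to 19% for OpenCL.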
This is the script I used, derived from the script to generate the 17101 (and probably 17102) test WUs:
from simtk import openmm, unit
import time
import os
template = """
<config>
<numSteps v="{numSteps}"/>
<xtcFreq v="{xtcFreq}"/>
<checkpointFreq v="{checkpointFreq}"/>
<precision v="mixed"/>
<xtcAtoms v="solute"/>
</config>
"""
nsteps = 50000
wu_duration = 10*unit.minutes
ncheckpoints_per_wu = 4
from glob import glob
runs = glob('RUNS/17102*')
runs.sort()
platform = openmm.Platform.getPlatformByName('HIP')
print(platform.getOpenMMVersion())
platform.setPropertyDefaultValue('Precision', 'mixed')
def load(run, filename):
    with open(os.path.join(run, filename), 'rt') as infile:
        return openmm.XmlSerializer.deserialize(infile.read())
for run in runs:
    run = run + "/01/"
    print(run)
    # Read core.xml
    coredata = dict()
    coredata['checkpointFreq'] = 0 #int(nsteps_per_wu / ncheckpoints_per_wu)
    coredata['numSteps'] = 0 #ncheckpoints_per_wu * coredata['checkpointFreq']
    coredata['xtcFreq'] = 0 #coredata['numSteps']
    system = load(run, 'system.xml')
    state = load(run, 'state.xml')
    integrator = load(run, 'integrator.xml')
    context = openmm.Context(system, integrator, platform)
    context.setState(state)
    initial_time = time.time()
    integrator.step(nsteps)
    state = context.getState()
    elapsed_time = (time.time() - initial_time) * unit.seconds
    time_per_step = elapsed_time / nsteps
    ns_per_day = (nsteps * integrator.getStepSize()) / elapsed_time / (unit.nanoseconds/unit.day)
    nsteps_per_wu = int(wu_duration / time_per_step)
    print(f'{run} {system.getNumParticles()} particles : {ns_per_day:.1f} ns/day : {coredata}')
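For reference, the ns/day and steps-per-WU arithmetic at the end of the loop can be sketched in plain Python without the unit library. This is only an illustration of the formula: the function names are hypothetical, and the 4 fs step size and 100 s elapsed time in the usage lines are made-up values, not measurements (the real step size comes from `integrator.xml`).

```python
SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6


def ns_per_day(nsteps, step_size_fs, elapsed_s):
    """Simulated nanoseconds per wall-clock day for a timed run."""
    simulated_ns = nsteps * step_size_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / elapsed_s


def steps_per_wu(nsteps, elapsed_s, wu_duration_s):
    """Number of steps that fit into a WU of the given wall-clock duration."""
    time_per_step = elapsed_s / nsteps
    return int(wu_duration_s / time_per_step)


# Hypothetical example: 50000 steps of 4 fs taking 100 s of wall-clock time,
# and a target WU duration of 10 minutes as in the script above.
print(f"{ns_per_day(50000, 4.0, 100.0):.1f} ns/day")
print(steps_per_wu(50000, 100.0, 600.0), "steps per WU")
```

Note that a result computed this way depends directly on what the wall-clock interval covers, so any work that is still queued on the GPU when the timer stops would make the reported ns/day too high.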