Different performance from test WU run on FAH and on openMM

Over the last few days I have been experimenting with the AMD HIP port of OpenMM on FAH test WUs from the 17102 test project on my Radeon VIIs. I compared the ns/day results from the FAH benchmark with the ns/day values obtained by running the same systems locally on OpenMM master (7.5) with the HIP platform and on OpenMM 7.4.2, which corresponds to the branch used in FAH core22.

That the HIP and OpenCL platforms give different results is to be expected. However, I also see a performance difference of about 10% between the ns/day values reported by FAH and the ns/day values from a local run of the same system in OpenMM.

For example, RUN10:
- FAH benchmark result: 15 ns/day
- OpenMM HIP: 13.2 ns/day
- OpenMM OpenCL: 12.2 ns/day

Or, for example, RUN13:
- FAH benchmark result: 51 ns/day
- OpenMM HIP: 47.1 ns/day
- OpenMM OpenCL: 42.9 ns/day

So the OpenMM OpenCL results appear to be 10%-20% lower than those from FAH, which I don't understand. I would expect the opposite, since the FAH runs also include checkpoints.
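To make the size of the gap explicit, here is a small sketch that computes how far each local result falls below the FAH figure. The ns/day numbers are the ones listed above; the helper name `percent_below` is only illustrative.

```python
# ns/day figures copied from the two examples above
results = {
    "RUN10": {"FAH": 15.0, "HIP": 13.2, "OpenCL": 12.2},
    "RUN13": {"FAH": 51.0, "HIP": 47.1, "OpenCL": 42.9},
}


def percent_below(reference, value):
    """Return how far `value` falls below `reference`, in percent."""
    return 100.0 * (reference - value) / reference


for run, r in results.items():
    for platform in ("HIP", "OpenCL"):
        gap = percent_below(r["FAH"], r[platform])
        print(f"{run} {platform}: {gap:.1f}% below FAH")

# Prints:
# RUN10 HIP: 12.0% below FAH
# RUN10 OpenCL: 18.7% below FAH
# RUN13 HIP: 7.6% below FAH
# RUN13 OpenCL: 15.9% below FAH
```

For these two runs the local OpenCL numbers are about 16% to 19% below FAH, and the local HIP numbers about 8% to 12% below.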

It would be good to understand the differences and to get similar results from OpenCL runs in FAH and OpenCL runs in local OpenMM. As long as there are significant differences, meaningful benchmarks are not possible until a new approach has been integrated into a new FAH core, which is a big effort. Being able to run benchmarks in advance directly in OpenMM would help in analysing the performance effects of different changes.

This is the script I used, derived from the script to generate the 17101 (and probably 17102) test WUs:

```python
from simtk import openmm, unit
import time
import os

template = """
<config>
 <numSteps v="{numSteps}"/>  
 <xtcFreq v="{xtcFreq}"/>
 <checkpointFreq v="{checkpointFreq}"/>
 <precision v="mixed"/>
 <xtcAtoms v="solute"/> 
</config>

"""

nsteps = 50000
wu_duration = 10*unit.minutes
ncheckpoints_per_wu = 4

from glob import glob
runs = glob('RUNS/17102*')
runs.sort()

platform = openmm.Platform.getPlatformByName('HIP')
print(platform.getOpenMMVersion())
platform.setPropertyDefaultValue('Precision', 'mixed')

def load(run, filename):    
    with open(os.path.join(run, filename), 'rt') as infile:        
        return openmm.XmlSerializer.deserialize(infile.read())

for run in runs:
    run = run + "/01/"
    print(run)

    # core.xml values (unused in this benchmark, so set to 0;
    # the original formulas are kept in the trailing comments)
    coredata = dict()
    coredata['checkpointFreq'] = 0 #int(nsteps_per_wu / ncheckpoints_per_wu)
    coredata['numSteps'] = 0 #ncheckpoints_per_wu * coredata['checkpointFreq']
    coredata['xtcFreq'] = 0 #coredata['numSteps']

    system = load(run, 'system.xml')
    state = load(run, 'state.xml')
    integrator = load(run, 'integrator.xml')
    
    context = openmm.Context(system, integrator, platform)
    context.setState(state)
    
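    # Time nsteps of integration; getState() is called before the clock is read
    initial_time = time.time()
    integrator.step(nsteps)
    state = context.getState()
    elapsed_time = (time.time() - initial_time) * unit.seconds
    time_per_step = elapsed_time / nsteps
    # Simulated time per wall-clock time, expressed in ns/day
    ns_per_day = (nsteps * integrator.getStepSize()) / elapsed_time / (unit.nanoseconds/unit.day)
    # Number of steps that would fit into a WU lasting wu_duration
    nsteps_per_wu = int(wu_duration / time_per_step)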
    
    print(f'{run} {system.getNumParticles()} particles : {ns_per_day:.1f} ns/day : {coredata}')

```
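One thing that might be worth ruling out on the local side is one-time setup cost landing inside the timed region. The following is only a sketch of a timing helper with an untimed warm-up, not something I have verified to explain the gap. The names `benchmark`, `ns_per_day`, `sync` and `warmup_steps` are hypothetical; `step` would be `integrator.step`, and `sync` would be something like `lambda: context.getState()`, which the script above already calls before reading the clock.

```python
import time


def ns_per_day(nsteps, step_size_ps, elapsed_s):
    """Convert a timed run into simulated nanoseconds per wall-clock day."""
    simulated_ns = nsteps * step_size_ps / 1000.0  # ps -> ns
    return simulated_ns * 86400.0 / elapsed_s      # per second -> per day


def benchmark(step, sync, nsteps, step_size_ps, warmup_steps=100):
    """Time `nsteps` integration steps after an untimed warm-up.

    step -- callable taking a step count (assumed: integrator.step)
    sync -- callable that returns once queued work is done
            (assumed: lambda: context.getState())
    """
    step(warmup_steps)  # untimed, so any first-use setup cost is excluded
    sync()
    start = time.perf_counter()
    step(nsteps)
    sync()              # wait for the device before stopping the clock
    elapsed = time.perf_counter() - start
    return ns_per_day(nsteps, step_size_ps, elapsed)
```

In the loop above this could be called as `benchmark(integrator.step, lambda: context.getState(), nsteps, integrator.getStepSize().value_in_unit(unit.picoseconds))`. Keeping the helper free of OpenMM imports means the same function can time the HIP and OpenCL platforms without changes.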



