This repository contains the benchmarks to be run for NLR's ESIF-HPC-4 procurement.
The purpose of this draft release is to make our RFP benchmarking plans transparent to all vendors ahead of the RFP. We hope that this early release will give vendors additional time to work with our team on the benchmarks, especially because the suite includes a few "in-house" codes that may be unfamiliar to vendors.
This early draft release does not represent or guarantee any final form of the suite.
Important Notes:
- This is an in-progress draft release.
- Different benchmarks in the suite are at various states of "in-progress".
- Most benchmarks do not have finalized inputs or run requirements.
- Please see the Planned Changes section of this README for changes that are planned or in development but have not yet been integrated into this repo.
- Benchmarks are divided into "Class A" and "Class B".
  - "Class A" - Performance-required benchmarks: set of benchmarks for which specific performance targets must be met or exceeded.
  - "Class B" - Functionality benchmarks: set of benchmarks intended to demonstrate and baseline the functionality, scalability, and software readiness of specific workloads or system features, but no specific performance level will be required.
- The official version of the benchmark suite will be provided with the RFP.
- Until the official release, we may add or subtract benchmarks, change run requirements, etc.
"Class A" Applications:
| Application | Standard | Accelerated | Optimized | Baseline |
|---|---|---|---|---|
| VASP | Optional | Yes | Optional | Yes |
| WRF | Yes | No | Optional | Yes |
| MLPerf-DeepCAM | Optional | Yes | Optional | Yes |
| AMR-Wind | Optional | Yes | Optional | Yes |
| LAMMPS | Yes | Yes | Optional | Yes |
| BerkeleyGW | Optional | Yes | Optional | Yes |
Individual benchmark READMEs may include instructions and reference results for both CPU-only and accelerated hardware. However, for every application benchmark except LAMMPS, results are requested for only one hardware type (CPU-only or accelerated), as designated in the table above. Results for the non-requested hardware type may optionally be provided.
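
As an illustration of how the "Class A" table can be read, the following minimal Python sketch encodes its Standard and Accelerated columns and lists the hardware types marked "Yes" for each application. The names `CLASS_A` and `requested_hardware` are hypothetical and not part of the suite, and the sketch assumes that "Standard" corresponds to CPU-only hardware.

```python
# Hypothetical encoding of the "Class A" table (Standard and Accelerated columns only).
# Assumption: "Standard" refers to CPU-only hardware.
CLASS_A = {
    "VASP":           {"standard": "Optional", "accelerated": "Yes"},
    "WRF":            {"standard": "Yes",      "accelerated": "No"},
    "MLPerf-DeepCAM": {"standard": "Optional", "accelerated": "Yes"},
    "AMR-Wind":       {"standard": "Optional", "accelerated": "Yes"},
    "LAMMPS":         {"standard": "Yes",      "accelerated": "Yes"},
    "BerkeleyGW":     {"standard": "Optional", "accelerated": "Yes"},
}


def requested_hardware(app):
    """Return the hardware types marked "Yes" for the given application."""
    return [hw for hw, status in CLASS_A[app].items() if status == "Yes"]


if __name__ == "__main__":
    for app in CLASS_A:
        print(f"{app}: results requested on {', '.join(requested_hardware(app))}")
```

In this encoding, LAMMPS is the only application with both hardware types marked "Yes", matching the exception described above.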
"Class B" Applications - functionality only
| Application | Standard | Accelerated | Optimized | Baseline |
|---|---|---|---|---|
| Sienna | Yes | No | No | Yes |
Microbenchmarks:
| Application | Standard | Accelerated | Optimized | Baseline |
|---|---|---|---|---|
| OSU | Yes | Yes | Optional | Yes |
| HPL | Yes | Yes | Optional | Yes |
| Stream | Yes | Yes | Optional | Yes |
| IOR | Yes | No | Optional | Yes |
| mdtest | Yes | No | Optional | Yes |
| GPU-GPU collective | No | Yes | Optional | Yes |
| FIO* | Yes | No | Optional | Yes |
\* Benchmark still in early development; not yet in repo.
We have established the following draft definitions for baseline, ported, and optimized runs. These broad "run rules" will apply to all benchmarks, with any exceptions noted in the corresponding benchmark's README. Runs will be categorized according to the following three (draft) categories:
- Baseline (as-is): no code modifications are permitted. Library substitutions are permitted if these libraries will be available to us at the time of machine arrival. Changes to compilation options are generally permitted, though some edge cases exist. For example, Stream's compilation option to use custom functions in place of the ones in the Stream source would not be allowed.
- Ported: in addition to the allowed baseline changes, only the source code modifications necessary to port the code to the new architecture are permitted. This includes adding or modifying directives or pragmas, and/or replacing existing architecture-specific language constructs (e.g., CUDA <-> HIP) with another well-documented language or interface. Ported results should not be reported without baseline results, unless baseline is not possible. Changes must be minimal and reproducible.
- Optimized: in addition to what is allowed for baseline and ported, further source code changes are permitted on the condition that these changes are made available in a maintainable form by the time of machine arrival. For each benchmark, newer versions of the benchmark source code may be used if these versions are publicly available at the time of machine arrival. Using surrogate models is not permitted. Floating-point precision-related optimizations are not allowed unless specifically stated otherwise in the corresponding benchmark README.md.

A baseline result is required whenever possible. A ported result may be provided in place of a baseline result if the baseline result is not possible. Providing a ported result in addition to a baseline result is optional, and an optimized result is fully optional.
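
The reporting rules above can be sketched as a small check. This is only an illustrative Python sketch under stated assumptions, not an official validation tool: the function name `check_run_categories` and its arguments are hypothetical, and flagging a submission with neither a baseline nor a ported result is our reading of the draft rules rather than something they state explicitly.

```python
ALLOWED = {"baseline", "ported", "optimized"}


def check_run_categories(reported, baseline_possible=True):
    """Return a list of problems for the run categories reported for one benchmark.

    `reported` is a set of category names; `baseline_possible` says whether a
    baseline (as-is) run can be produced on the target system.
    Hypothetical helper for illustration only.
    """
    reported = set(reported)
    problems = []

    unknown = reported - ALLOWED
    if unknown:
        problems.append(f"unknown categories: {sorted(unknown)}")

    if baseline_possible:
        # A baseline result is required whenever possible, so ported or
        # optimized results alone are not sufficient.
        if "baseline" not in reported:
            problems.append("baseline result is required when baseline is possible")
    else:
        # Assumption: when baseline is not possible, a ported result stands in for it.
        if "ported" not in reported:
            problems.append("no ported result provided in place of baseline")

    return problems


if __name__ == "__main__":
    print(check_run_categories({"baseline", "optimized"}))
    print(check_run_categories({"ported"}))
    print(check_run_categories({"ported"}, baseline_possible=False))
```

Ported and optimized results never cause a problem on their own here, reflecting that both are optional additions to a baseline result.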
We have planned/upcoming changes to the suite that have not yet been integrated but are currently in development. We list any major not-yet-integrated changes here. Please note that this list is subject to change, and we make no guarantee that these changes are reflected in the finalized benchmark suite.
- The Sienna benchmark will be pared down into two functionality runs only.
- Clarified that most application benchmarks now request results for only one of CPU-only or accelerated nodes, rather than both, though both may optionally be provided.
- VASP: Bench 1 will now focus only on the HSE calculation (removing the GGA and GW components), with the supercell increased from 16 atoms to 128 atoms. Bench 2 will be a vasp_gam single-kpoint GGA calculation with 1149 atoms, increased from 519 atoms.
- Changed the AI application-level benchmark from MLPerf's 3DUnet to MLPerf's DeepCAM benchmark.
- Removed the AceCAST/GPU portion of WRF, along with any requests for simultaneous/concurrent runs on test hardware.
- Removed the 12 km input case from WRF.
- Overhauled the AMR-Wind benchmark, simplifying and clarifying build instructions, inputs, and run requirements, and removed any requests for simultaneous/concurrent runs on test hardware for AMR-Wind.
- Added an "extra large" input size to LAMMPS that should better utilize future hardware; removed any requirement to run the "small" and "large" LAMMPS input sizes.
- Removed HPGMG from the suite.
- Added a "planned changes" section to the README.
- Added draft definitions for baseline/ported/optimized runs to the README.
- Removed Q-Chem from the suite.
- Moved BerkeleyGW from "Class B" to "Class A".