Releases: amd/amd-fftw
AOCL-FFTW 5.2
No major changes
AOCL-FFTW 5.1
- Minor build issue fixes
AOCL-FFTW 5.0
Highlights of this release
- Support added for using the wisdom feature by default under the --enable-amd-app-opt option
- Minor bug fixes
AOCL-FFTW 4.2
Merge 4.2 release branch into amd-fftw
AOCL-FFTW 4.1
Highlights of this release
- Dynamic dispatch support added for AOCC build of the library on Linux
- Minor bug fixes
AOCL FFTW version 4.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- AVX-512 enablement of DFT kernels
- AVX-512 optimization of copy and transpose routines
AOCL FFTW version 3.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- Dynamic dispatcher for AOCL-FFTW
- Upgraded AOCL-FFTW to align with the reference FFTW 3.3.10 from MIT
- Windows FFTW features aligned with Linux FFTW
AMD Optimized FFTW version 3.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
- Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
- Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
- Support for building AMD FFTW library on Windows
- GCC compilation support for AMD processors based on the AMD “Zen3” core architecture
AMD Optimized FFTW version 3.0.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- A new planner feature called Top N planner is introduced that minimizes single-threaded run-to-run variations.
- New parallel MPI transpose algorithm, enabled via the configure option "--enable-amd-mpi-vader-limit"
  - When using this configure option, set --mca btl_vader_eager_limit appropriately (the currently preferred value is 65536) in the mpirun command.
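
As a rough sketch of how the two notes above fit together: this assumes an Open MPI installation (the `btl_vader_eager_limit` parameter belongs to Open MPI's vader shared-memory transport) and a hypothetical application binary named `./my_fft_app`. The `--enable-mpi` flag is the standard FFTW option for building the MPI library, and the process count is arbitrary.

```shell
# Build with the MPI transpose option named in the release note.
./configure --enable-mpi --enable-amd-mpi-vader-limit
make

# Run with the eager limit suggested above (65536).
# ./my_fft_app and -np 4 are placeholders, not part of the release.
mpirun --mca btl_vader_eager_limit 65536 -np 4 ./my_fft_app
```
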
AMD Optimized FFTW version 3.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- New fast planner that reduces planning time across the planning modes in general and in OPATIENT mode in particular. It can be enabled through the configure option “--enable-amd-fast-planner”
- Support for the configure option “AMD_ARCH” to help with cross-compilation. It accepts values such as auto, znver1, znver2, and znver3 for AMD EPYC processors
- Quad precision support is now included for the AOCC clang compiler from version 10 onwards
- Improved handling of the --enable-debug and “CC” options by ‘configure’ when --enable-amd-opt is used
- Fixed incorrect behavior of the OWISDOM feature when no wisdom file is present
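
The configure options listed above might be combined as in the following sketch. Passing `AMD_ARCH` as a variable on the configure command line and choosing `znver2` are assumptions made for illustration; consult the repository README for the exact form your version supports.

```shell
# Hypothetical build for a Zen 2 target with the fast planner enabled.
./configure AMD_ARCH=znver2 --enable-amd-opt --enable-amd-fast-planner
make
```
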