Releases: ROCm/rocWMMA
rocWMMA 2.2.0 for ROCm 7.2.0
Added
- Added sample perf_i8gemm to demonstrate int8_t as matrix input data type
- Added support for the gfx1150 target
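The perf_i8gemm sample itself is not reproduced here. As a rough host-side illustration of what int8_t matrix inputs involve, the sketch below computes a reference GEMM with int8_t inputs and a wider accumulator. The function name reference_i8gemm, the row-major layout, and the int32_t accumulator type are assumptions made for this illustration and are not taken from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference (not part of rocWMMA):
// C = A * B with int8_t inputs and int32_t accumulation.
// A is m x k, B is k x n, C is m x n; all row-major (an assumption).
std::vector<int32_t> reference_i8gemm(const std::vector<int8_t>& a,
                                      const std::vector<int8_t>& b,
                                      std::size_t m,
                                      std::size_t n,
                                      std::size_t k)
{
    assert(a.size() == m * k && b.size() == k * n);

    std::vector<int32_t> c(m * n, 0);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            // Widen before multiplying so products cannot overflow int8_t.
            int32_t acc = 0;
            for(std::size_t p = 0; p < k; ++p)
            {
                acc += static_cast<int32_t>(a[i * k + p])
                       * static_cast<int32_t>(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

A reference like this is typically useful for validating device results, since a single product of two int8_t values can already exceed the int8_t range.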
Changed
- Removed unnecessary const keyword to avoid compiler warnings
- rocWMMA has been moved into the new rocm-libraries "monorepo" repository (https://github.com/ROCm/rocm-libraries). This repository consolidates a number of separate ROCm libraries and shared components.
- The repository migration requires a few changes to the CMake configuration of rocWMMA
- The repository migration required the GTest dependency to be updated to v1.16.0
Resolved issues
- Skip invalid test configurations when using 'register file' LDS mapping
- Ensured transform functions in samples are only available on the device
rocWMMA 2.1.0 for ROCm 7.1.1
rocWMMA code for ROCm 7.1.1 did not change. The library was rebuilt for the updated ROCm 7.1.1 stack.
rocWMMA 2.0.0 for ROCm 7.0.2
rocWMMA code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.
rocWMMA 2.0.0 for ROCm 7.1.0
rocWMMA code for ROCm 7.1.0 did not change. The library was rebuilt for the updated ROCm 7.1.0 stack.
rocWMMA 2.0.0 for ROCm 7.0.1
rocWMMA code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocWMMA 2.0.0 for ROCm 7.0.0
Added
- Added internal register layout transforms to support interleaved MMA layouts
- Added support for the gfx950 target
- Added mixed input bf8/fp8 types for MMA support
- Added fragment scheduler API objects to embed thread block cooperation properties in fragments
Changed
- Augmented load / store / MMA internals with static loop unrolling
- rocWMMA mma_sync API now supports wave tile fragment sizes
- rocWMMA cooperative fragments are now expressed with fragment scheduler template arguments
- rocWMMA cooperative fragments now use the same base API as non-cooperative fragments
- rocWMMA cooperative fragments register usage footprint has been reduced
- rocWMMA fragments now support partial tile sizes with padding
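These notes do not describe how rocWMMA implements padding for partial tiles. As a minimal sketch of the arithmetic involved, the helpers below round a problem extent up to the next multiple of a tile size and report how many padding elements that implies. The names padded_extent and padding_elements are hypothetical and do not belong to the rocWMMA API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a rocWMMA function): smallest multiple of
// `tile` that is greater than or equal to `extent`.
constexpr std::size_t padded_extent(std::size_t extent, std::size_t tile)
{
    return ((extent + tile - 1) / tile) * tile;
}

// Hypothetical helper: number of padding elements needed along one
// dimension so that `extent` fills a whole number of tiles.
constexpr std::size_t padding_elements(std::size_t extent, std::size_t tile)
{
    return padded_extent(extent, tile) - extent;
}

// Compile-time checks: a 20-element extent with 16-element tiles
// occupies two tiles, with 12 padding elements in the second tile.
static_assert(padded_extent(20, 16) == 32, "20 rounds up to 32");
static_assert(padding_elements(20, 16) == 12, "12 padding elements");
static_assert(padding_elements(32, 16) == 0, "exact multiples need no padding");
```

Both helpers assume a non-zero tile size.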
Optimized
- Added internal flow control barriers to improve assembly code generation and overall performance
- Enabled interleaved layouts by default in MMA to improve overall performance
Removed
- Removed support for the gfx940 and gfx941 targets
- Removed the rocWMMA cooperative API
- Removed wave count template parameters from transforms APIs
Resolved issues
- Fixed a validation issue for small precision compute types < B32 on gfx9
- Fixed CMake validation of compiler support for bf8/fp8 types
- Fixed linkage of rocwmma::synchronize_workgroup to inline
rocWMMA 1.7.0 for ROCm 6.4.4
rocWMMA code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.
rocWMMA 1.7.0 for ROCm 6.4.3
rocWMMA code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocWMMA 1.7.0 for ROCm 6.4.2
rocWMMA code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.
rocWMMA 1.7.0 for ROCm 6.4.1
rocWMMA code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.