Releases: ROCm/rocWMMA

rocWMMA 2.2.0 for ROCm 7.2.0

21 Jan 18:58

Added

  • Added sample perf_i8gemm to demonstrate int8_t as matrix input data type
  • Added support for the gfx1150 target
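To illustrate what using int8_t as a matrix input type means numerically, here is a minimal host-side reference sketch. It is not the perf_i8gemm sample and does not use the rocWMMA API; the helper name gemm_i8_reference is hypothetical, and the sketch assumes 32-bit integer accumulation, the usual pairing for 8-bit integer inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference: D = A * B + C with int8_t inputs and int32_t accumulation.
// A is m x k, B is k x n, C and D are m x n, all stored row-major.
// gemm_i8_reference is a hypothetical helper name, not part of rocWMMA.
std::vector<std::int32_t> gemm_i8_reference(const std::vector<std::int8_t>&  a,
                                            const std::vector<std::int8_t>&  b,
                                            const std::vector<std::int32_t>& c,
                                            std::size_t m,
                                            std::size_t n,
                                            std::size_t k)
{
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);

    std::vector<std::int32_t> d(m * n);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            // Start from C and widen each product to 32 bits so that
            // int8 * int8 products are accumulated without 8-bit overflow.
            std::int32_t acc = c[i * n + j];
            for(std::size_t p = 0; p < k; ++p)
            {
                acc += static_cast<std::int32_t>(a[i * k + p])
                       * static_cast<std::int32_t>(b[p * n + j]);
            }
            d[i * n + j] = acc;
        }
    }
    return d;
}
```

A reference like this is typically used to validate device results: the widening to 32 bits is the important detail, since a single product such as 127 * 127 already exceeds the int8_t range.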

Changed

  • Removed unnecessary const keyword to avoid compiler warnings
  • rocWMMA has been moved into the new rocm-libraries "monorepo" repository (https://github.com/ROCm/rocm-libraries). This repository consolidates a number of separate ROCm libraries and shared components.
    • The repository migration requires a few changes to the CMake configuration of rocWMMA
    • The repository migration required the GTest dependency to be updated to v1.16.0

Resolved issues

  • Skipped invalid test configurations when using 'register file' LDS mapping
  • Ensured transform functions in samples are only available on the device

rocWMMA 2.1.0 for ROCm 7.1.1

26 Nov 06:42
1ab208f

rocWMMA code for ROCm 7.1.1 did not change. The library was rebuilt for the updated ROCm 7.1.1 stack.

rocWMMA 2.0.0 for ROCm 7.0.2

10 Oct 12:09
b5f06e6

rocWMMA code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.

rocWMMA 2.0.0 for ROCm 7.1.0

30 Oct 05:22
27a847f

rocWMMA code for ROCm 7.1.0 did not change. The library was rebuilt for the updated ROCm 7.1.0 stack.

rocWMMA 2.0.0 for ROCm 7.0.1

17 Sep 16:41
2445fb2

rocWMMA code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

rocWMMA 2.0.0 for ROCm 7.0.0

16 Sep 06:37
2445fb2

Added

  • Added internal register layout transforms to support interleaved MMA layouts
  • Added support for the gfx950 target
  • Added mixed input bf8 / fp8 types for MMA support
  • Added fragment scheduler API objects to embed thread block cooperation properties in fragments
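As background for the mixed bf8 / fp8 inputs, the sketch below decodes the two 8-bit float layouts on the host. It assumes the common naming where fp8 has 4 exponent and 3 mantissa bits (E4M3) and bf8 has 5 exponent and 2 mantissa bits (E5M2), and it assumes OCP-style exponent biases of 7 and 15. Some AMD targets use FNUZ variants with different biases and special values, and the release notes do not say which encoding applies, so treat the biases as assumptions. The function names are hypothetical, and NaN and infinity encodings are deliberately not handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a finite 8-bit float: 1 sign bit, exp_bits exponent bits, man_bits mantissa bits.
// Special values (NaN / infinity) are NOT handled; bias is an assumption supplied by the caller.
double decode_float8(std::uint8_t bits, int exp_bits, int man_bits, int bias)
{
    assert(exp_bits + man_bits == 7);

    const int    sign = bits >> 7;
    const int    exp  = (bits >> man_bits) & ((1 << exp_bits) - 1);
    const int    man  = bits & ((1 << man_bits) - 1);
    const double frac = static_cast<double>(man) / (1 << man_bits);

    // Exponent field 0 encodes subnormals (no implicit leading 1).
    const double mag = (exp == 0) ? std::ldexp(frac, 1 - bias)
                                  : std::ldexp(1.0 + frac, exp - bias);
    return sign ? -mag : mag;
}

// Hypothetical helpers: fp8 as E4M3 (assumed bias 7), bf8 as E5M2 (assumed bias 15).
double decode_fp8_e4m3(std::uint8_t bits)
{
    return decode_float8(bits, 4, 3, 7);
}

double decode_bf8_e5m2(std::uint8_t bits)
{
    return decode_float8(bits, 5, 2, 15);
}
```

Under these assumptions the same bit pattern means different things in each format, which is why a mixed-input multiply has to know the type of each operand: fp8 trades exponent range for one extra mantissa bit, bf8 the reverse.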

Changed

  • Augmented load / store / MMA internals with static loop unrolling
  • rocWMMA mma_sync API now supports wave tile fragment sizes
  • rocWMMA cooperative fragments are now expressed with fragment scheduler template arguments
  • rocWMMA cooperative fragments now use the same base API as non-cooperative fragments
  • The register usage footprint of rocWMMA cooperative fragments has been reduced
  • rocWMMA fragments now support partial tile sizes with padding
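The arithmetic behind partial tiles with padding can be sketched as follows. This only shows how a matrix extent that is not a multiple of the tile size rounds up to whole tiles; it does not describe how rocWMMA implements padding internally, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Round `extent` up to the next multiple of `tile`. Precondition: tile > 0.
constexpr std::size_t padded_extent(std::size_t extent, std::size_t tile)
{
    return (extent + tile - 1) / tile * tile;
}

// Number of padding elements needed along one dimension. Precondition: tile > 0.
constexpr std::size_t padding_elements(std::size_t extent, std::size_t tile)
{
    return padded_extent(extent, tile) - extent;
}

// Number of tiles needed to cover `extent`, counting the final partial tile.
inline std::size_t tile_count(std::size_t extent, std::size_t tile)
{
    assert(tile > 0);
    return padded_extent(extent, tile) / tile;
}

// Compile-time checks: a 20-element extent with 16-wide tiles pads to 32.
static_assert(padded_extent(20, 16) == 32, "20 rounds up to two 16-wide tiles");
static_assert(padding_elements(20, 16) == 12, "12 padding elements fill the second tile");
static_assert(padded_extent(32, 16) == 32, "exact multiples need no padding");
```

For example, a 20 x 20 problem tiled in 16 x 16 blocks needs 2 x 2 tiles, with the last tile in each dimension holding 4 valid elements and 12 padded ones.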

Optimized

  • Added internal flow control barriers to improve assembly code generation and overall performance
  • Enabled interleaved layouts by default in MMA to improve overall performance

Removed

  • Removed support for the gfx940 and gfx941 targets
  • Removed the rocWMMA cooperative API
  • Removed wave count template parameters from transforms APIs

Resolved issues

  • Fixed a validation issue for low-precision compute types (< B32) on gfx9
  • Fixed CMake validation of compiler support for bf8 / fp8 types
  • Fixed linkage of rocwmma::synchronize_workgroup to inline

rocWMMA 1.7.0 for ROCm 6.4.4

24 Sep 14:02
1a5b623

rocWMMA code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.

rocWMMA 1.7.0 for ROCm 6.4.3

07 Aug 14:20
1a5b623

rocWMMA code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

rocWMMA 1.7.0 for ROCm 6.4.2

21 Jul 16:54
1a5b623

rocWMMA code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.

rocWMMA 1.7.0 for ROCm 6.4.1

20 May 13:16
1a5b623

rocWMMA code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.