
aie2p MAT_MUL #150

Merged
ypapadop-amd merged 7 commits into hsa-backend from matmul-strix on Nov 17, 2025

@ypapadop-amd
Owner

This PR adds (unoptimized) aie2p support for MAT_MUL.

@ypapadop-amd ypapadop-amd self-assigned this Nov 17, 2025
Copilot AI review requested due to automatic review settings November 17, 2025 21:13
@ypapadop-amd ypapadop-amd changed the base branch from master to hsa-backend November 17, 2025 21:14
Copilot AI left a comment

Pull Request Overview

This PR adds unoptimized aie2p (NPU2) support for matrix multiplication (MAT_MUL) operations by:

  • Moving matrix multiplication functionality from mat_mul.py to gemm.py
  • Adding aie2p-specific kernel implementations with 512-bit vector widths
  • Updating build configuration to support both aie2 and aie2p architectures

Key Changes

  • Consolidated matrix multiplication logic into gemm.py with architecture-specific parameter handling for both aie2 and aie2p
  • Added new aie2p kernel files (mm.cc and zero.cc) with larger MMUL dimensions optimized for aie2p architecture
  • Updated CMake build system to install aie2p kernel files alongside aie2

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/ggml-hsa/kernels/mat_mul.py | Removed; functionality moved to gemm.py |
| src/ggml-hsa/kernels/gemm.py | Added matrix multiplication functions supporting both aie2 and aie2p with architecture-specific block sizes |
| src/ggml-hsa/kernels/build.py | Updated MUL_MAT kernel to use gemm.py instead of mat_mul.py |
| src/ggml-hsa/kernels/aie2p/zero.cc | New file implementing zero initialization for 512-bit vectors on aie2p |
| src/ggml-hsa/kernels/aie2p/mm.cc | New file implementing matrix multiplication kernels optimized for aie2p with 2x2 MMUL expansion |
| src/ggml-hsa/kernels/aie2/zero.cc | Updated copyright headers and formatting to match project standards |
| src/ggml-hsa/kernels/aie2/mm.cc | Updated copyright headers, formatting, and added rounding mode configuration for bfloat16 |
| src/ggml-hsa/kernels/aie2/__init__.py | Removed Python package marker file |
| src/ggml-hsa/kernels/CMakeLists.txt | Updated to install gemm.py and aie2p kernel files; removed mat_mul.py and aie2/__init__.py |

Comments suppressed due to low confidence (4)

src/ggml-hsa/kernels/gemm.py:766

  • Inconsistent function naming between aie2 and aie2p. The function my_matmul is called on line 766 but it's not defined anywhere in the visible code. This should be imported or defined. Based on the context, it appears this should be calling a matmul function similar to how aie2 imports mat_mul.my_matmul.

src/ggml-hsa/kernels/gemm.py:657

  • The docstring return value description is inaccurate. The function returns a tuple of 7 values (m, n, k, use_scalar, num_cols, zero_fn, matmul_fn), but the docstring only lists 6 items and doesn't mention num_cols. The mm_fn and zero_fn names in the docstring should also be matmul_fn and zero_fn to match the actual return values.

src/ggml-hsa/kernels/aie2/__init__.py:1

  • The removal of aie2/__init__.py may cause import issues if this module is imported as a package elsewhere in the codebase. Even if the file only contains a copyright notice, it's needed to mark the directory as a Python package. Consider keeping an __init__.py file, even if it's empty or minimal.

src/ggml-hsa/kernels/gemm.py:671

  • Inconsistent block sizes between architectures. For aie2p, the block sizes are set to m=16, n=16, k=16 (lines 669-671), while aie2 uses m=8, n=8, k=8 (lines 664-666). However, the aie2p kernel shapes defined in mm.cc use different micro-kernel dimensions (e.g., 4x4x8 for i16, 8x8x8 for i8, etc.), which may not be compatible with 16x16x16 blocks. Please verify that the block sizes align with the actual MMUL kernel implementations to avoid runtime errors.
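The block-size concern in the gemm.py:671 comment could be checked mechanically. The sketch below is only an illustration and is not code from this PR: the names Shape and block_is_compatible are hypothetical, and it assumes that "compatible" means every block dimension is a whole multiple of the matching micro-kernel dimension, which may not be the only constraint the real kernels impose.

```python
from typing import NamedTuple


class Shape(NamedTuple):
    """An (m, n, k) triple; hypothetical helper type, not part of gemm.py."""
    m: int
    n: int
    k: int


def block_is_compatible(block: Shape, micro: Shape) -> bool:
    """Return True if each block dimension is a positive multiple of the
    corresponding micro-kernel dimension (assumed compatibility rule)."""
    return all(b > 0 and b % u == 0 for b, u in zip(block, micro))


# Block size quoted in the review for aie2p.
aie2p_block = Shape(16, 16, 16)

# Micro-kernel shapes quoted in the review comment.
for name, micro in {"i16": Shape(4, 4, 8), "i8": Shape(8, 8, 8)}.items():
    print(name, block_is_compatible(aie2p_block, micro))
```

Under this divisibility assumption both quoted shapes tile a 16x16x16 block evenly, so any real incompatibility would have to come from other constraints, such as the 2x2 MMUL expansion or buffer layout, which this sketch does not model.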

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/ggml-hsa/kernels/gemm.py:656

  • The return value documentation in the docstring mentions "mm_fn" and "zero_fn" in the list but the actual return statement returns "zero_fn" and "matmul_fn". The documentation should be updated to match the actual return values: zero_fn and matmul_fn.

src/ggml-hsa/kernels/gemm.py:766

  • The function my_matmul is called but not imported or defined in this file. The deleted mat_mul.py file previously imported this from aie2.mat_mul, but that import path no longer exists. This function needs to be either defined in this file or imported from the correct location.
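For the docstring mismatch noted at gemm.py:656, one possible way to keep documentation and return values from drifting apart is to return a named tuple instead of a bare tuple. This is a minimal sketch, not the PR's code: the class name MatmulParams and the example values are hypothetical, the field names are the seven listed in the earlier review comment, and the field types are assumptions.

```python
from typing import Any, NamedTuple


class MatmulParams(NamedTuple):
    """Hypothetical container for the seven values the review says are returned."""
    m: int             # block size along M
    n: int             # block size along N
    k: int             # block size along K
    use_scalar: bool   # assumed to select a scalar kernel variant
    num_cols: int      # the value the docstring reportedly omitted
    zero_fn: Any       # type unknown from the review; left as Any
    matmul_fn: Any     # type unknown from the review; left as Any


# Placeholder values for illustration only.
params = MatmulParams(m=16, n=16, k=16, use_scalar=False, num_cols=1,
                      zero_fn="zero_placeholder", matmul_fn="matmul_placeholder")

# Callers can still unpack positionally, so existing call sites keep working.
m, n, k, use_scalar, num_cols, zero_fn, matmul_fn = params
print(params._fields)
```

Because the field names live in the type itself, a renamed or added return value shows up in one place rather than in both the docstring and the return statement.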

@ypapadop-amd ypapadop-amd merged commit c4bfc13 into hsa-backend Nov 17, 2025
6 checks passed
@ypapadop-amd ypapadop-amd deleted the matmul-strix branch November 17, 2025 21:35
ypapadop-amd added a commit that referenced this pull request Dec 9, 2025
ypapadop-amd added a commit that referenced this pull request Dec 12, 2025
ypapadop-amd added a commit that referenced this pull request Dec 15, 2025
ypapadop-amd added a commit that referenced this pull request Jan 7, 2026
ypapadop-amd added a commit that referenced this pull request Jan 26, 2026
ypapadop-amd added a commit that referenced this pull request Feb 10, 2026
ypapadop-amd added a commit that referenced this pull request Feb 18, 2026