Merged
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds unoptimized aie2p (NPU2) support for matrix multiplication (MAT_MUL) operations by:
- Moving matrix multiplication functionality from
mat_mul.pytogemm.py - Adding aie2p-specific kernel implementations with 512-bit vector widths
- Updating build configuration to support both aie2 and aie2p architectures
Key Changes
- Consolidated matrix multiplication logic into
gemm.pywith architecture-specific parameter handling for both aie2 and aie2p - Added new aie2p kernel files (
mm.ccandzero.cc) with larger MMUL dimensions optimized for aie2p architecture - Updated CMake build system to install aie2p kernel files alongside aie2
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ggml-hsa/kernels/mat_mul.py | Removed - functionality moved to gemm.py |
| src/ggml-hsa/kernels/gemm.py | Added matrix multiplication functions supporting both aie2 and aie2p with architecture-specific block sizes |
| src/ggml-hsa/kernels/build.py | Updated MUL_MAT kernel to use gemm.py instead of mat_mul.py |
| src/ggml-hsa/kernels/aie2p/zero.cc | New file implementing zero initialization for 512-bit vectors on aie2p |
| src/ggml-hsa/kernels/aie2p/mm.cc | New file implementing matrix multiplication kernels optimized for aie2p with 2x2 MMUL expansion |
| src/ggml-hsa/kernels/aie2/zero.cc | Updated copyright headers and formatting to match project standards |
| src/ggml-hsa/kernels/aie2/mm.cc | Updated copyright headers, formatting, and added rounding mode configuration for bfloat16 |
| src/ggml-hsa/kernels/aie2/init.py | Removed Python package marker file |
| src/ggml-hsa/kernels/CMakeLists.txt | Updated to install gemm.py and aie2p kernel files; removed mat_mul.py and aie2/init.py |
Comments suppressed due to low confidence (4)
src/ggml-hsa/kernels/gemm.py:766
- Inconsistent function naming between aie2 and aie2p. The function
my_matmulis called on line 766 but it's not defined anywhere in the visible code. This should be imported or defined. Based on the context, it appears this should be calling a matmul function similar to how aie2 importsmat_mul.my_matmul.
src/ggml-hsa/kernels/gemm.py:657 - The docstring return value description is inaccurate. The function returns a tuple of 7 values (m, n, k, use_scalar, num_cols, zero_fn, matmul_fn), but the docstring only lists 6 items and doesn't mention
num_cols. Themm_fnandzero_fnnames in the docstring should also bematmul_fnandzero_fnto match the actual return values.
src/ggml-hsa/kernels/aie2/init.py:1 - The removal of
aie2/__init__.pymay cause import issues if this module is imported as a package elsewhere in the codebase. Even if the file only contains a copyright notice, it's needed to mark the directory as a Python package. Consider keeping an__init__.pyfile, even if it's empty or minimal.
src/ggml-hsa/kernels/gemm.py:671 - Inconsistent block sizes between architectures. For aie2p, the block sizes are set to
m=16, n=16, k=16(lines 669-671), while aie2 usesm=8, n=8, k=8(lines 664-666). However, the aie2p kernel shapes defined in mm.cc use different micro-kernel dimensions (e.g., 4x4x8 for i16, 8x8x8 for i8, etc.), which may not be compatible with 16x16x16 blocks. Please verify that the block sizes align with the actual MMUL kernel implementations to avoid runtime errors.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/ggml-hsa/kernels/gemm.py:656
- The return value documentation in the docstring mentions "mm_fn" and "zero_fn" in the list but the actual return statement returns "zero_fn" and "matmul_fn". The documentation should be updated to match the actual return values:
zero_fnandmatmul_fn.
src/ggml-hsa/kernels/gemm.py:766 - The function
my_matmulis called but not imported or defined in this file. The deletedmat_mul.pyfile previously imported this fromaie2.mat_mul, but that import path no longer exists. This function needs to be either defined in this file or imported from the correct location.
ypapadop-amd
added a commit
that referenced
this pull request
Dec 9, 2025
ypapadop-amd
added a commit
that referenced
this pull request
Dec 12, 2025
ypapadop-amd
added a commit
that referenced
this pull request
Dec 15, 2025
ypapadop-amd
added a commit
that referenced
this pull request
Jan 7, 2026
ypapadop-amd
added a commit
that referenced
this pull request
Jan 26, 2026
ypapadop-amd
added a commit
that referenced
this pull request
Feb 10, 2026
ypapadop-amd
added a commit
that referenced
this pull request
Feb 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds (unoptimized) aie2p support for MAT_MUL.