Enable auto-vectorization of add/mul reduction loops on NEON hardware #63

Draft
raneashay wants to merge 1 commit into microsoft:main from raneashay:ashay/improve-auto-vectorization


@raneashay raneashay commented Mar 23, 2026

On NEON, this patch enables auto-vectorization of sum and product
reduction loops, which in turn allows several BLAS functions to be
vectorized. Specifically, it adds strict-order NEON reduction
instructions for the `{Add|Mul}ReductionV{F|D}` operations. Prior to
this change, `match_rule_supported_auto_vectorization()` blocked these
operations, preventing vectorization of the reduction loops that are
common in dot products, matrix-vector multiplications, and
matrix-matrix multiplications. The patch also adds `UseSVE` guards to
the existing SVE reduction predicates so that they are not matched on
NEON-only hardware.
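
For illustration, the loop shapes described above might look like the following. This is a minimal sketch, assuming the loops in question are Java loops compiled by HotSpot's C2 compiler; the class and method names (`ReductionLoops`, `dot`, `product`, `dotStrictOrderModel`) are hypothetical and do not come from the patch. The last method only models what "strict-order" means: the element-wise multiplies can be done a vector at a time, but the lanes are folded into the accumulator in index order, so the result should match the scalar loop bit for bit. The lane count of 4 assumes 128-bit NEON registers holding `float` values.

```java
final class ReductionLoops {
    // Sum reduction over floats: the core of a dot product.
    static float dot(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Product reduction over doubles.
    static double product(double[] x) {
        double p = 1.0;
        for (int i = 0; i < x.length; i++) {
            p *= x[i];
        }
        return p;
    }

    // Scalar model of a strict-order vector reduction (illustrative only).
    // Products are computed LANES elements at a time, then added to the
    // accumulator one lane after another, in the original element order.
    static float dotStrictOrderModel(float[] a, float[] b) {
        final int LANES = 4; // assumption: 128-bit vector of 32-bit floats
        float[] prod = new float[LANES];
        float sum = 0.0f;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                prod[l] = a[i + l] * b[i + l]; // element-wise multiply
            }
            for (int l = 0; l < LANES; l++) {
                sum += prod[l];                // in-order lane reduction
            }
        }
        for (; i < a.length; i++) {            // scalar tail
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        System.out.println(dot(a, b));
        System.out.println(dotStrictOrderModel(a, b));
        System.out.println(product(new double[] {2.0, 3.0, 4.0}));
    }
}
```

Because floating-point addition and multiplication are not associative, a reduction that reassociated the lanes (for example, a pairwise tree) could round differently from the scalar loop; keeping the original order avoids that.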

@raneashay raneashay force-pushed the ashay/improve-auto-vectorization branch from 5ca717d to 6c4c75d Compare March 23, 2026 23:23
@raneashay raneashay changed the title from "Enable auto-vectorization of BLAS kernels on NEON hardware" to "Enable auto-vectorization of add/mul reduction loops on NEON hardware" Mar 23, 2026
@raneashay raneashay force-pushed the ashay/improve-auto-vectorization branch from 6c4c75d to a3491b4 Compare March 24, 2026 14:35