Do you have any benchmarks showing where the extra overhead of mx.matmul over a regular matmul comes from? Is it in the quantization step (calculating scales, rounding, etc.)? If so, do you know whether devices with native MX support will perform this rounding in hardware, and whether the overhead would then become negligible thanks to that hardware support?
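For reference, here is a minimal sketch of the kind of microbenchmark that could separate the quantization cost from the matmul itself. It simulates MX-style quantization (a shared power-of-two scale per block of 32 elements, then rounding to an 8-bit integer grid) in NumPy; `quantize_blocks` is a hypothetical illustration written for this comment, not the library's actual kernel, so absolute numbers will differ from mx.matmul.

```python
import time
import numpy as np

def quantize_blocks(x, block=32, bits=8):
    """Simulated MX-style fake-quantization: each contiguous block of
    `block` elements shares one power-of-two scale, and elements are
    rounded to a signed `bits`-bit grid. Illustrative only."""
    flat = x.reshape(-1, block)
    amax = np.abs(flat).max(axis=1, keepdims=True)
    amax[amax == 0] = 1.0                      # avoid log2(0)
    scale = 2.0 ** np.ceil(np.log2(amax))      # shared power-of-two scale
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(flat / scale * qmax), -qmax, qmax)
    return (q * scale / qmax).reshape(x.shape).astype(x.dtype)

def bench(fn, *args, reps=5):
    """Return the best wall-clock time over `reps` runs."""
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512)).astype(np.float32)
b = rng.standard_normal((512, 512)).astype(np.float32)

t_plain = bench(np.matmul, a, b)
t_mx = bench(lambda x, y: np.matmul(quantize_blocks(x), quantize_blocks(y)), a, b)
print(f"plain matmul:    {t_plain * 1e3:.2f} ms")
print(f"quantize+matmul: {t_mx * 1e3:.2f} ms ({t_mx / t_plain:.1f}x)")
```

The gap between the two timings is an upper bound on the scale-computation/rounding cost in this software emulation; the question above is essentially whether hardware with native MX support absorbs that step into the matmul pipeline.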