Open
Description
The quantization support I've added through --low-prec-bytes-per-val is a bit barebones. It'd be nice to add enough flexibility to handle per-block quantization (e.g. some schemes only quantize the linear layers to int4) and some of the newer formats whose width isn't a whole number of bytes (e.g. int4, fp6).
Relevant: #36
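To make the request concrete, here is a rough sketch of what a per-block, bit-level precision spec could look like. None of these names (`storage_bytes`, `model_bytes`, the block-type keys) come from the existing code; they are hypothetical and only illustrate the idea of tracking bits per value per block type instead of a single bytes-per-value number. It also ignores the extra storage for scales and zero points that real quantization schemes need.

```python
# Hypothetical sketch: per-block precision expressed in bits, not bytes,
# so sub-byte formats such as int4 (4 bits) and fp6 (6 bits) fit naturally.

def storage_bytes(num_vals: int, bits_per_val: int) -> int:
    """Bytes needed to tightly pack num_vals values, rounded up to a whole byte."""
    return (num_vals * bits_per_val + 7) // 8


def model_bytes(param_counts: dict, bits_per_block: dict, default_bits: int = 16) -> int:
    """Sum packed storage over block types.

    param_counts:   block type -> number of values (assumed input format)
    bits_per_block: block type -> bits per value; block types not listed
                    fall back to default_bits (i.e. they stay unquantized)
    """
    return sum(
        storage_bytes(n, bits_per_block.get(block, default_bits))
        for block, n in param_counts.items()
    )


if __name__ == "__main__":
    # Only the linear layers are quantized to int4; everything else stays 16-bit.
    counts = {"linear": 1000, "norm": 10}
    print(model_bytes(counts, {"linear": 4}))
```

A bits-based spec like this would cover both cases raised above: a mapping keyed by block type handles per-block quantization, and rounding up after multiplying by the bit width handles formats that don't fill whole bytes.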