ggml-cpu: add q4_0 repack support for wasm #18858

aviallon wants to merge 1 commit into ggml-org:master
Conversation
@ngxson this may be of interest to you. Note: wllama's build must be modified slightly to use this, as emscripten overrides these settings:

set(GGML_CPU_REPACK ON CACHE BOOL "enable ggml CPU repack optimizations")
set(LLAMA_WASM_MEM64 OFF CACHE BOOL "disable MEMORY64 for wllama")
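As a sketch of what that override could look like in wllama's own CMake (the exact file layout and use of FORCE are assumptions, not taken from wllama):

```cmake
# Hypothetical fragment for wllama's top-level CMakeLists.txt:
# pin the cached options before llama.cpp is configured, since the
# emscripten toolchain/build scripts may override plain variables.
set(GGML_CPU_REPACK ON  CACHE BOOL "enable ggml CPU repack optimizations" FORCE)
set(LLAMA_WASM_MEM64 OFF CACHE BOOL "disable MEMORY64 for wllama" FORCE)
add_subdirectory(llama.cpp)
```

Setting the values as CACHE entries (with FORCE) ensures they win over any defaults the subproject or toolchain file tries to establish.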
If I understand correctly, this requires MEM64 to be disabled, right? CC @reeselevine for visibility
Force-pushed from b03291a to a73b9d3
Interesting, looks like the change in ngxson/wllama@492c423 to move the flags to CMakeLists.txt is what caused the 64-bit build to be enabled by default, since I don't think the simd implementations in this PR would require disabling 64-bit generally, right? It's just that right now, wllama doesn't support the 64-bit builds yet.
Or actually, I realize ngxson/wllama#200 doesn't include that flag, because the WebGPU integration PR hasn't been merged yet. Maybe the cache directive actually isn't needed?
I actually tried building with 64-bit support enabled, and got errors even when running with node.js directly. |
For the record, with that + the llama.cpp version bump and
Add LLM-written WASM simd128 implementations for ggml_quantize_mat_q8_0_4x4, ggml_quantize_mat_q8_0_4x8, ggml_gemv_q4_0_4x4_q8_0, ggml_gemm_q4_0_4x4_q8_0, ggml_gemv_q4_0_8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0. Tested from a custom Wllama build.