
ggml-cpu: add q4_0 repack support for wasm #18858

Draft
aviallon wants to merge 1 commit into ggml-org:master from aviallon:feat/wasm-repack

Conversation

@aviallon
Contributor

Add LLM-written WASM simd128 implementations for ggml_quantize_mat_q8_0_4x4, ggml_quantize_mat_q8_0_4x8, ggml_gemv_q4_0_4x4_q8_0, ggml_gemm_q4_0_4x4_q8_0, ggml_gemv_q4_0_8x8_q8_0, and ggml_gemm_q4_0_8x8_q8_0.
Tested from a custom Wllama build.

@aviallon
Contributor Author

@ngxson this may be of interest to you. Note: it is required to modify wllama's build slightly to use this, as emscripten overrides CMAKE_SYSTEM_PROCESSOR.
Needed options:

  • set(GGML_CPU_REPACK ON CACHE BOOL "enable ggml CPU repack optimizations")
  • set(LLAMA_WASM_MEM64 OFF CACHE BOOL "disable MEMORY64 for wllama")
  • emcmake cmake -DEMSCRIPTEN_SYSTEM_PROCESSOR=wasm …
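Taken together, the options above could also be passed directly at configure time instead of via CACHE directives in CMakeLists.txt. A minimal sketch (the build directory name is an assumption, and the trailing options of the emcmake call are elided in the original comment):

```shell
# Sketch: configure a wllama-style wasm build with q4_0 repack enabled.
# Only the three -D options come from the comment above; everything
# else (build dir, omitted trailing flags) is assumed.
emcmake cmake -B build \
  -DGGML_CPU_REPACK=ON \
  -DLLAMA_WASM_MEM64=OFF \
  -DEMSCRIPTEN_SYSTEM_PROCESSOR=wasm
```

Passing `-D` on the command line and using `set(... CACHE ...)` in CMakeLists.txt are equivalent here; the CACHE form is needed only because emscripten overrides CMAKE_SYSTEM_PROCESSOR and wllama's own CMakeLists sets defaults.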

@ngxson
Collaborator

ngxson commented Jan 15, 2026

If I understand correctly, this requires MEM64 to be disabled, right?

CC @reeselevine for visibility

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jan 15, 2026
@reeselevine
Collaborator

Interesting, looks like the change in ngxson/wllama@492c423 to move the flags to CMakeLists.txt is what caused the 64-bit build to be enabled by default, since LLAMA_WASM_MEM64 is on by default in llama.cpp right now. So the cache directive is needed to override it, unless it's specified on the command line.

I don't think the simd implementations in this PR would require disabling 64-bit generally, right? It's just that right now, wllama doesn't support the 64-bit builds yet.

@reeselevine
Collaborator

Or actually, I realize ngxson/wllama#200 doesn't include that flag, because the WebGPU integration PR hasn't been merged yet. Maybe the cache directive actually isn't needed?

@aviallon
Contributor Author

> Interesting, looks like the change in ngxson/wllama@492c423 to move the flags to CMakeLists.txt is what caused the 64-bit build to be enabled by default, since LLAMA_WASM_MEM64 is on by default in llama.cpp right now. So the cache directive is needed to override it, unless it's specified on the command line.
>
> I don't think the simd implementations in this PR would require disabling 64-bit generally, right? It's just that right now, wllama doesn't support the 64-bit builds yet.

I actually tried building with 64-bit support enabled, and got errors even when running with node.js directly.

@aviallon
Contributor Author

For the record, with that plus the llama.cpp version bump and -ffast-math -fno-finite-math-only, I get ~50% faster prompt processing (PP) compared to current wllama.
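The flags mentioned above could be supplied at configure time roughly like this (a sketch; whether wllama sets them in its CMakeLists or on the command line is an assumption):

```shell
# Sketch: append the fast-math flags from the comment above to a wasm build.
# -fno-finite-math-only re-enables defined Inf/NaN handling, which
# would otherwise be dropped by -ffast-math.
emcmake cmake -B build \
  -DCMAKE_C_FLAGS="-ffast-math -fno-finite-math-only" \
  -DCMAKE_CXX_FLAGS="-ffast-math -fno-finite-math-only"
```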

3 participants