
WebGPU Integration (continued) #201

Open
reeselevine wants to merge 21 commits into ngxson:master from reeselevine:webgpu-integration

Conversation


@reeselevine reeselevine commented Jan 15, 2026

See #198 for original discussion

Summary by CodeRabbit

  • New Features

    • WebGPU hardware acceleration with a prefer-WebGPU option and visible "WebGPU" runtime indicator
    • Real-time performance metrics display (prefill/decode tok/s) and a Reset metrics action
    • Public perf APIs to fetch/reset performance data from the client
  • Chores

    • Automated deploy workflow for examples to GitHub Pages
    • Updated dev tooling, TypeScript types, and Emscripten SDK versions; dependency bumps



coderabbitai bot commented Jan 15, 2026

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.
📝 Walkthrough

Adds WebGPU selectable backend support and perf-context APIs, bumps GLUE protocol to v2, updates WASM/build tooling and TypeScript types, exposes WebGPU and perf status in the frontend, and wires new perf actions through the native worker bridge.

Changes

Cohort / File(s) Summary
Build & Deployment
./.github/workflows/deploy-examples-main.yml, CMakeLists.txt, scripts/build_wasm.sh, scripts/docker-compose.yml
New GH Actions workflow for deploying examples; added CMake options GGML_WEBGPU, GGML_WEBGPU_JSPI, LLAMA_WASM_MEM64; bumped EMSDK image tag; consolidated shared EMCC flags.
C++ Protocol & Runtime
cpp/glue.hpp, cpp/actions.hpp, cpp/wllama.cpp, cpp/test_glue.cpp
GLUE_VERSION -> 2; load_req adds use_webgpu and no_perf; new perf messages (pctx_req/pctx_res, prst_req/prst_res); added device field to app_t; backend selection (WebGPU/CPU) and two new actions action_perf_context, action_perf_reset.
TypeScript Protocol
src/glue/messages.ts
GLUE_VERSION -> 2; load_req gains use_webgpu and no_perf; added perf message prototypes and corresponding TypeScript interfaces; expanded GlueMsg union.
Core TypeScript / API
src/wllama.ts
New config flags preferWebGPU? and noPerf?; runtime useWebGPU tracking and usingWebGPU() accessor; getPerfContext() and resetPerfContext() APIs; load payloads include WebGPU/no-perf settings; threading logic adjusted when WebGPU active.
Worker / WASM Bridge
src/workers-code/llama-cpp.js
callWrapper gains isAsync support; cwrap wrappers updated to support async calls for wllama_start and wllama_action; wrapper signatures and invocation flow adjusted.
Frontend UI & Context
examples/main/src/components/ChatScreen.tsx, examples/main/src/components/ModelScreen.tsx, examples/main/src/utils/types.ts, examples/main/src/utils/wllama.context.tsx
Added perf state and controls in ChatScreen (fetch/reset perf); ModelScreen shows WebGPU status; RuntimeInfo adds usingWebGPU; context initializes preferWebGPU: true and updates usingWebGPU.
TypeScript Build & Dependencies
tsconfig.build.json, examples/main/tsconfig.app.json, package.json, examples/main/package.json
Added @webgpu/types to tsconfig types; added @webgpu/types devDependency; bumped @huggingface/jinja in examples.
Frontend Build Files
examples/main/package.json, examples/main/tsconfig.app.json
Type definitions and dev deps updated to support WebGPU types.
Submodule
llama.cpp
Subproject commit pointer updated.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Browser Client
    participant Wllama as Wllama (TS)
    participant Worker as Worker/WASM
    participant Backend as GGML Backend

    Client->>Wllama: loadModel(preferWebGPU: true)
    Wllama->>Wllama: Check for WebGPU adapter
    alt adapter found
        Wllama->>Wllama: set useWebGPU = true
    else no adapter
        Wllama->>Wllama: set useWebGPU = false (warn)
    end
    Wllama->>Worker: wllama_start(loadReq with use_webgpu, no_perf, n_gpu_layers)
    Worker->>Backend: initialize (select WebGPU or CPU)
    Backend-->>Worker: initialized
    Worker-->>Wllama: ready
    Wllama-->>Client: model loaded (usingWebGPU status)

    Client->>Wllama: getPerfContext()
    Wllama->>Worker: wllama_action(pctx_req)
    Worker->>Backend: collect perf metrics
    Backend-->>Worker: perf data
    Worker-->>Wllama: pctx_res (metrics)
    Wllama-->>Client: PerfContextData

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped in code with WebGPU delight,

Tucked perf metrics snug and tight.
Protocol bumped, the wasm took flight,
Threads bowed out when GPU shone bright.
A rabbit cheers — the demo's light! 🎉

🚥 Pre-merge checks: ✅ 1 passed, ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 30.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title Check (❓ Inconclusive): The title 'WebGPU Integration (continued)' is only partially related to the changeset. While WebGPU integration is a significant part of the changes, the title is vague and overly broad: it doesn't specify what aspect of WebGPU integration is being implemented, and it obscures other important changes like performance monitoring, the GitHub Actions workflow, CMake configuration, and async handling improvements. Consider a more specific title, such as 'Add WebGPU backend selection, performance monitoring, and async support' or 'Implement WebGPU backend with perf context APIs'.

✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.



@reeselevine reeselevine mentioned this pull request Jan 15, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/glue.hpp (2)

449-452: Missing break statement causes unintended fallthrough.

The GLUE_DTYPE_RAW case is missing a break statement, causing execution to fall through to GLUE_DTYPE_ARRAY_BOOL. This will corrupt deserialization of raw fields.

🐛 Fix missing break statement
     case GLUE_DTYPE_RAW:
       ((glue_raw *)field)->parse(input);
+      break;
     case GLUE_DTYPE_ARRAY_BOOL:

464-466: Missing break statement at end of switch.

The GLUE_DTYPE_ARRAY_RAW case is missing a break statement. While this is the last case and won't cause functional issues currently, it's inconsistent with other cases and could cause problems if more cases are added later.

🐛 Fix missing break statement
     case GLUE_DTYPE_ARRAY_RAW:
       ((glue_arr_raw *)field)->parse(input);
+      break;
     }
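The same fallthrough hazard exists in JavaScript switch statements, so the bug is easy to reproduce outside the C++ glue code. A minimal sketch (hypothetical handler names, not the actual glue.hpp logic):

```javascript
// Without `break`, a matched case falls through and also runs the
// next case's handler, which is exactly the deserialization bug above.
function parseField(dtype) {
  const handlersRun = [];
  switch (dtype) {
    case 'RAW':
      handlersRun.push('parseRaw');
    // missing `break`: execution falls through into ARRAY_BOOL
    case 'ARRAY_BOOL':
      handlersRun.push('parseArrayBool');
      break;
  }
  return handlersRun;
}

console.log(parseField('RAW')); // ['parseRaw', 'parseArrayBool']: both handlers run
```

Adding `break` after the 'RAW' handler restores the intended one-handler-per-case behavior.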
🤖 Fix all issues with AI agents
In @.github/workflows/deploy-examples-main.yml:
- Around line 1-36: Replace the invalid pinned SHA for the GitHub Pages deploy
action in the deploy step (the uses: JamesIves/github-pages-deploy-action@...)
with a valid tag or release (e.g., JamesIves/github-pages-deploy-action@v4.6.1)
to ensure the action can be resolved; also add a top-level concurrency block
(e.g., concurrency: { group: 'deploy-pages', cancel-in-progress: true }) to the
workflow to prevent overlapping runs when the workflow is triggered multiple
times.

In `@CMakeLists.txt`:
- Around line 4-6: The GGML_WEBGPU_JSPI option currently defaults to ON which
breaks compatibility for many browsers; change the CMake option declaration for
GGML_WEBGPU_JSPI to default to OFF (i.e., set its default value to OFF) and
update the surrounding CMake comment or project documentation to explicitly
state JSPI is experimental and list the browser requirements/flags (e.g.,
Chrome/Edge experimental flags, Firefox 139 experimental support, Safari/WebKit
not supported) so consumers know how to enable it intentionally.
🧹 Nitpick comments (4)
scripts/docker-compose.yml (1)

22-22: Comprehensive EMCC flags for JSPI and WebGPU support.

The consolidated flags correctly enable JSPI (JavaScript Promise Integration), SIMD, modern exception handling, and expose necessary runtime methods including HEAPU8 for direct memory access. The JSPI_EXPORTS configuration (wllama_start, wllama_action) correctly targets the async-capable functions.

Note: JSPI is experimental/provisional in Emscripten 4.0.20. Runtime behavior depends on browser support—Chrome/V8 ships it behind origin trials, and some versions require explicit feature flags to enable. Ensure your deployment targets compatible browser versions.

The single-line flag definition is difficult to read and maintain. Consider breaking it into a multi-line format for clarity:

♻️ Suggested readability improvement
-        export SHARED_EMCC_FLAGS="--no-entry -O3 -msimd128 -DNDEBUG -flto=full -frtti -fwasm-exceptions -sEXPORT_ALL=1 -sEXPORT_ES6=0 -sMODULARIZE=0 -sINITIAL_MEMORY=128MB -sMAXIMUM_MEMORY=4096MB -sALLOW_MEMORY_GROWTH=1 -sFORCE_FILESYSTEM=1 -sEXPORTED_FUNCTIONS=_main,_wllama_malloc,_wllama_start,_wllama_action,_wllama_exit,_wllama_debug -sEXPORTED_RUNTIME_METHODS=ccall,cwrap,HEAPU8 -sNO_EXIT_RUNTIME=1 -Wno-unused-command-line-argument -sJSPI -sJSPI_EXPORTS=['wllama_start','wllama_action']"
+        export SHARED_EMCC_FLAGS="\
+          --no-entry -O3 -msimd128 -DNDEBUG -flto=full -frtti -fwasm-exceptions \
+          -sEXPORT_ALL=1 -sEXPORT_ES6=0 -sMODULARIZE=0 \
+          -sINITIAL_MEMORY=128MB -sMAXIMUM_MEMORY=4096MB -sALLOW_MEMORY_GROWTH=1 \
+          -sFORCE_FILESYSTEM=1 \
+          -sEXPORTED_FUNCTIONS=_main,_wllama_malloc,_wllama_start,_wllama_action,_wllama_exit,_wllama_debug \
+          -sEXPORTED_RUNTIME_METHODS=ccall,cwrap,HEAPU8 \
+          -sNO_EXIT_RUNTIME=1 -Wno-unused-command-line-argument \
+          -sJSPI -sJSPI_EXPORTS=['wllama_start','wllama_action']"
examples/main/src/components/ChatScreen.tsx (1)

51-61: Consider guarding refreshPerf with loadedModel check.

The resetPerf function guards against !loadedModel, but refreshPerf does not. If refreshPerf is called when no model is loaded (e.g., during edge-case race conditions), getWllamaInstance().getPerfContext() might throw an error that's not specific to the missing model state.

♻️ Suggested improvement
 const refreshPerf = async () => {
+   if (!loadedModel) return;
   setPerfBusy(true);
   setPerfError(null);
   try {
     setPerfData(await getWllamaInstance().getPerfContext());
   } catch (e) {
     setPerfError((e as any)?.message ?? 'Failed to fetch perf data');
   } finally {
     setPerfBusy(false);
   }
 };
src/wllama.ts (2)

1382-1393: Type inconsistency between return type and actual value.

getPerfContext() is declared to return PerfContextData, but it actually returns GlueMsgPerfContextRes which includes the _name: "pctx_res" field. While this works at runtime (extra fields are allowed), it's a type inconsistency.

Looking at the PerfContextData interface (lines 134-143), it matches GlueMsgPerfContextRes except for _name. Consider either:

  1. Stripping _name from the return value, or
  2. Extending PerfContextData from GlueMsgPerfContextRes
♻️ Option 1: Strip _name from return
 async getPerfContext(): Promise<PerfContextData> {
   this.checkModelLoaded();
-   return await this.proxy.wllamaAction<GlueMsgPerfContextRes>(
+   const result = await this.proxy.wllamaAction<GlueMsgPerfContextRes>(
     'perf_context',
     {
       _name: 'pctx_req',
     }
   );
+   const { _name, ...perfData } = result;
+   return perfData;
 }

587-595: Use async adapter check for robust WebGPU detection.

The current check navigator.gpu only verifies the WebGPU API exists, not whether a compatible adapter is available. According to WebGPU best practices, await navigator.gpu?.requestAdapter() should be called to confirm a usable adapter exists; requestAdapter() resolves to null when the browser cannot provide a suitable adapter.

Since loadModel is already async, this check can be awaited without architectural changes.

♻️ More robust WebGPU detection
-   if (this.config.preferWebGPU) {
-     if (navigator.gpu) {
-       this.useWebGPU = true;
-     } else {
-       this.logger().warn(
-         'WebGPU backend requested but WebGPU is not available, falling back to CPU'
-       );
-     }
-   }
+   if (this.config.preferWebGPU) {
+     try {
+       const adapter = await navigator.gpu?.requestAdapter();
+       if (adapter) {
+         this.useWebGPU = true;
+       } else {
+         this.logger().warn(
+           'WebGPU backend requested but no compatible adapter found, falling back to CPU'
+         );
+       }
+     } catch {
+       this.logger().warn(
+         'WebGPU backend requested but WebGPU is not available, falling back to CPU'
+       );
+     }
+   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8778d7b and a8a9546.

⛔ Files ignored due to path filters (5)
  • examples/main/package-lock.json is excluded by !**/package-lock.json
  • package-lock.json is excluded by !**/package-lock.json
  • src/multi-thread/wllama.wasm is excluded by !**/*.wasm
  • src/single-thread/wllama.wasm is excluded by !**/*.wasm
  • src/webgpu-single-thread/wllama.wasm is excluded by !**/*.wasm
📒 Files selected for processing (24)
  • .github/workflows/deploy-examples-main.yml
  • CMakeLists.txt
  • cpp/actions.hpp
  • cpp/glue.hpp
  • cpp/test_glue.cpp
  • cpp/wllama.cpp
  • examples/main/package.json
  • examples/main/src/components/ChatScreen.tsx
  • examples/main/src/components/ModelScreen.tsx
  • examples/main/src/utils/types.ts
  • examples/main/src/utils/wllama.context.tsx
  • examples/main/tsconfig.app.json
  • llama.cpp
  • package.json
  • scripts/build_wasm.sh
  • scripts/docker-compose.yml
  • src/glue/messages.ts
  • src/multi-thread/wllama.js
  • src/single-thread/wllama.js
  • src/webgpu-single-thread/wllama.js
  • src/wllama.ts
  • src/workers-code/generated.ts
  • src/workers-code/llama-cpp.js
  • tsconfig.build.json
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-18T08:24:14.434Z
Learnt from: ngxson
Repo: ngxson/wllama PR: 0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

Applied to files:

  • scripts/build_wasm.sh
  • src/wllama.ts
  • package.json
📚 Learning: 2025-12-29T03:44:02.649Z
Learnt from: reeselevine
Repo: ngxson/wllama PR: 198
File: src/workers-code/llama-cpp.js:269-279
Timestamp: 2025-12-29T03:44:02.649Z
Learning: In wllama's src/workers-code/llama-cpp.js, the isAsync parameter in callWrapper should only be true for functions included in JSPI_EXPORTS build configuration (wllama_start and wllama_action). Other functions like wllama_exit, wllama_debug, and wllama_malloc should use isAsync=false even though the callWrapper returns an async function, because they don't perform async operations at the C/WASM level.

Applied to files:

  • cpp/wllama.cpp
  • src/wllama.ts
📚 Learning: 2025-12-29T03:44:00.085Z
Learnt from: reeselevine
Repo: ngxson/wllama PR: 198
File: src/workers-code/llama-cpp.js:269-279
Timestamp: 2025-12-29T03:44:00.085Z
Learning: In src/workers-code/llama-cpp.js, constrain the isAsync flag passed to callWrapper: set isAsync to true only for functions included in the JSPI_EXPORTS build (specifically wllama_start and wllama_action). For other exported functions such as wllama_exit, wllama_debug, and wllama_malloc, always use isAsync=false since they do not perform async operations at the C/WASM level, even though callWrapper may expose an async function. This should be verifiable by reviewing the build exports and ensuring these functions are not marked as asynchronous in the wrapper when invoked.

Applied to files:

  • src/workers-code/llama-cpp.js
🧬 Code graph analysis (5)
examples/main/src/components/ChatScreen.tsx (1)
src/wllama.ts (1)
  • PerfContextData (134-143)
examples/main/src/utils/wllama.context.tsx (2)
src/wllama.ts (1)
  • Wllama (298-1473)
examples/main/src/config.ts (1)
  • WLLAMA_CONFIG_PATHS (9-12)
src/wllama.ts (1)
src/glue/messages.ts (2)
  • GlueMsgPerfContextRes (1384-1394)
  • GlueMsgPerfResetRes (1402-1405)
cpp/glue.hpp (1)
src/glue/messages.ts (1)
  • GLUE_VERSION (6-6)
src/workers-code/llama-cpp.js (2)
cpp/generate_glue_prototype.js (1)
  • name (39-39)
src/worker.ts (4)
  • wllamaStart (119-127)
  • wllamaAction (129-141)
  • wllamaExit (143-153)
  • wllamaDebug (155-162)
🔇 Additional comments (29)
src/workers-code/llama-cpp.js (2)

230-247: LGTM!

The callWrapper function correctly handles the isAsync flag:

  • Passes { async: true } to cwrap only when needed for JSPI support
  • The conditional await inside the wrapper ensures synchronous functions execute without unnecessary promise wrapping at the native level
  • Error handling is appropriate
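The pattern can be sketched in isolation (assumed shape and names; the real wrapper in llama-cpp.js differs in details):

```javascript
// isAsync-aware wrapper sketch: callers always receive an async
// function, but only JSPI-exported natives are awaited at the
// native boundary.
function makeCallWrapper(nativeFn, isAsync) {
  return async (...args) =>
    isAsync ? await nativeFn(...args) : nativeFn(...args);
}

// Stand-ins for exports: a plain one and a promise-returning (JSPI) one.
const mallocLike = (n) => n * 2;
const actionLike = async (n) => n + 1;

const wrappedMalloc = makeCallWrapper(mallocLike, false);
const wrappedAction = makeCallWrapper(actionLike, true);
```

Both wrapped functions are awaitable from the caller's side; only the isAsync=true path corresponds to functions listed in JSPI_EXPORTS, matching the constraint recorded in the learnings.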

269-283: LGTM! The isAsync flags are correctly applied.

Based on learnings, isAsync=true should only be set for functions included in the JSPI_EXPORTS build (wllama_start and wllama_action). The configuration here correctly reflects this:

Function isAsync Correct
wllama_malloc false
wllama_start true
wllama_action true
wllama_exit false
wllama_debug false
examples/main/package.json (2)

30-30: Appropriate addition for WebGPU type support.

Adding @webgpu/types as a devDependency is correct for providing TypeScript type definitions for WebGPU APIs. This aligns with the tsconfig changes that include this in the types array.


16-16: No action required. The version bump from ^0.2.2 to ^0.5.3 has no documented breaking changes. The v0.5.x release adds formatting support for chat templates (additive changes), and the example app already has a workaround in place for DeepSeek template compatibility issues.

package.json (1)

51-51: LGTM!

Adding @webgpu/types at the root level is necessary for the library's TypeScript compilation. The version is consistent with the example app's devDependency.

llama.cpp (1)

1-1: Verify submodule commit and review upstream changes before merging.

Ensure the new commit exists in the upstream llama.cpp repository and review changes between 4abef75f2cf2eee75eb5083b30a94cf981587394 and 9e41884dce4719d8d901058613ea7a700dc3697d for breaking API changes, security implications, or new dependencies that might affect the integration with this project's WebGPU implementation.

examples/main/tsconfig.app.json (1)

8-8: Note: types array restricts automatic type inclusion.

When the types array is specified, only the listed packages are included. If other @types/* packages are needed (e.g., @types/node), they must be explicitly added here. If this restriction is intentional for strict type control, this is fine.

scripts/docker-compose.yml (2)

27-28: LGTM - Single-thread build configuration.

Correctly derives flags from the shared base.


36-37: LGTM - Multi-thread build with pthread configuration.

The pthread flags are correctly appended, and PTHREAD_POOL_SIZE=Module["pthreadPoolSize"] allows runtime configuration of the thread pool size.

tsconfig.build.json (1)

30-31: LGTM - WebGPU types and synthetic imports enabled.

Both additions are appropriate:

  • allowSyntheticDefaultImports enables cleaner import syntax for CommonJS interop
  • types: ["@webgpu/types"] provides WebGPU type definitions for the library build

@webgpu/types is properly listed in devDependencies.

scripts/build_wasm.sh (1)

5-5: LGTM - Version bump to support JSPI features.

The Emscripten SDK update to 4.0.20 aligns with the JSPI (JavaScript Promise Integration) flags in the docker-compose.yml build configuration (-sJSPI -sJSPI_EXPORTS=['wllama_start','wllama_action']). The Docker image tags exist on Docker Hub, including the ARM64 variant properly appended at line 16.

examples/main/src/utils/types.ts (1)

16-20: LGTM!

The usingWebGPU field addition to RuntimeInfo is consistent with the existing interface pattern and properly integrates with the WebGPU runtime detection in wllama.context.tsx.

cpp/wllama.cpp (1)

121-122: LGTM!

The new perf_context and perf_reset action handlers follow the established WLLAMA_ACTION macro pattern consistently with other actions in this file.

examples/main/src/components/ModelScreen.tsx (1)

332-336: LGTM!

The WebGPU status indicator follows the established pattern for displaying runtime information alongside the existing Multithread and Chat template indicators.

cpp/test_glue.cpp (1)

15-43: LGTM!

The test correctly validates the serialization round-trip for the new use_webgpu field, ensuring the value is properly preserved through serialize/deserialize operations.

examples/main/src/utils/wllama.context.tsx (2)

61-71: LGTM!

The WebGPU preference is correctly configured in both the initial instance creation and resetWllamaInstance, ensuring consistent behavior. The actual WebGPU usage state is properly retrieved at runtime via usingWebGPU() method.


176-180: LGTM!

The runtime info correctly captures the actual WebGPU usage state (not just the preference) by calling wllamaInstance.usingWebGPU() after model loading completes.

examples/main/src/components/ChatScreen.tsx (2)

77-80: LGTM!

The formatTokPerSec utility correctly handles the edge case of ms <= 0 to prevent division by zero, and the calculation is accurate.
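A sketch of such a guard (hypothetical implementation; the component's actual formatting may differ):

```javascript
// Tokens/s formatter: returns a placeholder when elapsed time is
// zero or negative, otherwise tokens divided by elapsed seconds.
function formatTokPerSec(nTokens, elapsedMs) {
  if (elapsedMs <= 0) return 'n/a';
  return ((nTokens * 1000) / elapsedMs).toFixed(2);
}
```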


197-221: LGTM!

The performance metrics UI is well-structured with proper conditional rendering for error states, appropriate button disabling during busy/generating states, and clean display of prefill/decode metrics.

cpp/actions.hpp (3)

800-819: LGTM!

The action_perf_context implementation correctly handles the null context case and properly maps all performance counter fields from llama_perf_context_data to the response structure.


821-833: LGTM!

The action_perf_reset implementation is clean with proper null context handling.


164-177: No issue: devices array is safely used and copied by llama.cpp.

The stack-allocated devices array is passed immediately to llama_model_load_from_splits(), which copies the device list into the model's internal storage during the call. The original array does not need to outlive the function call, and no dangling pointer issue exists here.

cpp/glue.hpp (2)

497-499: LGTM!

The new use_webgpu and no_perf fields are correctly positioned and typed, consistent with the TypeScript definitions in src/glue/messages.ts.


824-851: LGTM!

The new performance context message types are correctly defined with handler names matching the TypeScript definitions and field structures aligning with PerfContextData interface.

src/glue/messages.ts (2)

1-6: Note: This is a generated file.

The file header indicates it's generated by cpp/generate_glue_prototype.js. Ensure changes to message definitions are made through the generator script rather than manual edits to maintain consistency between C++ and TypeScript.


1378-1405: LGTM!

The new performance context interfaces are correctly defined and consistent with the C++ message structures in cpp/glue.hpp.

src/wllama.ts (3)

619-626: LGTM!

The logic correctly disables multi-threading when WebGPU is active, with an informative warning message explaining the reason.


662-664: LGTM!

The n_gpu_layers setting of 999 for WebGPU (to offload all layers) and 0 for CPU is the correct approach for backend selection.


464-467: LGTM!

The usingWebGPU() accessor follows the same pattern as other state accessors like isMultithread(), with proper model-loaded check.


Comment on lines +4 to +6
option(GGML_WEBGPU "Enable GGML WebGPU backend" ON)
option(GGML_WEBGPU_JSPI "Enable GGML WebGPU JSPI support" ON)
option(LLAMA_WASM_MEM64 "Enable 64-bit memory for WebAssembly builds" OFF)
@coderabbitai coderabbitai bot commented Jan 15, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

JSPI JavaScript Promise Integration browser support 2025 2026

💡 Result:

Summary — JSPI (WebAssembly JavaScript Promise Integration) browser support (2025–Jan 15, 2026)

  • Status: Standardization underway (W3C/WebAssembly WG; JSPI has advanced through the spec process and is implemented experimentally). (v8.dev)
  • Chrome / Chromium: Implemented experimentally in V8. Available via feature flag / origin trial during 2024–2025 and shipped behind flags in Chrome stable/channel builds (noted availability in Chrome 126+ for the new API and coverage in Chrome 137 articles). Feature-detection: 'Suspender' in WebAssembly (or newer API checks as described in V8 docs). (developer.chrome.com)
  • Edge: Chromium-based Edge tracks Chromium’s implementation; JSPI mentioned in Edge platform release notes (support entries in Edge 137 notes). Origin-trial participation was offered. (learn.microsoft.com)
  • Firefox: V8 docs and V8 blog posts note availability/experimental builds in Firefox (references to Firefox 139 in V8 materials); support status in Firefox may be experimental behind flags/versions — check specific Firefox nightly/dev release notes for exact builds. (v8.dev)
  • Safari / WebKit: No broad shipping announcement found (as of Jan 15, 2026). JSPI has been implemented first in Chromium/V8; WebKit status appears not to have shipped stable support — verify WebKit bug tracker / Safari release notes for any updates before relying on it in production. (v8.dev)

Practical notes:

  • In 2024–2025 JSPI was primarily available via origin trials or feature flags in Chromium-based browsers; by mid–late 2025 some Chromium channels documented shipping behavior (see Chrome/Edge notes). Confirm current availability by testing feature-detection in the target browser and checking vendor release notes for the exact version you need. (developer.chrome.com)

If you want, I can:

  • Check the exact support matrix for specific browser versions you care about (Chrome/Edge/Firefox/Safari) and list commands/flags to enable JSPI locally.

Options are well-defined; reconsider GGML_WEBGPU_JSPI default due to browser compatibility.

The CMake options are correctly placed before add_subdirectory(llama.cpp). However, GGML_WEBGPU_JSPI defaults to ON, and JSPI browser support remains limited as of January 2026: Chrome/Edge require experimental flags, Firefox 139 has experimental support, and Safari/WebKit have not shipped JSPI support. This creates a significant compatibility gap for Safari users. Consider defaulting to OFF for broader compatibility, or explicitly document JSPI browser requirements and the experimental nature of this feature.

🤖 Prompt for AI Agents
In `@CMakeLists.txt` around lines 4 - 6, The GGML_WEBGPU_JSPI option currently
defaults to ON which breaks compatibility for many browsers; change the CMake
option declaration for GGML_WEBGPU_JSPI to default to OFF (i.e., set its default
value to OFF) and update the surrounding CMake comment or project documentation
to explicitly state JSPI is experimental and list the browser requirements/flags
(e.g., Chrome/Edge experimental flags, Firefox 139 experimental support,
Safari/WebKit not supported) so consumers know how to enable it intentionally.

@reeselevine (Author) commented:

It's a good point, but there are still some unresolved issues with using ASYNCIFY. We could try to handle it, but it also seems like WebKit is planning to support JSPI at some point (WebKit/standards-positions#422).

@coderabbitai coderabbitai bot commented:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the WebKit standards position on JSPI to understand the status
gh api repos/WebKit/standards-positions/issues/422 --jq '{state: .state, title: .title, created: .created_at, updated: .updated_at, body: .body, comments: .comments}' 2>/dev/null || echo "Failed to fetch issue details"

# Get recent comments on the issue to understand current status
gh api repos/WebKit/standards-positions/issues/422/comments --jq '.[-3:] | .[] | {created: .created_at, author: .user.login, body: .body}' 2>/dev/null || echo "Failed to fetch comments"

Length of output: 4287


@reeselevine, that's a fair point! The tradeoff between ASYNCIFY issues and JSPI browser support is a reasonable engineering decision. I can see the WebKit standards position issue shows positive progress—kmiller68 from WebKit commented in September 2025 that "we've withdrawn our objections to this API," which is a good signal for future support.

Given that ASYNCIFY has unresolved issues and JSPI enables better functionality for the majority of browsers that do support it (Chrome, Edge, Firefox experimental), defaulting to ON makes sense for your use case.

One suggestion: consider adding a note in the README or build documentation about JSPI browser requirements so users on Safari/WebKit-based browsers understand why WebGPU might not work for them yet, and can disable the option if needed. Something like:

## WebGPU Backend
- Requires JSPI (JavaScript Promise Integration) support
- Supported: Chrome/Edge (with flags), Firefox 139+ (experimental)
- Not yet supported: Safari/WebKit (under consideration)
- To disable: `cmake -DGGML_WEBGPU_JSPI=OFF`

But that's just a nice-to-have for user experience—the default is justified given the technical constraints.


🧠 Learnings used
Learnt from: reeselevine
Repo: ngxson/wllama PR: 198
File: src/workers-code/llama-cpp.js:269-279
Timestamp: 2025-12-29T03:44:02.649Z
Learning: In wllama's src/workers-code/llama-cpp.js, the isAsync parameter in callWrapper should only be true for functions included in JSPI_EXPORTS build configuration (wllama_start and wllama_action). Other functions like wllama_exit, wllama_debug, and wllama_malloc should use isAsync=false even though the callWrapper returns an async function, because they don't perform async operations at the C/WASM level.

@reeselevine (Author) commented:

Ok, I guess if we want to support Safari, we will need to build separate wasm blobs without JSPI/WebGPU for contexts that don't support it. Unfortunately, it looks like the wasm built with JSPI isn't compatible with Safari, even if the WebGPU backend is not used.

I can look into supporting ASYNCIFY a bit and if I get it working compare the performance. Potentially there's a path where only ASYNCIFY is used until JSPI is more widely supported. Or wllama will have to ship more wasm binaries, ones with JSPI and ones without.

@ngxson ngxson (Owner) commented Jan 16, 2026

Yeah I think we can have the third build named webgpu, that is basically webgpu + single-thread + asyncify.

Anyways, I'm still not sure why CI doesn't run automatically on this PR 🤔 I just need the npm run test to pass as it should already cover chrome + firefox.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/wllama.ts`:
- Around line 601-621: The code can set this.nbThreads to 0 when
navigator.hardwareConcurrency is 1; change the hwConcurrency and nbThreads
calculation to enforce a minimum of 1: compute hwConcurrency = Math.max(1,
Math.floor((navigator.hardwareConcurrency || 1) / 2)) and then set
this.nbThreads = Math.max(1, config.n_threads ?? hwConcurrency); keep the
existing check that sets this.useMultiThread = true only when this.nbThreads > 1
and retain the warning branch when it falls back to single-thread; update
references to this.nbThreads (used later as n_threads) accordingly.
- Around line 134-143: getPerfContext() currently returns the raw
GlueMsgPerfContextRes which includes the internal _name: "pctx_res" field and
leaks it through the public PerfContextData API; modify getPerfContext() to
transform/map the GlueMsgPerfContextRes into a PerfContextData object by copying
only the public fields (success, t_start_ms, t_load_ms, t_p_eval_ms, t_eval_ms,
n_p_eval, n_eval, n_reused) and omitting _name before returning so callers
receive the clean PerfContextData shape.
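The mapping the second item asks for can be sketched as a plain field copy. The field names are taken from the review text above; `toPerfContextData` is a hypothetical helper, not the actual wllama implementation:

```javascript
// Hedged sketch: copy only the public perf fields from the glue
// response and drop the internal _name marker, so callers see a clean
// PerfContextData shape.
function toPerfContextData(res) {
  const {
    success,
    t_start_ms,
    t_load_ms,
    t_p_eval_ms,
    t_eval_ms,
    n_p_eval,
    n_eval,
    n_reused,
  } = res;
  return {
    success,
    t_start_ms,
    t_load_ms,
    t_p_eval_ms,
    t_eval_ms,
    n_p_eval,
    n_eval,
    n_reused,
  };
}
```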
🧹 Nitpick comments (1)
src/wllama.ts (1)

587-599: Consider adding try-catch around WebGPU adapter request.

While requestAdapter() typically returns null on failure, wrapping it in try-catch would guard against edge cases where the call throws (browser bugs, permission errors, etc.).

Suggested defensive approach
```diff
    if (this.config.preferWebGPU) {
      if (navigator.gpu) {
-       if(await navigator.gpu.requestAdapter()) {
-           this.useWebGPU = true;
-       } else {
-         this.logger().warn('WebGPU backend requested but no adapter found, falling back to CPU');
+       try {
+         if (await navigator.gpu.requestAdapter()) {
+           this.useWebGPU = true;
+         } else {
+           this.logger().warn('WebGPU backend requested but no adapter found, falling back to CPU');
+         }
+       } catch (e) {
+         this.logger().warn('WebGPU adapter request failed, falling back to CPU', e);
        }
      } else {
        this.logger().warn(
          'WebGPU backend requested but WebGPU is not available, falling back to CPU'
        );
      }
    }
```

Comment on lines +601 to 621
```ts
if (await isSupportMultiThread()) {
  if (this.pathConfig['multi-thread/wllama.wasm']) {
    const hwConcurrency = Math.floor((navigator.hardwareConcurrency || 1) / 2);
    this.nbThreads = config.n_threads ?? hwConcurrency;
    if (this.nbThreads > 1) {
      this.useMultiThread = true;
    } else {
      this.logger().warn(
        'Falling back single-thread due to n_threads configuration or limited hardware concurrency'
      );
    }
  } else {
    this.logger().warn(
      'Missing paths to "multi-thread/wllama.wasm", falling back to single-thread'
    );
  }
} else {
  this.logger().warn(
    'Multi-threads are not supported in this environment, falling back to single-thread'
  );
}
```

⚠️ Potential issue | 🟡 Minor

Edge case: nbThreads may become 0 when hardware concurrency is limited.

When navigator.hardwareConcurrency is 1, hwConcurrency becomes Math.floor(1/2) = 0. If config.n_threads is not specified, nbThreads will be set to 0, which could cause unexpected behavior when passed to the worker (line 671: n_threads: this.nbThreads).

Proposed fix: ensure minimum of 1 thread
```diff
-      const hwConcurrency = Math.floor((navigator.hardwareConcurrency || 1) / 2);
-      this.nbThreads = config.n_threads ?? hwConcurrency;
+      const hwConcurrency = Math.max(1, Math.floor((navigator.hardwareConcurrency || 1) / 2));
+      this.nbThreads = config.n_threads ?? hwConcurrency;
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
 if (await isSupportMultiThread()) {
   if (this.pathConfig['multi-thread/wllama.wasm']) {
-    const hwConcurrency = Math.floor((navigator.hardwareConcurrency || 1) / 2);
+    const hwConcurrency = Math.max(1, Math.floor((navigator.hardwareConcurrency || 1) / 2));
     this.nbThreads = config.n_threads ?? hwConcurrency;
     if (this.nbThreads > 1) {
       this.useMultiThread = true;
     } else {
       this.logger().warn(
         'Falling back single-thread due to n_threads configuration or limited hardware concurrency'
       );
     }
   } else {
     this.logger().warn(
       'Missing paths to "multi-thread/wllama.wasm", falling back to single-thread'
     );
   }
 } else {
   this.logger().warn(
     'Multi-threads are not supported in this environment, falling back to single-thread'
   );
 }
```
🤖 Prompt for AI Agents
In `@src/wllama.ts` around lines 601 - 621, The code can set this.nbThreads to 0
when navigator.hardwareConcurrency is 1; change the hwConcurrency and nbThreads
calculation to enforce a minimum of 1: compute hwConcurrency = Math.max(1,
Math.floor((navigator.hardwareConcurrency || 1) / 2)) and then set
this.nbThreads = Math.max(1, config.n_threads ?? hwConcurrency); keep the
existing check that sets this.useMultiThread = true only when this.nbThreads > 1
and retain the warning branch when it falls back to single-thread; update
references to this.nbThreads (used later as n_threads) accordingly.
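The clamp the prompt describes can be checked in isolation with a tiny helper; `computeThreads` is a hypothetical extraction of the two affected lines, not code from wllama:

```javascript
// Hedged sketch of the proposed fix: without the outer Math.max, a
// machine reporting hardwareConcurrency === 1 floors to 0 threads.
function computeThreads(hardwareConcurrency, configThreads) {
  const hwConcurrency = Math.max(1, Math.floor((hardwareConcurrency || 1) / 2));
  return Math.max(1, configThreads ?? hwConcurrency);
}

console.log(computeThreads(1, undefined)); // 1 (was 0 before the clamp)
console.log(computeThreads(8, undefined)); // 4 (half the reported cores)
console.log(computeThreads(8, 2)); // 2 (explicit n_threads wins)
```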

@ngxson
Owner

ngxson commented Jan 16, 2026

@coderabbitai pause

@coderabbitai

coderabbitai bot commented Jan 16, 2026

✅ Actions performed

Reviews paused.

@reeselevine
Author

Not sure how to open a PR targeting this branch in this repository, but I've added ASYNCIFY support and some discussion points here: reeselevine#1

@reeselevine
Author

@ngxson I know you're probably busy but would be interested in getting some of this integrated soon.

@ngxson
Owner

ngxson commented Feb 25, 2026

@reeselevine sorry for the delay, I'm going back to wllama this week.

For now, npm run test fails so I still cannot merge this yet, but will have a look now.
