Conversation
…awnwebgpu port, removing remote port file from repo
Note: Reviews paused. Use the following commands to manage reviews:
📝 Walkthrough

Adds WebGPU selectable backend support and perf-context APIs, bumps GLUE protocol to v2, updates WASM/build tooling and TypeScript types, exposes WebGPU and perf status in the frontend, and wires new perf actions through the native worker bridge.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Browser Client
    participant Wllama as Wllama (TS)
    participant Worker as Worker/WASM
    participant Backend as GGML Backend
    Client->>Wllama: loadModel(preferWebGPU: true)
    Wllama->>Wllama: Check for WebGPU adapter
    alt adapter found
        Wllama->>Wllama: set useWebGPU = true
    else no adapter
        Wllama->>Wllama: set useWebGPU = false (warn)
    end
    Wllama->>Worker: wllama_start(loadReq with use_webgpu, no_perf, n_gpu_layers)
    Worker->>Backend: initialize (select WebGPU or CPU)
    Backend-->>Worker: initialized
    Worker-->>Wllama: ready
    Wllama-->>Client: model loaded (usingWebGPU status)
    Client->>Wllama: getPerfContext()
    Wllama->>Worker: wllama_action(pctx_req)
    Worker->>Backend: collect perf metrics
    Backend-->>Worker: perf data
    Worker-->>Wllama: pctx_res (metrics)
    Wllama-->>Client: PerfContextData
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/glue.hpp (2)
449-452: Missing `break` statement causes unintended fallthrough.

The `GLUE_DTYPE_RAW` case is missing a `break` statement, causing execution to fall through to `GLUE_DTYPE_ARRAY_BOOL`. This will corrupt deserialization of raw fields.

🐛 Fix missing break statement

```diff
 case GLUE_DTYPE_RAW:
   ((glue_raw *)field)->parse(input);
+  break;
 case GLUE_DTYPE_ARRAY_BOOL:
```
464-466: Missing `break` statement at end of switch.

The `GLUE_DTYPE_ARRAY_RAW` case is missing a `break` statement. While this is the last case and won't cause functional issues currently, it's inconsistent with other cases and could cause problems if more cases are added later.

🐛 Fix missing break statement

```diff
 case GLUE_DTYPE_ARRAY_RAW:
   ((glue_arr_raw *)field)->parse(input);
+  break;
 }
```
🤖 Fix all issues with AI agents
In @.github/workflows/deploy-examples-main.yml:
- Around line 1-36: Replace the invalid pinned SHA for the GitHub Pages deploy
action in the deploy step (the uses: JamesIves/github-pages-deploy-action@...)
with a valid tag or release (e.g., JamesIves/github-pages-deploy-action@v4.6.1)
to ensure the action can be resolved; also add a top-level concurrency block
(e.g., concurrency: { group: 'deploy-pages', cancel-in-progress: true }) to the
workflow to prevent overlapping runs when the workflow is triggered multiple
times.
In `@CMakeLists.txt`:
- Around line 4-6: The GGML_WEBGPU_JSPI option currently defaults to ON which
breaks compatibility for many browsers; change the CMake option declaration for
GGML_WEBGPU_JSPI to default to OFF (i.e., set its default value to OFF) and
update the surrounding CMake comment or project documentation to explicitly
state JSPI is experimental and list the browser requirements/flags (e.g.,
Chrome/Edge experimental flags, Firefox 139 experimental support, Safari/WebKit
not supported) so consumers know how to enable it intentionally.
🧹 Nitpick comments (4)
scripts/docker-compose.yml (1)
22-22: Comprehensive EMCC flags for JSPI and WebGPU support.

The consolidated flags correctly enable JSPI (JavaScript Promise Integration), SIMD, modern exception handling, and expose necessary runtime methods including `HEAPU8` for direct memory access. The JSPI_EXPORTS configuration (`wllama_start`, `wllama_action`) correctly targets the async-capable functions.

Note: JSPI is experimental/provisional in Emscripten 4.0.20. Runtime behavior depends on browser support—Chrome/V8 ships it behind origin trials, and some versions require explicit feature flags to enable. Ensure your deployment targets compatible browser versions.
The single-line flag definition is difficult to read and maintain. Consider breaking it into a multi-line format for clarity:
♻️ Suggested readability improvement
```diff
-export SHARED_EMCC_FLAGS="--no-entry -O3 -msimd128 -DNDEBUG -flto=full -frtti -fwasm-exceptions -sEXPORT_ALL=1 -sEXPORT_ES6=0 -sMODULARIZE=0 -sINITIAL_MEMORY=128MB -sMAXIMUM_MEMORY=4096MB -sALLOW_MEMORY_GROWTH=1 -sFORCE_FILESYSTEM=1 -sEXPORTED_FUNCTIONS=_main,_wllama_malloc,_wllama_start,_wllama_action,_wllama_exit,_wllama_debug -sEXPORTED_RUNTIME_METHODS=ccall,cwrap,HEAPU8 -sNO_EXIT_RUNTIME=1 -Wno-unused-command-line-argument -sJSPI -sJSPI_EXPORTS=['wllama_start','wllama_action']"
+export SHARED_EMCC_FLAGS="\
+  --no-entry -O3 -msimd128 -DNDEBUG -flto=full -frtti -fwasm-exceptions \
+  -sEXPORT_ALL=1 -sEXPORT_ES6=0 -sMODULARIZE=0 \
+  -sINITIAL_MEMORY=128MB -sMAXIMUM_MEMORY=4096MB -sALLOW_MEMORY_GROWTH=1 \
+  -sFORCE_FILESYSTEM=1 \
+  -sEXPORTED_FUNCTIONS=_main,_wllama_malloc,_wllama_start,_wllama_action,_wllama_exit,_wllama_debug \
+  -sEXPORTED_RUNTIME_METHODS=ccall,cwrap,HEAPU8 \
+  -sNO_EXIT_RUNTIME=1 -Wno-unused-command-line-argument \
+  -sJSPI -sJSPI_EXPORTS=['wllama_start','wllama_action']"
```

examples/main/src/components/ChatScreen.tsx (1)
51-61: Consider guarding `refreshPerf` with a `loadedModel` check.

The `resetPerf` function guards against `!loadedModel`, but `refreshPerf` does not. If `refreshPerf` is called when no model is loaded (e.g., during edge-case race conditions), `getWllamaInstance().getPerfContext()` might throw an error that's not specific to the missing model state.

♻️ Suggested improvement

```diff
 const refreshPerf = async () => {
+  if (!loadedModel) return;
   setPerfBusy(true);
   setPerfError(null);
   try {
     setPerfData(await getWllamaInstance().getPerfContext());
   } catch (e) {
     setPerfError((e as any)?.message ?? 'Failed to fetch perf data');
   } finally {
     setPerfBusy(false);
   }
 };
```

src/wllama.ts (2)
1382-1393: Type inconsistency between return type and actual value.

`getPerfContext()` is declared to return `PerfContextData`, but it actually returns `GlueMsgPerfContextRes`, which includes the `_name: "pctx_res"` field. While this works at runtime (extra fields are allowed), it's a type inconsistency.

Looking at the `PerfContextData` interface (lines 134-143), it matches `GlueMsgPerfContextRes` except for `_name`. Consider either:

- Stripping `_name` from the return value, or
- Extending `PerfContextData` from `GlueMsgPerfContextRes`

♻️ Option 1: Strip _name from return

```diff
 async getPerfContext(): Promise<PerfContextData> {
   this.checkModelLoaded();
-  return await this.proxy.wllamaAction<GlueMsgPerfContextRes>(
+  const result = await this.proxy.wllamaAction<GlueMsgPerfContextRes>(
     'perf_context',
     {
       _name: 'pctx_req',
     }
   );
+  const { _name, ...perfData } = result;
+  return perfData;
 }
```
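The suggested fix strips the internal `_name` tag with rest destructuring before returning. A minimal standalone illustration of that pattern (field names here are examples, not the full `PerfContextData` shape):

```typescript
// Illustrative only: a trimmed-down response shape, not wllama's real types.
interface PctxRes {
  _name: string;
  t_eval_ms: number;
  n_eval: number;
}

// Rest destructuring copies every field except `_name` into `perfData`.
function stripName(res: PctxRes): Omit<PctxRes, '_name'> {
  const { _name, ...perfData } = res;
  return perfData;
}
```

For example, `stripName({ _name: 'pctx_res', t_eval_ms: 12.5, n_eval: 3 })` yields an object without the `_name` key.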
587-595: Use async adapter check for robust WebGPU detection.

The current check `navigator.gpu` only verifies the WebGPU API exists, not whether a compatible adapter is available. According to WebGPU best practices, `await navigator.gpu?.requestAdapter()` should be called to confirm a usable adapter exists; `requestAdapter()` resolves to `null` when the browser cannot provide a suitable adapter.

Since `loadModel` is already `async`, this check can be awaited without architectural changes.

♻️ More robust WebGPU detection

```diff
-if (this.config.preferWebGPU) {
-  if (navigator.gpu) {
-    this.useWebGPU = true;
-  } else {
-    this.logger().warn(
-      'WebGPU backend requested but WebGPU is not available, falling back to CPU'
-    );
-  }
-}
+if (this.config.preferWebGPU) {
+  try {
+    const adapter = await navigator.gpu?.requestAdapter();
+    if (adapter) {
+      this.useWebGPU = true;
+    } else {
+      this.logger().warn(
+        'WebGPU backend requested but no compatible adapter found, falling back to CPU'
+      );
+    }
+  } catch {
+    this.logger().warn(
+      'WebGPU backend requested but WebGPU is not available, falling back to CPU'
+    );
+  }
+}
```
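The same recommendation can be factored into a small standalone helper so the detection logic is unit-testable outside a browser; `GPULike` and `detectWebGPU` are hypothetical names introduced for this sketch, not part of wllama's API:

```typescript
// Mirrors the relevant shape of `navigator.gpu`, which is undefined
// in browsers without WebGPU support.
interface GPULike {
  requestAdapter(): Promise<object | null>;
}

// Returns true only when a usable adapter is actually available.
async function detectWebGPU(gpu: GPULike | undefined): Promise<boolean> {
  if (!gpu) return false; // API missing entirely
  try {
    // requestAdapter() resolves to null when no suitable adapter exists
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false; // defensive: treat a throwing call as "unavailable"
  }
}
```

In the browser this would be called as `await detectWebGPU(navigator.gpu)`; in tests, a mock object can stand in for the real API.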
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)

- `examples/main/package-lock.json` is excluded by `!**/package-lock.json`
- `package-lock.json` is excluded by `!**/package-lock.json`
- `src/multi-thread/wllama.wasm` is excluded by `!**/*.wasm`
- `src/single-thread/wllama.wasm` is excluded by `!**/*.wasm`
- `src/webgpu-single-thread/wllama.wasm` is excluded by `!**/*.wasm`
📒 Files selected for processing (24)
- `.github/workflows/deploy-examples-main.yml`
- `CMakeLists.txt`
- `cpp/actions.hpp`
- `cpp/glue.hpp`
- `cpp/test_glue.cpp`
- `cpp/wllama.cpp`
- `examples/main/package.json`
- `examples/main/src/components/ChatScreen.tsx`
- `examples/main/src/components/ModelScreen.tsx`
- `examples/main/src/utils/types.ts`
- `examples/main/src/utils/wllama.context.tsx`
- `examples/main/tsconfig.app.json`
- `llama.cpp`
- `package.json`
- `scripts/build_wasm.sh`
- `scripts/docker-compose.yml`
- `src/glue/messages.ts`
- `src/multi-thread/wllama.js`
- `src/single-thread/wllama.js`
- `src/webgpu-single-thread/wllama.js`
- `src/wllama.ts`
- `src/workers-code/generated.ts`
- `src/workers-code/llama-cpp.js`
- `tsconfig.build.json`
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-18T08:24:14.434Z
Learnt from: ngxson
Repo: ngxson/wllama PR: 0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.
Applied to files:

- `scripts/build_wasm.sh`
- `src/wllama.ts`
- `package.json`
📚 Learning: 2025-12-29T03:44:02.649Z
Learnt from: reeselevine
Repo: ngxson/wllama PR: 198
File: src/workers-code/llama-cpp.js:269-279
Timestamp: 2025-12-29T03:44:02.649Z
Learning: In wllama's src/workers-code/llama-cpp.js, the isAsync parameter in callWrapper should only be true for functions included in JSPI_EXPORTS build configuration (wllama_start and wllama_action). Other functions like wllama_exit, wllama_debug, and wllama_malloc should use isAsync=false even though the callWrapper returns an async function, because they don't perform async operations at the C/WASM level.
Applied to files:

- `cpp/wllama.cpp`
- `src/wllama.ts`
📚 Learning: 2025-12-29T03:44:00.085Z
Learnt from: reeselevine
Repo: ngxson/wllama PR: 198
File: src/workers-code/llama-cpp.js:269-279
Timestamp: 2025-12-29T03:44:00.085Z
Learning: In src/workers-code/llama-cpp.js, constrain the isAsync flag passed to callWrapper: set isAsync to true only for functions included in the JSPI_EXPORTS build (specifically wllama_start and wllama_action). For other exported functions such as wllama_exit, wllama_debug, and wllama_malloc, always use isAsync=false since they do not perform async operations at the C/WASM level, even though callWrapper may expose an async function. This should be verifiable by reviewing the build exports and ensuring these functions are not marked as asynchronous in the wrapper when invoked.
Applied to files:

- `src/workers-code/llama-cpp.js`
🧬 Code graph analysis (5)
examples/main/src/components/ChatScreen.tsx (1)
src/wllama.ts (1)
`PerfContextData` (134-143)
examples/main/src/utils/wllama.context.tsx (2)
src/wllama.ts (1)
`Wllama` (298-1473)

examples/main/src/config.ts (1)

`WLLAMA_CONFIG_PATHS` (9-12)
src/wllama.ts (1)
src/glue/messages.ts (2)
`GlueMsgPerfContextRes` (1384-1394)
`GlueMsgPerfResetRes` (1402-1405)
cpp/glue.hpp (1)
src/glue/messages.ts (1)
`GLUE_VERSION` (6-6)
src/workers-code/llama-cpp.js (2)
cpp/generate_glue_prototype.js (1)
`name` (39-39)

src/worker.ts (4)

`wllamaStart` (119-127)
`wllamaAction` (129-141)
`wllamaExit` (143-153)
`wllamaDebug` (155-162)
🔇 Additional comments (29)
src/workers-code/llama-cpp.js (2)
230-247: LGTM!

The `callWrapper` function correctly handles the `isAsync` flag:

- Passes `{ async: true }` to `cwrap` only when needed for JSPI support
- The conditional await inside the wrapper ensures synchronous functions execute without unnecessary promise wrapping at the native level
- Error handling is appropriate
269-283: LGTM! The `isAsync` flags are correctly applied.

Based on learnings, `isAsync=true` should only be set for functions included in the JSPI_EXPORTS build (`wllama_start` and `wllama_action`). The configuration here correctly reflects this:

| Function | isAsync | Correct |
| --- | --- | --- |
| `wllama_malloc` | `false` | ✓ |
| `wllama_start` | `true` | ✓ |
| `wllama_action` | `true` | ✓ |
| `wllama_exit` | `false` | ✓ |
| `wllama_debug` | `false` | ✓ |

examples/main/package.json (2)
30-30: Appropriate addition for WebGPU type support.

Adding `@webgpu/types` as a devDependency is correct for providing TypeScript type definitions for WebGPU APIs. This aligns with the tsconfig changes that include this in the `types` array.
16-16: No action required. The version bump from `^0.2.2` to `^0.5.3` has no documented breaking changes. The v0.5.x release adds formatting support for chat templates (additive changes), and the example app already has a workaround in place for DeepSeek template compatibility issues.
51-51: LGTM!

Adding `@webgpu/types` at the root level is necessary for the library's TypeScript compilation. The version is consistent with the example app's devDependency.
1-1: Verify submodule commit and review upstream changes before merging.

Ensure the new commit exists in the upstream llama.cpp repository and review changes between `4abef75f2cf2eee75eb5083b30a94cf981587394` and `9e41884dce4719d8d901058613ea7a700dc3697d` for breaking API changes, security implications, or new dependencies that might affect the integration with this project's WebGPU implementation.
8-8: Note: `types` array restricts automatic type inclusion.

When the `types` array is specified, only the listed packages are included. If other `@types/*` packages are needed (e.g., `@types/node`), they must be explicitly added here. If this restriction is intentional for strict type control, this is fine.
27-28: LGTM - Single-thread build configuration.

Correctly derives flags from the shared base.
36-37: LGTM - Multi-thread build with pthread configuration.

The pthread flags are correctly appended, and `PTHREAD_POOL_SIZE=Module["pthreadPoolSize"]` allows runtime configuration of the thread pool size.
30-31: LGTM - WebGPU types and synthetic imports enabled.

Both additions are appropriate:

- `allowSyntheticDefaultImports` enables cleaner import syntax for CommonJS interop
- `types: ["@webgpu/types"]` provides WebGPU type definitions for the library build

`@webgpu/types` is properly listed in devDependencies.
5-5: LGTM - Version bump to support JSPI features.

The Emscripten SDK update to 4.0.20 aligns with the JSPI (JavaScript Promise Integration) flags in the docker-compose.yml build configuration (`-sJSPI -sJSPI_EXPORTS=['wllama_start','wllama_action']`). The Docker image tags exist on Docker Hub, including the ARM64 variant properly appended at line 16.
16-20: LGTM!

The `usingWebGPU` field addition to `RuntimeInfo` is consistent with the existing interface pattern and properly integrates with the WebGPU runtime detection in `wllama.context.tsx`.
121-122: LGTM!

The new `perf_context` and `perf_reset` action handlers follow the established `WLLAMA_ACTION` macro pattern consistently with other actions in this file.
332-336: LGTM!

The WebGPU status indicator follows the established pattern for displaying runtime information alongside the existing Multithread and Chat template indicators.
cpp/test_glue.cpp (1)
15-43: LGTM!

The test correctly validates the serialization round-trip for the new `use_webgpu` field, ensuring the value is properly preserved through serialize/deserialize operations.
61-71: LGTM!

The WebGPU preference is correctly configured in both the initial instance creation and `resetWllamaInstance`, ensuring consistent behavior. The actual WebGPU usage state is properly retrieved at runtime via the `usingWebGPU()` method.
176-180: LGTM!

The runtime info correctly captures the actual WebGPU usage state (not just the preference) by calling `wllamaInstance.usingWebGPU()` after model loading completes.
77-80: LGTM!

The `formatTokPerSec` utility correctly handles the edge case of `ms <= 0` to prevent division by zero, and the calculation is accurate.
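Since the actual `formatTokPerSec` source isn't quoted in this thread, here is a hypothetical sketch of such a utility with the `ms <= 0` guard described above; the real implementation in `ChatScreen.tsx` may differ:

```typescript
// Hypothetical reconstruction: formats a tokens-per-second rate from a
// token count and an elapsed time in milliseconds.
function formatTokPerSec(nTokens: number, ms: number): string {
  // Guard against division by zero (or a negative elapsed time)
  if (ms <= 0) return 'n/a';
  const tokPerSec = (nTokens / ms) * 1000; // convert ms to seconds
  return `${tokPerSec.toFixed(2)} tok/s`;
}
```

For example, 100 tokens over 1000 ms formats as `100.00 tok/s`, while a zero elapsed time yields `n/a` instead of `Infinity`.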
197-221: LGTM!

The performance metrics UI is well-structured with proper conditional rendering for error states, appropriate button disabling during busy/generating states, and clean display of prefill/decode metrics.
cpp/actions.hpp (3)
800-819: LGTM!

The `action_perf_context` implementation correctly handles the null context case and properly maps all performance counter fields from `llama_perf_context_data` to the response structure.
821-833: LGTM!

The `action_perf_reset` implementation is clean with proper null context handling.
164-177: No issue: devices array is safely used and copied by llama.cpp.

The stack-allocated `devices` array is passed immediately to `llama_model_load_from_splits()`, which copies the device list into the model's internal storage during the call. The original array does not need to outlive the function call, and no dangling pointer issue exists here.
497-499: LGTM!

The new `use_webgpu` and `no_perf` fields are correctly positioned and typed, consistent with the TypeScript definitions in `src/glue/messages.ts`.
824-851: LGTM!

The new performance context message types are correctly defined with handler names matching the TypeScript definitions and field structures aligning with the `PerfContextData` interface.
1-6: Note: This is a generated file.

The file header indicates it's generated by `cpp/generate_glue_prototype.js`. Ensure changes to message definitions are made through the generator script rather than manual edits to maintain consistency between C++ and TypeScript.
1378-1405: LGTM!

The new performance context interfaces are correctly defined and consistent with the C++ message structures in `cpp/glue.hpp`.
619-626: LGTM!

The logic correctly disables multi-threading when WebGPU is active, with an informative warning message explaining the reason.
662-664: LGTM!

The `n_gpu_layers` setting of 999 for WebGPU (to offload all layers) and 0 for CPU is the correct approach for backend selection.
464-467: LGTM!

The `usingWebGPU()` accessor follows the same pattern as other state accessors like `isMultithread()`, with proper model-loaded check.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```cmake
option(GGML_WEBGPU "Enable GGML WebGPU backend" ON)
option(GGML_WEBGPU_JSPI "Enable GGML WebGPU JSPI support" ON)
option(LLAMA_WASM_MEM64 "Enable 64-bit memory for WebAssembly builds" OFF)
```
🧩 Analysis chain
🌐 Web query:
JSPI JavaScript Promise Integration browser support 2025 2026
💡 Result:
Summary — JSPI (WebAssembly JavaScript Promise Integration) browser support (2025–Jan 15, 2026)
- Status: Standardization underway (W3C/WebAssembly WG; JSPI has advanced through the spec process and is implemented experimentally). (v8.dev)
- Chrome / Chromium: Implemented experimentally in V8. Available via feature flag / origin trial during 2024–2025 and shipped behind flags in Chrome stable/channel builds (noted availability in Chrome 126+ for the new API and coverage in Chrome 137 articles). Feature-detection: 'Suspender' in WebAssembly (or newer API checks as described in V8 docs). (developer.chrome.com)
- Edge: Chromium-based Edge tracks Chromium’s implementation; JSPI mentioned in Edge platform release notes (support entries in Edge 137 notes). Origin-trial participation was offered. (learn.microsoft.com)
- Firefox: V8 docs and V8 blog posts note availability/experimental builds in Firefox (references to Firefox 139 in V8 materials); support status in Firefox may be experimental behind flags/versions — check specific Firefox nightly/dev release notes for exact builds. (v8.dev)
- Safari / WebKit: No broad shipping announcement found (as of Jan 15, 2026). JSPI has been implemented first in Chromium/V8; WebKit status appears not to have shipped stable support — verify WebKit bug tracker / Safari release notes for any updates before relying on it in production. (v8.dev)
Practical notes:
- In 2024–2025 JSPI was primarily available via origin trials or feature flags in Chromium-based browsers; by mid–late 2025 some Chromium channels documented shipping behavior (see Chrome/Edge notes). Confirm current availability by testing feature-detection in the target browser and checking vendor release notes for the exact version you need. (developer.chrome.com)
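The feature-detection approach mentioned above can be sketched as follows. This is a hedged sketch: the property names follow the V8 blog's description of the older (`Suspender`) and newer (`Suspending`) JSPI APIs, and should be re-verified against the browsers you target:

```typescript
// Detect JSPI support by probing for the API surface on WebAssembly.
// Older experimental builds exposed WebAssembly.Suspender; the current
// proposal exposes WebAssembly.Suspending (and WebAssembly.promising).
function supportsJSPI(): boolean {
  const wasm = (globalThis as any).WebAssembly;
  return !!wasm && ('Suspending' in wasm || 'Suspender' in wasm);
}

// A loader could then pick a wasm build variant accordingly
// (variant names here are illustrative):
const wasmVariant = supportsJSPI() ? 'with-jspi' : 'without-jspi';
```

In environments without JSPI, `supportsJSPI()` returns `false` and the loader would fall back to a non-JSPI build.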
If you want, I can:
- Check the exact support matrix for specific browser versions you care about (Chrome/Edge/Firefox/Safari) and list commands/flags to enable JSPI locally.
Citations:
- 1: https://v8.dev/blog/jspi?utm_source=openai
- 2: https://developer.chrome.com/blog/webassembly-jspi-origin-trial?utm_source=openai
- 3: https://learn.microsoft.com/en-us/microsoft-edge/web-platform/release-notes/137?utm_source=openai
- 4: https://v8.dev/blog/jspi?utm_source=openai
- 5: https://v8.dev/blog/jspi?utm_source=openai
- 6: https://developer.chrome.com/blog/webassembly-jspi-origin-trial?utm_source=openai
Options are well-defined; reconsider GGML_WEBGPU_JSPI default due to browser compatibility.
The CMake options are correctly placed before add_subdirectory(llama.cpp). However, GGML_WEBGPU_JSPI defaults to ON, and JSPI browser support remains limited as of January 2026: Chrome/Edge require experimental flags, Firefox 139 has experimental support, and Safari/WebKit have not shipped JSPI support. This creates a significant compatibility gap for Safari users. Consider defaulting to OFF for broader compatibility, or explicitly document JSPI browser requirements and the experimental nature of this feature.
🤖 Prompt for AI Agents
In `@CMakeLists.txt` around lines 4 - 6, The GGML_WEBGPU_JSPI option currently
defaults to ON which breaks compatibility for many browsers; change the CMake
option declaration for GGML_WEBGPU_JSPI to default to OFF (i.e., set its default
value to OFF) and update the surrounding CMake comment or project documentation
to explicitly state JSPI is experimental and list the browser requirements/flags
(e.g., Chrome/Edge experimental flags, Firefox 139 experimental support,
Safari/WebKit not supported) so consumers know how to enable it intentionally.
it's a good point, but there are still some unresolved issues with using ASYNCIFY. We could try to handle it, but it also seems like WebKit is planning on supporting JSPI at some point (WebKit/standards-positions#422)
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Fetch the WebKit standards position on JSPI to understand the status
gh api repos/WebKit/standards-positions/issues/422 --jq '{state: .state, title: .title, created: .created_at, updated: .updated_at, body: .body, comments: .comments}' 2>/dev/null || echo "Failed to fetch issue details"

# Get recent comments on the issue to understand current status
gh api repos/WebKit/standards-positions/issues/422/comments --jq '.[-3:] | .[] | {created: .created_at, author: .user.login, body: .body}' 2>/dev/null || echo "Failed to fetch comments"
```
Length of output: 4287
@reeselevine, that's a fair point! The tradeoff between ASYNCIFY issues and JSPI browser support is a reasonable engineering decision. I can see the WebKit standards position issue shows positive progress—kmiller68 from WebKit commented in September 2025 that "we've withdrawn our objections to this API," which is a good signal for future support.
Given that ASYNCIFY has unresolved issues and JSPI enables better functionality for the majority of browsers that do support it (Chrome, Edge, Firefox experimental), defaulting to ON makes sense for your use case.
One suggestion: consider adding a note in the README or build documentation about JSPI browser requirements so users on Safari/WebKit-based browsers understand why WebGPU might not work for them yet, and can disable the option if needed. Something like:
```markdown
## WebGPU Backend
- Requires JSPI (JavaScript Promise Integration) support
- Supported: Chrome/Edge (with flags), Firefox 139+ (experimental)
- Not yet supported: Safari/WebKit (under consideration)
- To disable: `cmake -DGGML_WEBGPU_JSPI=OFF`
```

But that's just a nice-to-have for user experience—the default is justified given the technical constraints.
🧠 Learnings used
Learnt from: reeselevine
Repo: ngxson/wllama PR: 198
File: src/workers-code/llama-cpp.js:269-279
Timestamp: 2025-12-29T03:44:02.649Z
Learning: In wllama's src/workers-code/llama-cpp.js, the isAsync parameter in callWrapper should only be true for functions included in JSPI_EXPORTS build configuration (wllama_start and wllama_action). Other functions like wllama_exit, wllama_debug, and wllama_malloc should use isAsync=false even though the callWrapper returns an async function, because they don't perform async operations at the C/WASM level.
Ok, I guess if we want to support Safari, we will need to build separate wasm blobs without JSPI/WebGPU for contexts that don't support it. Unfortunately, it looks like the wasm built with JSPI isn't compatible with Safari, even if the WebGPU backend is not used. I can look into supporting ASYNCIFY a bit and if I get it working compare the performance. Potentially there's a path where only ASYNCIFY is used until JSPI is more widely supported. Or wllama will have to ship more wasm binaries, ones with JSPI and ones without. |
Yeah I think we can have the third build named Anyways, I'm still not sure why CI doesn't run automatically on this PR 🤔 I just need the
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/wllama.ts`:
- Around line 601-621: The code can set this.nbThreads to 0 when
navigator.hardwareConcurrency is 1; change the hwConcurrency and nbThreads
calculation to enforce a minimum of 1: compute hwConcurrency = Math.max(1,
Math.floor((navigator.hardwareConcurrency || 1) / 2)) and then set
this.nbThreads = Math.max(1, config.n_threads ?? hwConcurrency); keep the
existing check that sets this.useMultiThread = true only when this.nbThreads > 1
and retain the warning branch when it falls back to single-thread; update
references to this.nbThreads (used later as n_threads) accordingly.
- Around line 134-143: getPerfContext() currently returns the raw
GlueMsgPerfContextRes which includes the internal _name: "pctx_res" field and
leaks it through the public PerfContextData API; modify getPerfContext() to
transform/map the GlueMsgPerfContextRes into a PerfContextData object by copying
only the public fields (success, t_start_ms, t_load_ms, t_p_eval_ms, t_eval_ms,
n_p_eval, n_eval, n_reused) and omitting _name before returning so callers
receive the clean PerfContextData shape.
🧹 Nitpick comments (1)
src/wllama.ts (1)
587-599: Consider adding try-catch around WebGPU adapter request.

While `requestAdapter()` typically returns `null` on failure, wrapping it in try-catch would guard against edge cases where the call throws (browser bugs, permission errors, etc.).

Suggested defensive approach
```diff
 if (this.config.preferWebGPU) {
   if (navigator.gpu) {
-    if (await navigator.gpu.requestAdapter()) {
-      this.useWebGPU = true;
-    } else {
-      this.logger().warn('WebGPU backend requested but no adapter found, falling back to CPU');
+    try {
+      if (await navigator.gpu.requestAdapter()) {
+        this.useWebGPU = true;
+      } else {
+        this.logger().warn('WebGPU backend requested but no adapter found, falling back to CPU');
+      }
+    } catch (e) {
+      this.logger().warn('WebGPU adapter request failed, falling back to CPU', e);
     }
   } else {
     this.logger().warn(
       'WebGPU backend requested but WebGPU is not available, falling back to CPU'
     );
   }
 }
```
```typescript
if (await isSupportMultiThread()) {
  if (this.pathConfig['multi-thread/wllama.wasm']) {
    const hwConcurrency = Math.floor((navigator.hardwareConcurrency || 1) / 2);
    this.nbThreads = config.n_threads ?? hwConcurrency;
    if (this.nbThreads > 1) {
      this.useMultiThread = true;
    } else {
      this.logger().warn(
        'Falling back single-thread due to n_threads configuration or limited hardware concurrency'
      );
    }
  } else {
    this.logger().warn(
      'Missing paths to "multi-thread/wllama.wasm", falling back to single-thread'
    );
  }
} else {
  this.logger().warn(
    'Multi-threads are not supported in this environment, falling back to single-thread'
  );
}
```
Edge case: `nbThreads` may become 0 when hardware concurrency is limited.

When `navigator.hardwareConcurrency` is 1, `hwConcurrency` becomes `Math.floor(1/2) = 0`. If `config.n_threads` is not specified, `nbThreads` will be set to 0, which could cause unexpected behavior when passed to the worker (line 671: `n_threads: this.nbThreads`).
Proposed fix: ensure minimum of 1 thread
```diff
-      const hwConcurrency = Math.floor((navigator.hardwareConcurrency || 1) / 2);
-      this.nbThreads = config.n_threads ?? hwConcurrency;
+      const hwConcurrency = Math.max(1, Math.floor((navigator.hardwareConcurrency || 1) / 2));
+      this.nbThreads = config.n_threads ?? hwConcurrency;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
if (await isSupportMultiThread()) {
  if (this.pathConfig['multi-thread/wllama.wasm']) {
    const hwConcurrency = Math.max(1, Math.floor((navigator.hardwareConcurrency || 1) / 2));
    this.nbThreads = config.n_threads ?? hwConcurrency;
    if (this.nbThreads > 1) {
      this.useMultiThread = true;
    } else {
      this.logger().warn(
        'Falling back single-thread due to n_threads configuration or limited hardware concurrency'
      );
    }
  } else {
    this.logger().warn(
      'Missing paths to "multi-thread/wllama.wasm", falling back to single-thread'
    );
  }
} else {
  this.logger().warn(
    'Multi-threads are not supported in this environment, falling back to single-thread'
  );
}
```
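The clamping fix suggested above can be captured in a small pure helper, which also makes the edge case easy to test; `resolveThreadCount` is a hypothetical name introduced for this sketch, not part of the wllama codebase:

```typescript
// Hypothetical helper: never allow the thread count to drop below 1,
// even when hardwareConcurrency is 1 (where floor(1/2) would be 0).
function resolveThreadCount(
  hardwareConcurrency: number | undefined,
  configuredThreads?: number
): number {
  const hwConcurrency = Math.max(1, Math.floor((hardwareConcurrency || 1) / 2));
  return Math.max(1, configuredThreads ?? hwConcurrency);
}
```

With this shape, `resolveThreadCount(navigator.hardwareConcurrency, config.n_threads)` always yields at least 1, and an explicit `n_threads` still takes precedence over the derived value.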
🤖 Prompt for AI Agents
In `@src/wllama.ts` around lines 601 - 621, The code can set this.nbThreads to 0
when navigator.hardwareConcurrency is 1; change the hwConcurrency and nbThreads
calculation to enforce a minimum of 1: compute hwConcurrency = Math.max(1,
Math.floor((navigator.hardwareConcurrency || 1) / 2)) and then set
this.nbThreads = Math.max(1, config.n_threads ?? hwConcurrency); keep the
existing check that sets this.useMultiThread = true only when this.nbThreads > 1
and retain the warning branch when it falls back to single-thread; update
references to this.nbThreads (used later as n_threads) accordingly.
@coderabbitai pause
✅ Actions performed

Reviews paused.
Not sure how to open a PR targeting this branch in this repository, but I've added ASYNCIFY support and some discussion points here: reeselevine#1
@ngxson I know you're probably busy but would be interested in getting some of this integrated soon. |
@reeselevine sorry for the delay, I'm going back to wllama this week. For now, |
See #198 for original discussion
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.