@nico-martin
Collaborator

This adds caching of the wasm binary.
I also added an env.cacheKey so developers can modify the cacheKey. By default it will be transformers-cache but they are free to set something related to their app.

Note: this will only cache the wasm file. So there will still be a request to https://cdn.jsdelivr.net/ for the mjs file. In my opinion, this should be cached via the service worker if the application requires full offline support.

@nico-martin nico-martin requested a review from xenova December 1, 2025 14:30
@xenova
Collaborator

xenova commented Dec 1, 2025

> So there will still be a request to https://cdn.jsdelivr.net/ for the mjs file. In my opinion, this should be cached via the service worker if the application requires full offline support.

Indeed, caching the mjs file would be necessary to ensure full offline support, and imo it is essential before adding a caching feature like the one this PR proposes. Any ideas for how we could do this? Perhaps bundling ort-wasm-simd-threaded.jsep.mjs with the transformers.js file?

@nico-martin nico-martin changed the title from "added wasm cache" to "[v4] added wasm cache" Dec 2, 2025
Collaborator

@xenova xenova left a comment

Very nice + clean PR! Now we just need to figure out how to completely remove the ort-wasm-simd-threaded.jsep.mjs dependency.

@nico-martin
Collaborator Author

So after another deep-dive into onnxruntime I figured out that it's actually no problem at all to load the wasm factory (.mjs) as a blob, which allows us to load it from cache.
The only problem is that inside the factory there are URLs that are resolved relative to "import.meta.url". If we just replace this with the actual baseURL, it works just fine.
https://github.com/huggingface/transformers.js/blob/v4-cache-wasm-file/src/backends/utils/cacheWasm.js#L76
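
To illustrate the idea (this is a minimal sketch, not the code from the linked file): assuming the factory source is available as a string and the original file URL is absolute, every `import.meta.url` can be swapped for a string literal that points at the original directory. The helper name `patchFactorySource` and the example URL are hypothetical.

```javascript
// Hypothetical helper (not the PR's code): make a module source independent of
// `import.meta.url` so that it can later be loaded from a blob: URL.
function patchFactorySource(code, libURL) {
    // Directory of the original file, with a trailing slash so that relative
    // lookups keep resolving next to it. Assumes `libURL` is an absolute URL.
    const baseUrl = new URL('.', libURL).href;
    // JSON.stringify produces a correctly quoted and escaped string literal.
    const literal = JSON.stringify(baseUrl);
    // A replacer function avoids special handling of `$` in the replacement.
    return code.replace(/import\.meta\.url/g, () => literal);
}

const source = 'const wasm = new URL("ort.wasm", import.meta.url).href;';
const patched = patchFactorySource(source, 'https://cdn.example.com/dist/ort.mjs');
console.log(patched);
```

The patched source could then be wrapped in a Blob and turned into an object URL, which is the step the comment above describes.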

On top of that I also did some refactoring of hub.js. My goal is to keep large files that only have a handful of exported methods as clean as possible by extracting some helper functions and constants into their own separate files.

I also wanted to improve the caching (which is now used not only in hub.js but also in backends/onnx.js), so I created a helper function and also an interface "CacheInterface" that any given cache has to implement.
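
As a rough illustration of what such an interface could look like: the sketch below assumes a cache only needs `match` and `put`, mirroring the Web Cache API. The actual members of "CacheInterface" in the PR may differ, and both `InMemoryCache` and `assertCacheInterface` are hypothetical names.

```javascript
// Assumed interface shape: `match(key)` resolves to the cached value or
// undefined, and `put(key, value)` stores a value (as in the Web Cache API).
class InMemoryCache {
    constructor() {
        this.store = new Map();
    }

    async match(key) {
        return this.store.get(key);
    }

    async put(key, value) {
        this.store.set(key, value);
    }
}

// Hypothetical guard that rejects objects missing one of the assumed methods.
function assertCacheInterface(cache) {
    for (const method of ['match', 'put']) {
        if (typeof cache?.[method] !== 'function') {
            throw new TypeError(`Cache is missing required method "${method}"`);
        }
    }
    return cache;
}

const customCache = assertCacheInterface(new InMemoryCache());
```

A shared contract like this is what lets the same caching helper serve both the hub code and the ONNX backend.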

Collaborator

@xenova xenova left a comment

Very worthwhile refactor -- thanks!


I wonder if you think the following feature could be useful: design some form of CacheRegistry class which a user could import from the library... like

import { cache } from '@huggingface/transformers';

/// check if model is cached
// cache.match('org/model') or something -- API should be well-designed, returning a list/map of files that are cached for this model maybe?
// cache.delete('org/model') -- remove all files cached for this model

I think we can draw inspiration from hf cache cli tool

hf cache --help
Usage: hf cache [OPTIONS] COMMAND [ARGS]...

  Manage local cache directory.

Options:
  --help  Show this message and exit.

Commands:
  ls      List cached repositories or revisions.
  prune   Remove detached revisions from the cache.
  rm      Remove cached repositories or revisions.
  verify  Verify checksums for a single repo revision from cache or a...

may not need to be added in this PR, but maybe something to discuss here.
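
Purely as a discussion aid, here is one hypothetical shape such a registry could take. Nothing below exists in the library: the class name `CacheRegistrySketch`, its methods, and the assumption that cached files are keyed by Hub-style URLs containing `/org/model/` are all illustrative.

```javascript
// Hypothetical registry: tracks cached files by URL and groups them by model id.
class CacheRegistrySketch {
    constructor() {
        this.entries = new Map(); // file URL -> cached payload
    }

    add(url, payload) {
        this.entries.set(url, payload);
    }

    // List the cached file URLs that belong to a model such as "org/model".
    match(modelId) {
        return [...this.entries.keys()].filter((url) => url.includes(`/${modelId}/`));
    }

    // Remove every cached file of the model and return how many were removed.
    delete(modelId) {
        const urls = this.match(modelId);
        urls.forEach((url) => this.entries.delete(url));
        return urls.length;
    }
}

const registry = new CacheRegistrySketch();
registry.add('https://huggingface.co/org/model/resolve/main/config.json', '{}');
registry.add('https://huggingface.co/org/model/resolve/main/onnx/model.onnx', 'bytes');
registry.add('https://huggingface.co/org/other/resolve/main/config.json', '{}');
console.log(registry.match('org/model'));
```

Returning the list of files from `match` keeps "is it cached?" and "what is cached?" in a single call, which is one way to read the suggestion above.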

@nico-martin
Collaborator Author

I like the CacheRegistry! The only problem is that we normally don't know upfront all the files a model will load/expect (although I think it would be great to add that as well).
I think that opens up a completely new topic. I will keep this on my radar but not implement it here in the PR.

@nico-martin nico-martin requested a review from xenova December 15, 2025 09:13
Comment on lines 14 to 18
if (cache) {
    try {
        return await cache.match(url);
    } catch (e) {
        console.warn(`Error reading ${fileName} from cache:`, e);
Collaborator

If await cache.match(url) returns undefined (i.e., the file is not in the cache), then we return undefined from this function, and the fetch below is never called (meaning the file is never cached).

Collaborator

Suggested change
-if (cache) {
-    try {
-        return await cache.match(url);
-    } catch (e) {
-        console.warn(`Error reading ${fileName} from cache:`, e);
+if (cache) {
+    try {
+        const result = await cache.match(url);
+        if (result) {
+            return result;
+        }
+    } catch (e) {
+        console.warn(`Error reading ${fileName} from cache:`, e);

seems to fix it

Collaborator

made this change.
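
For context, the cache-then-fetch flow discussed in this thread can be sketched end to end as follows. This is a simplified stand-in rather than the function from the PR: `loadWithCache`, the injected `fetchFn`, and the Map-backed cache are assumptions made so the example is self-contained.

```javascript
// Hypothetical helper: return the cached entry on a hit, otherwise fetch and cache.
async function loadWithCache(url, cache, fetchFn) {
    if (cache) {
        try {
            const cached = await cache.match(url);
            // Only short-circuit on a real hit; a miss (undefined) falls through.
            if (cached) {
                return cached;
            }
        } catch (e) {
            console.warn(`Error reading ${url} from cache:`, e);
        }
    }

    const response = await fetchFn(url);

    if (cache) {
        try {
            // With real Response objects, store response.clone() instead so the
            // caller can still read the body.
            await cache.put(url, response);
        } catch (e) {
            console.warn(`Error writing ${url} to cache:`, e);
        }
    }
    return response;
}

// Minimal stand-ins for a cache and a network call.
const demoStore = new Map();
const demoCache = {
    match: async (key) => demoStore.get(key),
    put: async (key, value) => void demoStore.set(key, value),
};
let fetchCount = 0;
const demoFetch = async (url) => {
    fetchCount += 1;
    return { url, body: 'wasm-bytes' };
};
```

Calling `loadWithCache` twice with the same URL should hit `demoFetch` only once; with the original `return await cache.match(url)` the first call would have returned undefined without fetching at all.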

xenova and others added 3 commits December 16, 2025 00:21
Don't throw error if we can't open cache or load file from cache, but we are able to make the request.
Collaborator

@xenova xenova left a comment

Thanks! 🚀 Had time today to finish the review, and it works well! I had to make one small adjustment (only return when await cache.match(url) matches... not when undefined), and it's good to go :)

I also like the refactor out of the monolithic hub.js file.

I tested it on the recent chatterbox webgpu demo, and it now runs fully offline thanks to this PR! 🔥

@xenova xenova merged commit 0082c20 into v4 Dec 16, 2025
1 check failed
@xenova xenova deleted the v4-cache-wasm-file branch December 16, 2025 05:38
@xmcp

xmcp commented Dec 17, 2025

I encountered the below exception after this PR. It was fine at the previous commit (aab2326).

image

The inner (caught) exception is:

TypeError: Invalid base URL
    at blob:https://localhost:5173/49689b73-721b-4544-9d07-d0402577925e:100:240
    at ortWasmThreaded (blob:https://localhost:5173/49689b73-721b-4544-9d07-d0402577925e:100:478)
    at https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2628:28592
    at new Promise (<anonymous>)
    at Xt (https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2628:28380)
    at async Ns (https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2630:28936)
    at async yr.init (https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2630:33169)
    at async ec (https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2624:1454)
    at async La (https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2624:1733)
    at async a.create (https://localhost:5173/@fs/.../transformers.js/dist/transformers.web.js:2624:19895)

The blob:...:100:240 (the first line of the stack trace) points to this statement:

Va ??= g.locateFile ? g.locateFile ? g.locateFile("ort-wasm-simd-threaded.asyncify.wasm", ja) : ja + "ort-wasm-simd-threaded.asyncify.wasm" : (new URL("/@fs/.../transformers.js/dist/ort-wasm-simd-threaded.asyncify.wasm",import.meta.url)).href;

Seems that we still have remaining import.meta.urls in the imported mjs file.

@xmcp

xmcp commented Dec 17, 2025

Btw I managed to implement mjs&wasm cache in this way, which works fine with previous versions (both aab2326 and v3 from npm). I'm not sure why it does not encounter the above exception.

import {pipeline, env} from "@huggingface/transformers";

import onnx_wasm from "../node_modules/@huggingface/transformers/dist/ort-wasm-simd-threaded.asyncify.wasm?url";
import onnx_mjs from "../node_modules/@huggingface/transformers/dist/ort-wasm-simd-threaded.asyncify.mjs?url";

async function init() {
    let cache = await caches.open('transformers-cache');

    let wasm_file = await cache.match(onnx_wasm);
    if(!wasm_file) {
        await cache.add(onnx_wasm);
        wasm_file = await cache.match(onnx_wasm);
    }

    let mjs_file = await cache.match(onnx_mjs);
    if(!mjs_file) {
        await cache.add(onnx_mjs);
        mjs_file = await cache.match(onnx_mjs);
    }

    env.localModelPath = '/models';
    env.allowRemoteModels = false;
    env.allowLocalModels = true;
    env.backends.onnx.wasm.wasmPaths = {
        wasm: URL.createObjectURL(await wasm_file.blob()),
        mjs: URL.createObjectURL(await mjs_file.blob()),
    };

    return await pipeline(
        ...
    );
}

@nico-martin
Collaborator Author

Hi @xmcp
Thanks for reaching out!

> Btw I managed to implement mjs&wasm cache in this way, which works fine with previous versions (both aab2326 and v3 from npm). I'm not sure why it does not encounter the above exception.

In your setup that makes total sense. The import statement makes sure the .wasm and the .mjs files are copied to your dist/ folder, and onnx_wasm and onnx_mjs are then links to those files.
You then use the caches api to cache the files and pass them as an internal reference URL to env.backends.onnx.wasm.wasmPaths. So far so good.

Hardcoding the wasmPaths is ok, but there could be one issue. We only use the asyncify version for non-iOS devices because we experienced some issues in the past (maybe @xenova could elaborate on this).
https://github.com/huggingface/transformers.js/blob/v4/src/backends/onnx.js#L285-L293
In your example, all clients will use the asyncify version.

But back to your main issue: I think the problem is that now your code tries to cache the request, but then our code tries to cache it again. And now the caches API tries to cache a blob URL and I am not sure if that works :D.
Could you try to set env.useWasmCache = false right where you set all the other env properties?

@nico-martin
Collaborator Author

I also created a little check for that case. @xmcp could you verify if that solves your problem?
https://github.com/huggingface/transformers.js/compare/v4...v4-cache-wasm-file-blob-fix?expand=1

@xmcp

xmcp commented Dec 17, 2025

I can confirm that setting env.useWasmCache = false; solves the exception. The patch at v4-cache-wasm-file-blob-fix does not fix the exception if I pass normal URLs (not the blob one) to env.backends.onnx.wasm.wasmPaths. I will try to diagnose this.

@xmcp

xmcp commented Dec 17, 2025

I think I know the problem. In my code snippet, onnx_wasm and onnx_mjs are relative URLs (/assets/.../xxx.mjs in prod or /@fs/.../xxx.mjs in dev). Therefore, the loadWasmFactory function replaced every import.meta.url with a relative path, which does not work as the second argument of the URL constructor.

Making the baseUrl absolute solves the problem. Here is a patch for that:

--- a/src/backends/utils/cacheWasm.js
+++ b/src/backends/utils/cacheWasm.js
@@ -74,7 +74,7 @@ export async function loadWasmFactory(libURL) {
     try {
         let code = await response.text();
         // Fix relative paths when loading factory from blob, overwrite import.meta.url with actual baseURL
-        const baseUrl = libURL.split('/').slice(0, -1).join('/');
+        const baseUrl = new URL(libURL.split('/').slice(0, -1).join('/'), location.href).href;
         code = code.replace(/import\.meta\.url/g, `"${baseUrl}"`);
         const blob = new Blob([code], { type: 'text/javascript' });
         return URL.createObjectURL(blob);

(As a bonus, it also URL-encodes the baseUrl correctly, so it will no longer crash the program if libURL contains double quotes.)
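
The behavior behind this patch can be reproduced in isolation. The sketch below is a standalone illustration, not the patched function: `toAbsoluteBase` is a hypothetical name, and the page URL is passed in explicitly instead of reading `location.href`, so the snippet also runs outside a browser.

```javascript
// Hypothetical helper mirroring the patched line: take the directory part of
// `libURL` and resolve it against the page URL to obtain an absolute base.
function toAbsoluteBase(libURL, pageUrl) {
    const dir = libURL.split('/').slice(0, -1).join('/');
    return new URL(dir, pageUrl).href;
}

// A relative path is not a valid base: the URL constructor throws a TypeError.
let relativeBaseError = null;
try {
    new URL('ort.wasm', '/assets');
} catch (e) {
    relativeBaseError = e;
}
console.log(relativeBaseError instanceof TypeError); // true

const pageUrl = 'https://localhost:5173/index.html';
console.log(toAbsoluteBase('/assets/ort.mjs', pageUrl)); // https://localhost:5173/assets
// A double quote in the path is percent-encoded, so it cannot break out of the
// string literal that replaces import.meta.url.
console.log(toAbsoluteBase('/as"sets/ort.mjs', pageUrl)); // https://localhost:5173/as%22sets
```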

@nico-martin
Collaborator Author

Thanks for taking the time to investigate!
But I think that will break on the server (Node.js, Bun, Deno). I'll dig a bit deeper and come back to you.

@nico-martin
Collaborator Author

@xmcp

xmcp commented Dec 18, 2025

Yes this version should work. Btw the isBlobURL(url) || isRemoteURL(url) check seems redundant because the URL class works fine for absolute URLs.
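
A quick standalone check of that claim (the URLs are illustrative, and the blob UUID is the one from the stack trace above): when the first argument to the URL constructor is already absolute, the base argument is ignored, so remote and blob URLs pass through unchanged.

```javascript
const base = 'https://localhost:5173/index.html';

// Absolute inputs (remote or blob:) ignore the base entirely.
const remoteHref = new URL('https://cdn.jsdelivr.net/npm/pkg/dist/ort.mjs', base).href;
const blobHref = new URL('blob:https://localhost:5173/49689b73-721b-4544-9d07-d0402577925e', base).href;

// Only relative inputs are resolved against the base.
const relativeHref = new URL('/assets/ort.mjs', base).href;

console.log(remoteHref);
console.log(blobHref);
console.log(relativeHref);
```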

image

@nico-martin
Collaborator Author

Good catch! I removed it. PR is open :)
#1489

@xenova
Collaborator

xenova commented Dec 21, 2025

PR is merged ✅
