Conversation

@vandana-rajan

Adds a new WASM example for running quantized Embedding Gemma 300M models in the browser with WebAssembly.

Available Models:

  1. Q8_0 (approx. 340 MB)
  2. Q4_0 (approx. 297 MB)

Both models are from Unsloth AI. The output from these models is post-processed by two dense layers (provided by Google) and then normalized.
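As a rough sketch of that post-processing stage (names, shapes, and values here are illustrative, not the actual candle implementation), the pooled model output passes through two dense layers and is then L2-normalized:

```rust
// Illustrative sketch only: plain Vec<f32> instead of candle tensors,
// and tiny hypothetical weights, to keep the example self-contained.

/// A dense (fully connected) layer: out[i] = sum_j(w[i][j] * x[j]) + b[i].
fn dense(input: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    weights
        .iter()
        .zip(bias)
        .map(|(row, b)| row.iter().zip(input).map(|(w, x)| w * x).sum::<f32>() + b)
        .collect()
}

/// L2 normalization: divide by the vector's Euclidean norm.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    // Toy 2-dimensional stand-in for the pooled embedding.
    let pooled = vec![1.0_f32, 2.0];
    let (w1, b1) = (vec![vec![1.0, 0.0], vec![0.0, 1.0]], vec![0.0, 0.0]);
    let (w2, b2) = (vec![vec![2.0, 0.0], vec![0.0, 2.0]], vec![0.0, 0.0]);

    // dense -> dense -> L2 normalize, as described above.
    let h = dense(&pooled, &w1, &b1);
    let out = l2_normalize(&dense(&h, &w2, &b2));
    println!("{:?}", out); // unit-length embedding vector
}
```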

The demo interface follows the BERT WASM example.

Changes

  1. New example in candle-wasm-examples/quant-embed-gemma/
  2. Modifications in candle-transformers/src/models/quantized_gemma3.rs to accommodate Embedding Gemma

Usage

wasm-pack build --target web --release
python3 ./serve.py --port 8000
# Then open http://localhost:8000 in a browser

Note: these changes are built on top of @DrJesseGlass's modifications to quantized_gemma3.rs.

@vandana-rajan
Copy link
Author

Hello @DrJesseGlass

I have added this new WASM example on top of your changes that are yet to be merged. Once your PR is merged, I can simply rebase onto it. In the meantime, if you could kindly review this PR, that would be great.

Thanks,
Vandana
