@DrJesseGlass (Contributor) commented Jan 23, 2026

Adds support for Google's TranslateGemma translation models (55 languages) and consolidates the Gemma model family into a unified module structure.

Reorganization

gemma.rs → gemma/gemma1.rs
Consolidated gemma2.rs, gemma3.rs, quantized_gemma3.rs under gemma/
Added gemma/mod.rs with re-exports for backward compatibility
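
A minimal sketch of what the re-exporting gemma/mod.rs can look like; the exact re-export list in this PR may differ:

```rust
// candle-transformers/src/models/gemma/mod.rs (sketch)
pub mod gemma1;
pub mod gemma2;
pub mod gemma3;
pub mod quantized_gemma3;
pub mod translate_gemma;

// Re-export the old top-level items so existing paths such as
// `models::gemma::Model` keep resolving after the gemma.rs -> gemma/gemma1.rs move.
pub use gemma1::*;
```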

TranslateGemma Support

Added gemma/translate_gemma.rs with prompt formatting utilities and ISO 639-1 language codes (see the sketch after this list)
Added examples/translate-gemma.rs supporting both full precision and quantized inference
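
For illustration, a hypothetical sketch of the kind of helpers translate_gemma.rs provides; the function names, template wording, and tiny language table are assumptions (only the <start_of_turn>/<end_of_turn> markers are standard Gemma chat tokens), not the module's actual API:

```rust
/// Hypothetical prompt-formatting helper; the real template lives in
/// gemma/translate_gemma.rs and may differ.
fn format_translation_prompt(source: &str, target: &str, text: &str) -> String {
    format!(
        "<start_of_turn>user\nTranslate the following text from {source} to {target}:\n{text}<end_of_turn>\n<start_of_turn>model\n"
    )
}

/// Tiny excerpt of an ISO 639-1 lookup; the module covers all 55 languages.
fn language_name(code: &str) -> Option<&'static str> {
    match code {
        "en" => Some("English"),
        "de" => Some("German"),
        "ja" => Some("Japanese"),
        _ => None,
    }
}

fn main() {
    let (src, tgt) = (language_name("en").unwrap(), language_name("de").unwrap());
    println!("{}", format_translation_prompt(src, tgt, "Hello, world!"));
}
```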

Bug Fixes

gemma3.rs: Make KV tensors contiguous before cache append (fixes the "slice-set only supports contiguous tensors" error with certain GQA ratios; see Contiguity Fix Details below)
quantized_gemma3.rs: Added clear_kv_cache() method for multi-turn inference
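
A self-contained sketch of the cache-clearing method; the struct and field names here are assumptions, but candle_nn::kv_cache::KvCache::reset() is the real call that empties a cache:

```rust
use candle_nn::kv_cache::KvCache;

struct Attention { kv_cache: KvCache }
struct Layer { attn: Attention }
struct ModelWeights { layers: Vec<Layer> } // hypothetical field layout

impl ModelWeights {
    /// Reset every layer's KV cache so the next conversation turn
    /// does not attend to stale entries from the previous one.
    pub fn clear_kv_cache(&mut self) {
        for layer in self.layers.iter_mut() {
            layer.attn.kv_cache.reset();
        }
    }
}
```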

Usage Notes

Full-precision models auto-download from the HuggingFace Hub
Quantized inference requires a local GGUF file passed via --model-path (Google publishes no official GGUF conversions; community conversions are available on HuggingFace); both paths are sketched below
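
A sketch of the two loading paths; the repo id and file names are placeholders, not the ones examples/translate-gemma.rs actually uses:

```rust
use candle_core::quantized::gguf_file;
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    // Full precision: fetch weights from the HuggingFace Hub on first run.
    // NOTE: "google/translate-gemma-4b" is a placeholder repo id.
    let repo = Api::new()?.model("google/translate-gemma-4b".to_string());
    let _weights_path = repo.get("model.safetensors")?;

    // Quantized: parse a local GGUF file supplied via --model-path.
    let mut file = std::fs::File::open("translate-gemma-4b-q4_k_m.gguf")?;
    let _gguf = gguf_file::Content::read(&mut file)?;
    Ok(())
}
```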

Known Issue

Outputs differ between the full-precision and quantized models: investigation shows gemma3.rs uses GELU while quantized_gemma3.rs uses SiLU. This is a gemma3.rs issue, not specific to TranslateGemma.
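
The two activations disagree everywhere except near zero, which is enough to make the logits diverge; a quick check using candle's real Tensor::gelu/Tensor::silu ops:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let xs = Tensor::new(&[-2.0f32, -0.5, 0.5, 2.0], &Device::Cpu)?;
    // gelu(x) = x * Phi(x) (tanh approximation in candle); silu(x) = x * sigmoid(x).
    println!("gelu: {:?}", xs.gelu()?.to_vec1::<f32>()?);
    println!("silu: {:?}", xs.silu()?.to_vec1::<f32>()?);
    Ok(())
}
```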

Contiguity Fix Details

Key and value states become non-contiguous after transpose, but KvCache::append() requires contiguous tensors for slice_set. This worked for some model dimensions but failed for others (e.g., TranslateGemma 4B, which has a different GQA ratio).
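
A runnable reproduction of the fix; KvCache::new/append, Tensor::transpose, and Tensor::contiguous are real candle APIs, while the tensor shapes here are illustrative:

```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::kv_cache::KvCache;

fn main() -> candle_core::Result<()> {
    let dev = Device::Cpu;
    // (batch, seq_len, n_kv_heads, head_dim), as produced before the transpose.
    let k = Tensor::zeros((1, 3, 4, 8), DType::F32, &dev)?;
    let v = Tensor::zeros((1, 3, 4, 8), DType::F32, &dev)?;

    // Transposing to (batch, n_kv_heads, seq_len, head_dim) yields a strided
    // view; KvCache::append uses slice_set, which needs contiguous input.
    let k = k.transpose(1, 2)?.contiguous()?; // <- the fix: .contiguous()
    let v = v.transpose(1, 2)?.contiguous()?;

    // The cache appends along the sequence dimension (dim 2 after transpose).
    let mut cache = KvCache::new(2, 64);
    let (_k, _v) = cache.append(&k, &v)?;
    Ok(())
}
```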
@DrJesseGlass marked this pull request as ready for review January 23, 2026 20:56