A robust CLI tool for translating large text documents using local LLM models through Ollama. Optimized for translategemma models, it employs a smart chunking strategy to handle documents of any length while preserving context and structure.
- Divide and Conquer Strategy: Automatically splits large documents into manageable chunks using intelligent separators (paragraphs, sentences) to maintain context.
- Resumable & Safe:
  - Partial Saving: If you interrupt the process (Ctrl+C), it automatically saves the chunks translated so far.
  - Robust Error Handling: Graceful error messages for connection issues or empty files.
- Flexible Configuration: Configure via:
  - CLI Arguments (highest priority)
  - YAML Config File
  - Environment Variables
  - Smart Defaults
- Rich User Experience: Visual progress bars, spinners, and colored output using `rich`.
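To make the chunking strategy concrete, here is a minimal Python sketch of boundary-aware splitting: prefer paragraph breaks, then sentence ends, then spaces, and hard-cut only as a last resort. This is illustrative only, not the tool's actual code.

```python
def split_text(text: str, chunk_size: int = 1000) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    cutting at the most natural boundary available."""
    separators = ["\n\n", ". ", " "]  # paragraph > sentence > word
    chunks = []
    while text:
        if len(text) <= chunk_size:
            chunks.append(text)
            break
        window = text[:chunk_size]
        cut = -1
        for sep in separators:
            cut = window.rfind(sep)
            if cut != -1:
                cut += len(sep)  # keep the separator with the left chunk
                break
        if cut <= 0:
            cut = chunk_size  # no natural boundary found: hard cut
        chunks.append(text[:cut])
        text = text[cut:]
    return chunks
```

Joining the chunks back together always reproduces the original text, which is what makes the later assembly step lossless.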
- Ollama: Ensure Ollama is installed and running:

  ```shell
  ollama serve
  ```

- Model: Pull the default translation model (or your preferred translategemma model size):

  ```shell
  ollama pull translategemma:12b
  ```
Install using pipx to keep dependencies isolated:
```shell
pipx install git+https://github.com/arrase/gemma-translator.git
```

Clone the repository and install in editable mode:

```shell
git clone https://github.com/arrase/gemma-translator.git
cd gemma-translator
pip install -e .
```

Translate a file using default settings (English -> Spanish):

```shell
gemma-translator input.txt
```

Output will be saved to `input_es.txt` by default.
```shell
# Specify output file
gemma-translator input.txt -o my_translation.txt

# Specify source and target languages
gemma-translator input.txt --source-lang English --target-lang French --target-code fr

# Use a different model
gemma-translator input.txt --model translategemma:4b
```

The application looks for a configuration file at `~/.gemma-translator.yaml`.
Create ~/.gemma-translator.yaml to set your persistent preferences:
```yaml
model_name: "translategemma:12b"
api_base: "http://localhost:11434"
source_lang: "English"
source_code: "en"
target_lang: "Spanish"
target_code: "es"
chunk_size: 1000
chunk_overlap: 0
```

You can also use environment variables (prefixed with `GEMMA_`). These are useful for containerized environments or temporary overrides.
| Variable | Config Option |
|---|---|
| `GEMMA_MODEL_NAME` | `model_name` |
| `GEMMA_API_BASE` | `api_base` |
| `GEMMA_SOURCE_LANG` | `source_lang` |
| `GEMMA_TARGET_LANG` | `target_lang` |
| `GEMMA_CHUNK_SIZE` | `chunk_size` |
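As an illustration of how the four configuration layers resolve (CLI arguments over config file over environment variables over defaults), here is a small Python sketch. The `resolve` helper and `DEFAULTS` mapping are assumptions for demonstration, not the tool's internals.

```python
import os

# Illustrative precedence resolution: CLI > YAML config > env var > default.
DEFAULTS = {"model_name": "translategemma:12b", "chunk_size": 1000}

def resolve(option: str, cli_args: dict, file_config: dict):
    if cli_args.get(option) is not None:      # 1. CLI argument
        return cli_args[option]
    if option in file_config:                 # 2. YAML config file
        return file_config[option]
    env_val = os.environ.get(f"GEMMA_{option.upper()}")
    if env_val is not None:                   # 3. environment variable
        return env_val
    return DEFAULTS[option]                   # 4. smart default
```

For example, with `GEMMA_MODEL_NAME` set in the environment and no CLI flag or config entry, the environment value wins; passing `--model` on the command line overrides everything.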
| Option | Short | Description | Default |
|---|---|---|---|
| `--output` | `-o` | Output file path | `{stem}_{lang_code}{suffix}` |
| `--config` | `-c` | Config file path | `~/.gemma-translator.yaml` |
| `--model` | `-m` | Ollama model name | `translategemma:12b` |
| `--api-base` | | Ollama API URL | `http://localhost:11434` |
| `--source-lang` | `-s` | Source language name | `English` |
| `--source-code` | | Source ISO code | `en` |
| `--target-lang` | `-t` | Target language name | `Spanish` |
| `--target-code` | | Target ISO code | `es` |
| `--chunk-size` | | Characters per chunk | `1000` |
| `--chunk-overlap` | | Overlap chars | `0` |
- Ingestion: Reads the input text file.
- Chunking: Splits the text into chunks defined by `chunk_size` (default 1000 chars), respecting natural boundaries like newlines and sentences to avoid breaking context.
- Translation: Sends each chunk to the local Ollama LLM with a system prompt optimized for translation.
- Assembly: Combines translated chunks and saves the final result.
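The four steps above can be sketched roughly as follows. The prompt wording, function names, and naive fixed-width chunking here are assumptions for illustration, though the `/api/generate` request shape matches Ollama's standard HTTP API.

```python
import json
import urllib.request

def translate_chunk(chunk: str, model: str = "translategemma:12b",
                    api_base: str = "http://localhost:11434") -> str:
    """Send one chunk to the local Ollama server and return the translation."""
    payload = {
        "model": model,
        "prompt": f"Translate the following text from English to Spanish:\n\n{chunk}",
        "stream": False,
    }
    req = urllib.request.Request(
        f"{api_base}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def translate_file(in_path: str, out_path: str, translate=translate_chunk) -> None:
    text = open(in_path, encoding="utf-8").read()                       # 1. Ingestion
    chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]      # 2. Chunking (naive)
    translated = [translate(c) for c in chunks]                         # 3. Translation
    open(out_path, "w", encoding="utf-8").write("".join(translated))    # 4. Assembly
```

Injecting `translate` as a parameter keeps the pipeline testable without a running Ollama server.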