LingoLite is a lightweight, mobile-optimized neural machine translation (NMT) framework designed for efficient multilingual translation on resource-constrained devices. Built with PyTorch, it features a modern transformer architecture with state-of-the-art optimizations for mobile deployment.
LingoLite is ready for community experimentation but is not yet production-ready.
- NO CHECKPOINTS: ship your own tokenizer and model artifacts
- PIPELINE IN FLUX: training loop validated only on tiny synthetic data
- BRING DATA: repository does not include real datasets
- API NEEDS ARTIFACTS: server fails closed unless checkpoints/tokenizers are mounted
- COMMUNITY DRIVEN: success depends on contributors sharing improvements
- RESEARCH FOCUS: refer to docs/reports/PRODUCTION_READINESS.md for detailed limitations
See docs/reports/OPEN_SOURCE_READINESS_REPORT.md for the latest open-source verification summary.
- Features
- Recent Updates
- Architecture
- Installation
- Quick Start
- REST API Server
- Docker Deployment
- Getting Started
- Usage Examples
- Model Quantization
- ONNX Export for Mobile Deployment
- Model Evaluation
- Training
- Testing
- Model Configuration
- Generation Parameters
- Security
- Performance
- Project Structure
- Contributing
- Citation
- License
- Acknowledgments
- Documentation
- Support
- Mobile-Optimized Architecture: Designed specifically for efficient inference on mobile devices
- Grouped Query Attention (GQA) reduces memory footprint by 4-8x
- Rotary Position Embeddings (RoPE) eliminates learned position parameters
- SwiGLU Feed-Forward Networks for efficient computation
- Weight tying between encoder/decoder embeddings
- Multilingual Translation: Supports 6 languages out of the box
- English (en), Spanish (es), French (fr), German (de), Italian (it), Danish (da)
- Easy to extend to additional languages
- Advanced Generation Methods:
- Greedy decoding for fastest inference
- Beam search for higher quality translations
- KV caching for efficient autoregressive generation
- Temperature-based sampling for diverse outputs
- Development Infrastructure:
- FastAPI REST API server with async support (requires trained model)
- Docker and Docker Compose deployment configurations
- Comprehensive input validation and error handling
- Security-hardened file operations
- Professional logging infrastructure
- Automated test suite with pytest (unit tests only, no integration tests)
- Model quantization (INT8) utilities and ONNX export scripts
- BLEU evaluation scripts (untested on real data)
- Flexible Model Sizes:
- Tiny: ~7M parameters (~30MB FP32, ~7.5MB INT8)
- Small: ~60M parameters (~240MB FP32, ~60MB INT8)
- Medium: ~140M parameters (~560MB FP32, ~140MB INT8)
October 26, 2025 - Production readiness fixes:
- ✅ Fixed Training Pipeline: Resolved OneCycleLR crash; training loop now respects max_steps
- ✅ Proper Training Entry Point: Command-line interface with validation and error handling
- ✅ Fixed Dependencies: Added missing numpy to requirements.txt
- ✅ Automated Testing: Converted manual tests to pytest with proper assertions
- ✅ Fail-Closed Deployment: API server now requires trained model and tokenizer to start
- ✅ Honest Documentation: Added PRODUCTION_READINESS.md with accurate assessment
- ⚠️ Status Disclaimer: Clear warning that the project is not production-ready
Previous Updates (framework components):
- ✅ REST API Server: FastAPI-based HTTP endpoints (requires trained model)
- ✅ Docker Support: Containerization configurations
- ✅ Model Quantization: Utility scripts for INT8 quantization
- ✅ ONNX Export: Mobile deployment export scripts
- ✅ BLEU Evaluation: Translation quality assessment scripts
- ✅ Danish Language Support: Expanded coverage to 6 languages (en, es, fr, de, it, da)
See PRODUCTION_READINESS.md for current status.
LingoLite uses a Transformer encoder-decoder architecture with modern optimizations:
```
┌─────────────────────────────────────────────┐
│ Source Text (e.g., English) │
└──────────────────┬──────────────────────────┘
│
┌─────────▼──────────┐
│ TranslationTokenizer│
│ (SentencePiece) │
└─────────┬──────────┘
│ Token IDs
┌─────────▼──────────┐
│ Token Embeddings │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ Transformer │
│ Encoder │
│ (Bidirectional) │
│ • RoPE Position │
│ • GQA Attention │
│ • SwiGLU FFN │
└─────────┬──────────┘
│ Context
│
┌─────────▼──────────┐
│ Transformer │
│ Decoder │
│ (Causal) │
│ • Self-Attention │
│ • Cross-Attention │
│ • SwiGLU FFN │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ Output Projection │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ TranslationTokenizer│
│ (Decode) │
└─────────┬──────────┘
│
┌──────────────────▼──────────────────────────┐
│ Target Text (e.g., Spanish) │
└─────────────────────────────────────────────┘
```
- RMSNorm: Efficient normalization layer (lighter than LayerNorm)
- Rotary Position Embeddings (RoPE): Relative position encoding without learned parameters
- Grouped Query Attention (GQA): Reduces KV cache size while maintaining quality
- SwiGLU: Gated Linear Unit with Swish activation for efficient feed-forward networks
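For reference, RMSNorm is small enough to show in full. The following is an illustrative PyTorch sketch of the technique, not necessarily the exact module defined in lingolite/model_components.py:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features with a
    learned gain, with no mean subtraction and no bias (cheaper than LayerNorm)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the last (feature) dimension
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```

Dropping LayerNorm's mean-centering and bias term saves a small amount of computation and parameters per layer, which adds up on mobile CPUs.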
- Python 3.8 or higher
- PyTorch 2.0 or higher
- 4GB+ RAM (for tiny model), 16GB+ recommended (for larger models)
Install in editable mode so local changes are picked up automatically:
```bash
# Minimal runtime (core + REST API)
pip install -e .[api]
# Full developer setup (tests, linting, REST API)
pip install -e .[api,dev]
```

Key dependencies (see pyproject.toml for details):
- torch>=2.0.0 – Deep learning framework
- sentencepiece>=0.1.99 – Tokenization
- sacrebleu>=2.3.1 – Translation evaluation
- tqdm>=4.65.0 – Progress bars

To verify the installation, run:

```bash
python scripts/install.py
```

This will verify that all required files are present and properly structured.
```python
from lingolite.translation_tokenizer import TranslationTokenizer
# Prepare training data file paths (parallel corpora recommended)
corpus_files = [
"data/corpus_en.txt",
"data/corpus_es.txt",
"data/corpus_fr.txt",
"data/corpus_de.txt",
"data/corpus_it.txt",
"data/corpus_da.txt",
]
# Train tokenizer and save artifacts
tokenizer = TranslationTokenizer(vocab_size=24000)
tokenizer.train(corpus_files)
tokenizer.save("tokenizer_model")
```

```python
from lingolite.mobile_translation_model import create_model
# Create a tiny model for exploratory work
model = create_model(vocab_size=24000, model_size="tiny")
params = model.count_parameters()
print(f"Model has {params['total']:,} trainable parameters")import torch
# Prepare input
text = "Hello, world!"
input_ids = tokenizer.encode(
text,
src_lang="en",
tgt_lang="es",
add_special_tokens=True,
)
input_tensor = torch.tensor([input_ids])
# Generate translation (greedy)
output_ids = model.generate(
src_input_ids=input_tensor,
max_length=128,
sos_token_id=tokenizer.sos_token_id,
eos_token_id=tokenizer.eos_token_id
)
# Decode output
translation = tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True)
print(f"Translation: {translation}")# Generate with beam search (slower but higher quality)
output_ids = model.generate_beam(
src_input_ids=input_tensor,
max_length=128,
num_beams=4,
length_penalty=1.0,
sos_token_id=tokenizer.sos_token_id,
eos_token_id=tokenizer.eos_token_id
)
translation = tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True)
print(f"Beam search translation: {translation}")LingoLite includes a production-ready FastAPI server for serving translations via HTTP endpoints.
```bash
pip install -e .[api] # install server dependencies
export LINGOLITE_USE_STUB_TOKENIZER=1 # optional: use stub tokenizer (no artifacts)
export LINGOLITE_ALLOW_RANDOM_MODEL=1 # optional: create random tiny model
export LINGOLITE_MODEL_SIZE=small # optional: choose tiny/small/medium/large
export LINGOLITE_DEVICE=auto # optional: auto|cpu|cuda
export LINGOLITE_ALLOWED_ORIGINS=http://localhost,http://127.0.0.1
lingolite-api
```

Windows PowerShell:

```powershell
pip install -e .[api]
$env:LINGOLITE_USE_STUB_TOKENIZER = "1"
$env:LINGOLITE_ALLOW_RANDOM_MODEL = "1"
$env:LINGOLITE_MODEL_SIZE = "small"
$env:LINGOLITE_DEVICE = "auto"
$env:LINGOLITE_ALLOWED_ORIGINS = "http://localhost,http://127.0.0.1"
lingolite-api
```

LINGOLITE_MODEL_SIZE, LINGOLITE_DEVICE, and LINGOLITE_ALLOWED_ORIGINS are applied on startup so you can pin the preset, choose CPU/GPU, and lock CORS domains without modifying the server code.
Health Check
```bash
curl http://localhost:8000/health
```

Translate Text

```bash
curl -X POST http://localhost:8000/translate \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, world!",
"src_lang": "en",
"tgt_lang": "es",
"max_length": 128,
"method": "beam",
"num_beams": 4
}'
```

Interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

Tip: Set LINGOLITE_ECHO_MODE=1 to echo inputs without running the model (useful for smoke tests).
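For a scripted smoke test you can call the endpoint from Python as well. This snippet uses the requests library (not a LingoLite dependency), assumes a local server started with LINGOLITE_ECHO_MODE=1, and simply prints whatever JSON comes back without assuming specific response field names:

```python
import requests

# Mirrors the documented /translate payload shown above
payload = {
    "text": "Hello, world!",
    "src_lang": "en",
    "tgt_lang": "es",
    "max_length": 128,
    "method": "beam",
    "num_beams": 4,
}

resp = requests.post("http://localhost:8000/translate", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # with echo mode enabled the input text should come back unchanged
```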
LingoLite supports containerized deployment with Docker and Docker Compose.
```bash
# Build the Docker image
docker build -t lingolite:latest .
# Run the container
docker run -p 8000:8000 lingolite:latest
```

```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
```

The Docker setup includes:
- Multi-stage build for optimized image size
- Health checks and automatic restarts
- Volume mounts for model persistence
- Configurable resource limits
- Security-hardened container settings
See DEPLOYMENT_GUIDE.md for detailed deployment instructions.
See scripts/examples.py for comprehensive examples including:
- Tokenizer Training - Train a multilingual SentencePiece tokenizer
- Model Creation - Create models of different sizes
- Basic Inference - Simple translation with greedy decoding
- Advanced Generation - Beam search and temperature sampling
- Model Quantization - Reduce model size with INT8 quantization
- Complete Workflow - End-to-end training and inference pipeline
Run examples:
```bash
python scripts/examples.py
```

LingoLite includes comprehensive quantization utilities to reduce model size and improve inference speed.
```python
import torch
from lingolite.quantization_utils import quantize_model_dynamic
# Quantize model to INT8
quantized_model = quantize_model_dynamic(
model,
dtype=torch.qint8,
output_path="model_quantized.pt"
)
# Model size reduced by ~75% (FP32 → INT8)
print(f"Size reduction: {model.num_parameters() * 4 / (1024**2):.1f}MB → "
f"{model.num_parameters() / (1024**2):.1f}MB")from lingolite.quantization_utils import quantize_model_static
# Prepare calibration dataset
calibration_data = [...] # Your representative samples
# Static quantization for maximum efficiency
quantized_model = quantize_model_static(
model,
calibration_data,
output_path="model_static_quantized.pt"
)
```

```python
from lingolite.quantization_utils import prepare_qat_model, convert_qat_model
# Prepare model for QAT
qat_model = prepare_qat_model(model)
# Train with quantization simulation
trainer.train(qat_model)
# Convert to quantized model
quantized_model = convert_qat_model(qat_model)
```

Quantization features:
- Dynamic Quantization: Fast post-training quantization
- Static Quantization: Calibration-based for optimal accuracy
- Quantization-Aware Training: Train with quantization in the loop
- Compression Analysis: Detailed size and performance metrics
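To put numbers on the compression, a simple check is to compare the serialized checkpoint sizes on disk. This is a standalone sketch with hypothetical file paths; substitute your own checkpoints:

```python
import os

def file_size_mb(path: str) -> float:
    """Return the on-disk size of a file in megabytes."""
    return os.path.getsize(path) / (1024 ** 2)

# Hypothetical paths: an FP32 checkpoint and the output of quantize_model_dynamic above
fp32_mb = file_size_mb("translation_model.pt")
int8_mb = file_size_mb("model_quantized.pt")

print(f"FP32: {fp32_mb:.1f} MB, INT8: {int8_mb:.1f} MB "
      f"({100 * (1 - int8_mb / fp32_mb):.0f}% smaller)")
```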
Export models to ONNX format for deployment on mobile devices (TensorFlow Lite, CoreML, etc.).
```python
from export_onnx import export_to_onnx
# Export encoder and decoder separately for mobile optimization
export_to_onnx(
model,
encoder_path="encoder.onnx",
decoder_path="decoder.onnx",
vocab_size=24000,
max_seq_length=128
)
```

```bash
python scripts/export_onnx.py \
--model-path translation_model.pt \
--tokenizer-path tokenizer_model \
--output-dir ./onnx_models \
--max-seq-length 128 \
--opset-version 14
```

```python
import onnxruntime as ort
# Load and verify ONNX model
session = ort.InferenceSession("encoder.onnx")
print(f"Inputs: {[i.name for i in session.get_inputs()]}")
print(f"Outputs: {[o.name for o in session.get_outputs()]}")ONNX export features:
- Separate encoder/decoder: Optimized for mobile architectures
- Dynamic shapes: Support variable sequence lengths
- Quantization-ready: Export quantized models
- Validation: Automatic output verification
- Mobile-optimized: TensorFlow Lite and CoreML compatible
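To sanity-check the dynamic-shape export, you can run the exported encoder with a few different sequence lengths. The sketch below reads the input name from the graph rather than assuming it, and assumes the encoder takes a single token-ID input; if your export also expects an attention mask, add it to the feed dict:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("encoder.onnx")
input_name = session.get_inputs()[0].name  # discover the actual input name

for seq_len in (8, 32, 128):
    # Random token IDs in the tokenizer's vocab range, shape (batch=1, seq_len)
    dummy = np.random.randint(0, 24000, size=(1, seq_len), dtype=np.int64)
    outputs = session.run(None, {input_name: dummy})
    print(f"seq_len={seq_len} -> encoder output shape {outputs[0].shape}")
```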
Evaluate translation quality using industry-standard BLEU scores.
```python
from pathlib import Path
from evaluate_model import evaluate_model
from evaluate_bleu import compute_bleu
# Evaluate a trained checkpoint against a dataset of source/target pairs
results = evaluate_model(
model_path=Path("checkpoints/model.pt"),
tokenizer_path=Path("tokenizer"),
source_file=Path("data/test.src"),
target_file=Path("data/test.tgt"),
)
print(f"BLEU Score: {results['bleu']:.2f}")
print(f"chrF Score: {results['chrf']:.2f}")python scripts/evaluate_model.py \
--model checkpoints/model.pt \
--tokenizer tokenizer \
--source data/test.src \
--target data/test.tgt \
--output reports/eval.json
```

The evaluation suite provides:
- BLEU scores: Standard MT quality metric (sacrebleu)
- Per-language pair analysis: Individual scores for each translation direction
- Inference speed: Tokens per second, latency analysis
- Memory profiling: Peak memory usage during inference
- Error analysis: Common failure patterns and edge cases
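The scores themselves come from sacrebleu, so you can also compute BLEU and chrF on a handful of translations directly, independent of the evaluate_model wrapper:

```python
import sacrebleu

hypotheses = ["Hola, mundo!", "Adiós, amigo."]      # model outputs, one per source sentence
references = [["¡Hola, mundo!", "Adiós, amigo."]]   # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```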
See COMMUNITY_DEPLOYMENT_REVIEW.md for the latest community deployment checklist and verification notes.
Use high-quality, balanced corpora for each supported language. Public datasets that work well for compact translation models include:
- Europarl v10 – Parliamentary proceedings with consistent domain coverage across many European languages.
- Tatoeba Challenge – Sentence-aligned community translations that provide colloquial phrasing and short-form utterances.
- OPUS OpenSubtitles – Informal movie and TV dialog suitable for conversational styles (ensure proper cleaning).
- Global Voices – News articles translated by native speakers; useful for narrative and journalistic tone.
- CCMatrix – Large-scale web-mined parallel corpus that is helpful for pretraining before domain-specific fine-tuning.
- JW300 – Religious text translations that can improve coverage for low-resource language pairs when filtered appropriately.
Combine multiple corpora to diversify styles and reduce domain bias. When expanding to new languages, prefer resources that include explicit language codes or metadata for clean filtering.
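Whichever corpora you mix, a light cleaning pass before tokenizer training pays off. The sketch below illustrates the kind of empty-line, length-ratio, and duplicate filtering implied above; the thresholds are illustrative and not part of LingoLite:

```python
def clean_parallel_corpus(pairs, max_len=200, max_ratio=2.5):
    """Filter (src, tgt) sentence pairs: drop empty sides, overly long sentences,
    badly mismatched lengths (likely misalignments), and exact duplicates."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue                                   # empty side
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if src_len > max_len or tgt_len > max_len:
            continue                                   # too long for mobile-sized models
        if max(src_len, tgt_len) / max(1, min(src_len, tgt_len)) > max_ratio:
            continue                                   # probably misaligned
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue                                   # duplicate pair
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned

pairs = [("Hello, world!", "¡Hola, mundo!"), ("Hello, world!", "¡Hola, mundo!")]
print(clean_parallel_corpus(pairs))  # duplicate removed, one pair remains
```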
```python
from lingolite.training import TranslationDataset, collate_fn  # collate_fn location assumed; adjust if it lives elsewhere
from torch.utils.data import DataLoader
# Your parallel corpus
data = [
{"src": "Hello", "tgt": "Hola", "src_lang": "en", "tgt_lang": "es"},
{"src": "Goodbye", "tgt": "Adiós", "src_lang": "en", "tgt_lang": "es"},
# ... more examples
]
# Create dataset
dataset = TranslationDataset(
data=data,
tokenizer=tokenizer,
max_length=128
)
# Create dataloader
dataloader = DataLoader(
dataset,
batch_size=32,
shuffle=True,
collate_fn=lambda batch: collate_fn(batch, tokenizer.pad_token_id)
)
```

```python
import torch
from lingolite.training import TranslationTrainer
# Initialize trainer
trainer = TranslationTrainer(
model=model,
train_dataloader=dataloader,
learning_rate=1e-4,
num_epochs=10,
device="cuda" if torch.cuda.is_available() else "cpu"
)
# Train
trainer.train()
# Save model
torch.save(model.state_dict(), "translation_model.pt")
```

- Preprocess & Normalize – Lowercase consistently, normalize punctuation with Moses scripts, remove duplicates, and filter out noisy or misaligned sentence pairs.
- Split Strategically – Build stratified train/validation/test splits for each language pair to monitor overfitting and domain drift. Ensure held-out sets cover varied sequence lengths and styles.
- Tokenizer Iteration – Train the tokenizer on the full multilingual mix, inspect coverage statistics, and retrain with adjusted character_coverage if rare glyphs are dropped.
- Curriculum Training – Start with the highest-resource pairs (e.g., en↔es, en↔fr) for stable convergence, then gradually interleave medium- and low-resource pairs using temperature-based sampling to avoid forgetting (see the sampling sketch after this list).
- Regular Evaluation – Track BLEU/chrF scores per language pair with SacreBLEU. Complement metrics with human review of edge cases (idioms, named entities).
- Fine-Tune & Distill – After base training, fine-tune on target-domain data (e.g., customer support) and optionally distill from a larger teacher model to maintain quality under mobile constraints.
- Quantization-Aware Training – Enable INT8-aware fine-tuning before deployment to minimize accuracy loss when compressing the model.
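The temperature-based sampling mentioned in the curriculum step is usually implemented as p_i ∝ n_i^(1/T), where n_i is the number of sentence pairs for language pair i. A minimal sketch, reflecting common multilingual NMT practice rather than LingoLite-specific code:

```python
def sampling_probs(pair_sizes, temperature=5.0):
    """Per-language-pair sampling probabilities p_i ∝ n_i^(1/T).
    T=1 reproduces the raw data distribution; larger T flattens it,
    giving low-resource pairs more exposure during training."""
    weights = {pair: n ** (1.0 / temperature) for pair, n in pair_sizes.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

sizes = {"en-es": 1_000_000, "en-fr": 800_000, "en-da": 50_000}
print(sampling_probs(sizes, temperature=1.0))  # proportional to corpus size
print(sampling_probs(sizes, temperature=5.0))  # much flatter distribution
```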
- Mixed Precision Training: Automatic with torch.cuda.amp (GPU only)
- Gradient Accumulation: For effective larger batch sizes (see the sketch below)
- Learning Rate Scheduling: OneCycleLR for optimal convergence
- Progress Tracking: Real-time loss and metrics with tqdm
- Checkpointing: Save model at regular intervals
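Gradient accumulation is simple enough to sketch generically. The loop below is illustrative PyTorch rather than the TranslationTrainer internals, and assumes you pass in your own model, optimizer, dataloader, and loss function:

```python
import torch

def train_with_accumulation(model, optimizer, dataloader, compute_loss, accumulation_steps=4):
    """Generic gradient-accumulation loop: effective batch size becomes
    batch_size * accumulation_steps without extra memory per step."""
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        loss = compute_loss(model, batch)          # forward pass + loss for one micro-batch
        (loss / accumulation_steps).backward()     # scale so accumulated gradients average correctly
        if (step + 1) % accumulation_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            optimizer.zero_grad()
```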
Run the automated test suite:
```bash
# Run targeted tests (recommended)
pytest -v tests
# Skip slow markers if desired
pytest -v tests -m "not slow"
# With coverage reporting
pytest -v tests --cov=lingolite
```

Test Coverage:
- ✅ Input validation for all parameters
- ✅ Tensor dimension checking
- ✅ Token ID range validation
- ✅ KV cache functionality
- ✅ Beam search generation
- ✅ Helper functions (format_size, format_time, device selection)
- ✅ Model generation methods
- ❌ Training pipeline (not tested)
- ❌ API endpoints (not tested)
- ❌ Integration tests (not implemented)
Validate code structure:
```bash
python scripts/validate_improvements.py
```

```python
from lingolite.mobile_translation_model import MobileTranslationModel  # class location assumed from module name

model = MobileTranslationModel(
vocab_size=24000,
d_model=256,
num_encoder_layers=4,
num_decoder_layers=4,
num_heads=4,
d_ff=1024,
dropout=0.1
)
# ~7M parameters, ~30MB FP32, ~7.5MB INT8
```

```python
model = MobileTranslationModel(
vocab_size=24000,
d_model=512,
num_encoder_layers=6,
num_decoder_layers=6,
num_heads=8,
d_ff=2048,
dropout=0.1
)
# ~60M parameters, ~240MB FP32, ~60MB INT8
```

```python
model = MobileTranslationModel(
vocab_size=24000,
d_model=768,
num_encoder_layers=8,
num_decoder_layers=8,
num_heads=12,
d_ff=3072,
dropout=0.1
)
# ~140M parameters, ~560MB FP32, ~140MB INT8
```

```python
output = model.generate(
src_input_ids=input_ids,
max_length=128,
temperature=1.0, # Lower = more deterministic
sos_token_id=1,
eos_token_id=2
)
```

```python
output = model.generate_beam(
src_input_ids=input_ids,
max_length=128,
num_beams=4, # More beams = better quality but slower
length_penalty=1.0, # >1.0 favors longer, <1.0 favors shorter
early_stopping=True, # Stop when all beams finish
sos_token_id=1,
eos_token_id=2
)
```

LingoLite implements comprehensive security measures:
- Input Validation: All inputs validated for type, shape, and range
- Path Validation: File operations protected against directory traversal
- Resource Limits: Max length constraints prevent memory exhaustion
- Token ID Validation: Prevents out-of-bounds access
- No Code Execution: Pure data processing, no eval() or exec()
See SECURITY.md for detailed security policies and audit guidance.
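The path-validation measure boils down to resolving a requested path and refusing anything that escapes an allowed base directory. Here is a generic sketch of the technique, not the exact helper used in LingoLite:

```python
from pathlib import Path

def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Resolve user_path inside base_dir, rejecting directory traversal."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if base not in candidate.parents and candidate != base:
        raise ValueError(f"Path escapes {base}: {user_path}")
    return candidate

print(safe_resolve("checkpoints", "model.pt"))        # allowed
try:
    safe_resolve("checkpoints", "../etc/passwd")      # traversal attempt
except ValueError as err:
    print(f"Rejected: {err}")
```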
| Model | FP32 | INT8 | KV Cache (128 tokens) |
|---|---|---|---|
| Tiny | 30MB | 7.5MB | ~2MB |
| Small | 240MB | 60MB | ~8MB |
| Medium | 560MB | 140MB | ~18MB |
- Greedy: 5-10 tokens/second (tiny), 1-3 tokens/second (medium)
- Beam Search (4 beams): 2-5 tokens/second (tiny), 0.5-1 tokens/second (medium)
- With KV Cache: 2-3x speedup for greedy decoding
Note: Actual speed depends on hardware, sequence length, and batch size
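To measure throughput on your own hardware, time a few greedy generations and divide the tokens produced by wall-clock time. This rough sketch reuses the model and tokenizer objects from Quick Start and only the generate() arguments documented above:

```python
import time
import torch

def benchmark_greedy(model, tokenizer, text, src_lang="en", tgt_lang="es", runs=5):
    """Rough tokens-per-second estimate for greedy decoding on the current device."""
    input_ids = torch.tensor([tokenizer.encode(
        text, src_lang=src_lang, tgt_lang=tgt_lang, add_special_tokens=True)])
    total_tokens, total_time = 0, 0.0
    for _ in range(runs):
        start = time.perf_counter()
        output = model.generate(
            src_input_ids=input_ids,
            max_length=128,
            sos_token_id=tokenizer.sos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
        total_time += time.perf_counter() - start
        total_tokens += output.shape[1]
    return total_tokens / total_time

# print(f"{benchmark_greedy(model, tokenizer, 'Hello, world!'):.1f} tokens/sec")
```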
```
LingoLite/
|-- docs/
| |-- guides/
| | `-- DEPLOYMENT_GUIDE.md
| |-- policies/
| | |-- CODE_OF_CONDUCT.md
| | |-- CONTRIBUTING.md
| | `-- SECURITY.md
| |-- reference/
| | |-- CHANGELOG.md
| | |-- RELEASE_CHECKLIST.md
| | `-- RELEASE_NOTES_v0.1.0.md
| `-- reports/
| |-- IMPROVEMENTS.md
| |-- OPEN_SOURCE_READINESS_REPORT.md
| `-- PRODUCTION_READINESS.md
|-- examples/
| `-- data/
| `-- tiny_dataset.json
|-- lingolite/
| |-- __init__.py
| |-- encoder_decoder.py
| |-- generation_utils.py
| |-- mobile_translation_model.py
| |-- model_components.py
| |-- quantization_utils.py
| |-- tokenizer_stub.py
| |-- translation_tokenizer.py
| `-- training.py
|-- scripts/
| |-- api_server.py
| |-- install.py
| |-- make_tiny_dataset.py
| `-- validate_improvements.py
|-- tests/
| |-- test_api_bypass_startup.py
| `-- ... (beam search, cache, and generation tests)
|-- pyproject.toml
|-- requirements.txt
|-- Dockerfile
`-- README.md
```
Contributions are welcome! Areas for improvement:
- Complete KV Cache Integration: Full integration with decoder layers for maximum speedup
- Additional Languages: Extend tokenizer for more language pairs (currently supports 6)
- Mobile Framework Integration: Convert ONNX models to TensorFlow Lite/CoreML
- Model Distillation: Implement knowledge distillation from larger teacher models
- More Tests: Edge cases, stress tests, integration tests for new features
- Benchmarks: BLEU scores on standard datasets (WMT, OPUS, Flores)
- Monitoring: Add Prometheus metrics and Grafana dashboards
- Multi-GPU Training: Distributed training support for large-scale datasets
```bash
# Install development dependencies
pip install pytest black flake8
# Run tests
pytest -v
# Format code
black *.py
# Lint code
flake8 *.py --max-line-length=100
```

If you use LingoLite in your research or project, please cite:
```bibtex
@software{lingolite2025,
title = {LingoLite: Mobile-Optimized Neural Machine Translation},
author = {LingoLite Contributors},
year = {2025},
url = {https://github.com/TSOR666/LingoLite}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Transformer Architecture: Vaswani et al., "Attention is All You Need" (2017)
- Rotary Position Embeddings: Su et al., "RoFormer" (2021)
- Grouped Query Attention: Ainslie et al., "GQA: Training Generalized Multi-Query Transformer Models" (2023)
- SwiGLU: Shazeer, "GLU Variants Improve Transformer" (2020)
- PyTorch: Paszke et al., "PyTorch: An Imperative Style, High-Performance Deep Learning Library" (2019)
- SentencePiece: Kudo & Richardson, "SentencePiece: A simple and language independent approach to subword tokenization" (2018)
Comprehensive documentation is available:
- README.md - Quick start guide and API reference (this file)
- PRODUCTION_READINESS.md - START HERE: Honest assessment of current state
- OPEN_SOURCE_READINESS_REPORT.md - Open source release checklist and legal verification
- COMMUNITY_DEPLOYMENT_REVIEW.md - Deployment & training readiness review for contributors
- SECURITY.md - Security policy and vulnerability reporting
- CHANGELOG.md - Version history and release notes
- CONTRIBUTING.md - Contribution guidelines and development setup
- CODE_OF_CONDUCT.md - Community guidelines
- DEPLOYMENT_GUIDE.md - Deployment instructions (requires trained model)
- IMPROVEMENTS.md - Recent improvements and changes
- scripts/examples.py - Code examples and usage patterns
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check the comprehensive documentation listed above
- Review the API documentation at /docs when running the API server
- Review scripts/examples.py for usage patterns
Built with modern ML best practices for efficient mobile translation
The fastest way to explore LingoLite locally without training artifacts:
- Install (API extras):

  ```bash
  pip install -e .[api]
  ```

- Run the API with a stub tokenizer and a random tiny model (dev mode):

  Linux/macOS:

  ```bash
  export LINGOLITE_USE_STUB_TOKENIZER=1
  export LINGOLITE_ALLOW_RANDOM_MODEL=1
  lingolite-api
  ```

  Windows PowerShell:

  ```powershell
  $env:LINGOLITE_USE_STUB_TOKENIZER=1
  $env:LINGOLITE_ALLOW_RANDOM_MODEL=1
  lingolite-api
  ```

- Optional echo mode (bypass model execution): set LINGOLITE_ECHO_MODE=1 to return the input text directly.

- Quick tiny dataset for experimentation:

  ```bash
  python scripts/make_tiny_dataset.py
  ```

  Writes examples/data/tiny_dataset.json.
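If you prefer to hand-craft a small dataset instead, records follow the same shape as the training examples shown earlier (src, tgt, and language codes). The snippet below writes such a file to a hypothetical path; the exact schema emitted by make_tiny_dataset.py may differ:

```python
import json

# Illustrative records matching the TranslationDataset format shown above
records = [
    {"src": "Hello", "tgt": "Hola", "src_lang": "en", "tgt_lang": "es"},
    {"src": "Good morning", "tgt": "Bonjour", "src_lang": "en", "tgt_lang": "fr"},
]

with open("my_tiny_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```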
To train for real usage, follow the tokenizer training steps and the training CLI described above; then place artifacts under ./tokenizer/ and a model checkpoint under ./models/translation_model.pt.