You can switch between different vision models by editing the `.env` file:

```bash
# Edit .env file
MODEL_NAME=<model-name>
```

Then restart the service:

```bash
# Kill existing service
lsof -ti:8000 | xargs kill -9

# Restart
make run
```

**Qwen2-VL-2B-Instruct**

```bash
MODEL_NAME=qwen/Qwen2-VL-2B-Instruct
```

- Size: ~2-4 GB
- Speed: Very fast (2-4s)
- Accuracy: Good for simple receipts
- RAM: ~4-6 GB
- Best for: Quick testing, simple receipts
**Qwen2-VL-7B-Instruct**

```bash
MODEL_NAME=qwen/Qwen2-VL-7B-Instruct
```

- Size: ~8 GB
- Speed: Moderate (4-8s)
- Accuracy: Better OCR, less hallucination
- RAM: ~8-12 GB
- Best for: Production, complex invoices
**LLaVA-1.5-7B**

```bash
MODEL_NAME=llava-hf/llava-1.5-7b-hf
```

- Size: ~7 GB
- Speed: Moderate (5-10s)
- Accuracy: Good general vision understanding
- RAM: ~8-10 GB
- Best for: Alternative if Qwen doesn't work well
| Model | Size | Speed | Accuracy | Hallucination | RAM |
|---|---|---|---|---|---|
| Qwen2-VL-2B | 2-4GB | ⚡⚡⚡ | ⭐⭐⭐ | High | 4-6GB |
| Qwen2-VL-7B | 8GB | ⚡⚡ | ⭐⭐⭐⭐ | Low | 8-12GB |
| LLaVA-1.5-7B | 7GB | ⚡⚡ | ⭐⭐⭐ | Medium | 8-10GB |
`TEMPERATURE` controls randomness in the output:

```bash
TEMPERATURE=0.1  # More deterministic (recommended for OCR)
TEMPERATURE=0.7  # More creative (not recommended)
```

`MAX_TOKENS` sets the maximum length of the response:
```bash
MAX_TOKENS=2000  # Standard (recommended)
MAX_TOKENS=4000  # For very detailed invoices
```

**Model hallucinates or invents data?** Solutions:
- ✅ Use larger model (7B instead of 2B)
- ✅ Lower temperature to 0.1
- ✅ Improve prompt (already done in config.py)
- ✅ Ensure image quality is good
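To confirm which settings are actually in effect, you can grep the `.env` file (variable names as used in this project's config):

```bash
# Show the model and sampling settings currently configured
grep -E '^(MODEL_NAME|TEMPERATURE|MAX_TOKENS)=' .env
```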
**Slow inference?** Solutions:
- Use smaller model (2B instead of 7B)
- Reduce MAX_TOKENS
- Close other applications
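Before and after changing these settings, it helps to measure how long one request actually takes; curl's standard `time_total` write-out variable reports the end-to-end time (the endpoint and sample image are the ones used elsewhere in this project):

```bash
# Print how long one extraction takes end-to-end
curl -s -o /dev/null -w "total: %{time_total}s\n" \
  -X POST http://localhost:8000/extract \
  -F "image=@testdata/sample_receipt.png"
```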
**Out of memory?** Solutions:
- Use smaller model (2B)
- Close other applications
- Restart your Mac
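To see how much resident memory the service is actually using, you can look up the process on port 8000 (`lsof` and `ps` are available on macOS by default; `rss` is reported in KB):

```bash
# Resident memory of the process listening on port 8000
pid=$(lsof -ti:8000)
ps -o rss= -p "$pid" | awk '{printf "RSS: %.2f GB\n", $1 / 1024 / 1024}'
```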
**Poor OCR accuracy?** Solutions:
- Use 7B model (better OCR)
- Ensure image is clear and high resolution
- Try preprocessing image (increase contrast)
Quick test script:

```bash
# Test current model
curl -X POST http://localhost:8000/extract \
  -F "image=@testdata/sample_receipt.png" | jq

# Check which model is loaded
curl http://localhost:8000/health | jq
```

Recommended configurations:

**Quick testing (2B):**

```bash
MODEL_NAME=qwen/Qwen2-VL-2B-Instruct
MAX_TOKENS=2000
TEMPERATURE=0.1
```

**Production (7B):**

```bash
MODEL_NAME=qwen/Qwen2-VL-7B-Instruct
MAX_TOKENS=2000
TEMPERATURE=0.1
PRELOAD_MODEL=true  # Faster first request
```

**Low memory (2B, shorter output):**

```bash
MODEL_NAME=qwen/Qwen2-VL-2B-Instruct
MAX_TOKENS=1500
TEMPERATURE=0.1
```

You can use any MLX-compatible vision model from Hugging Face:
- Find a model on https://huggingface.co/models
- Check that it is MLX-compatible
- Update `.env`:

  ```bash
  MODEL_NAME=username/model-name
  ```

- Restart the service
Note: First run will download the model (~5-15 minutes depending on size).
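Downloads are cached, so switching back to a previously used model is fast. Assuming the default Hugging Face cache location, you can check how much disk the cached models occupy:

```bash
# Size of the local Hugging Face model cache (default location)
du -sh ~/.cache/huggingface/hub 2>/dev/null || echo "no cache yet"
```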
The 2B model tends to generate fake data instead of reading the actual receipt. This is because:
- Small models have limited OCR capabilities
- They try to "complete" the task even without reading properly
Solution: Use the 7B model, which has better vision understanding and OCR capabilities.