OfflineLLM v2.0.0
A fully offline, private AI chat app for Android. All inference runs on-device via llama.cpp. Zero network permissions.
What's New in v2.0.0
- Advanced Sampling Parameters — Full control over Temperature, Top-P, Top-K, Min-P, and Repeat Penalty with slider UI and plain-English explanations
- Context Size Slider — Adjustable from 512 to 16384 tokens
- Text-to-Speech — Read AI responses aloud (speaker icon on assistant messages)
- Chat Search — Search messages within conversations
- Delete Individual Messages — Long-press any message to delete
- Auto-Title Conversations — Chat titles set automatically from your first message
- Theme Selector — System Default / Light / Dark / AMOLED Black
- Accent Colour Picker — 9 colour options
- Thinking Tag Stripping — Hides `<think>` reasoning blocks emitted by reasoning models
- Empty Response Fix — No more blank message bubbles
- Help Screen — Built-in guide for downloading models from HuggingFace
- About Screen — Version info, license, links
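To make the new sampling controls concrete, here is a minimal, self-contained sketch of what Temperature, Top-K, Top-P, Min-P, and Repeat Penalty each do to the model's next-token distribution. This is an illustration of the standard llama.cpp-style samplers, not the app's actual code; the function name, default values, and dict-based interface are all hypothetical.

```python
import math

def filter_probs(logits, temperature=0.8, top_k=40, top_p=0.95,
                 min_p=0.05, repeat_penalty=1.1, recent_tokens=()):
    """Illustrative sketch: apply common sampling filters to raw scores.

    `logits` maps token id -> raw model score. Returns the renormalised
    probability distribution over the tokens that survive every filter.
    """
    # Repeat Penalty: dampen scores of recently generated tokens.
    scores = dict(logits)
    for tok in recent_tokens:
        if tok in scores:
            s = scores[tok]
            scores[tok] = s / repeat_penalty if s > 0 else s * repeat_penalty

    # Temperature: divide logits before softmax; lower = more deterministic.
    exps = {t: math.exp(s / temperature) for t, s in scores.items()}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}

    # Top-K: keep only the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Top-P (nucleus): keep the smallest prefix whose total mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break

    # Min-P: drop tokens below min_p times the best token's probability.
    floor = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= floor]

    # Renormalise over the survivors.
    z = sum(p for _, p in kept)
    return {t: p / z for t, p in kept}
```

Each slider in the app corresponds to one step above: raising temperature flattens the distribution, while tighter Top-K/Top-P/Min-P values cut the tail of unlikely tokens before sampling.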
Downloads
- OfflineLLM-v2.0.0-release.apk — Install directly on any Android 14+ device
- gemma-3-270m-it-Q4_K_M.gguf — Bundled model (~300 MB); fast on 4GB RAM devices
Install
- Download the APK and (optionally) a model file
- Enable "Install unknown apps" in Android settings
- Install the APK, complete onboarding
- Import the GGUF model from Settings → Import GGUF Model
Recommended Models
| Model | Size | Best For |
|---|---|---|
| Gemma 3 270M Q4_K_M | ~300 MB | 4GB RAM, fast responses |
| Qwen3.5 0.8B Q4_K_M | ~530 MB | 4-6GB RAM, good balance |
| Gemma 3 1B Q4_K_M | ~750 MB | 6-8GB RAM |
| Qwen3.5 4B Q4_K_M | ~2.5 GB | 8GB+ RAM, best quality |
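The context size slider interacts with these RAM figures: a transformer's KV cache grows linearly with context length, on top of the model file itself. A rough back-of-envelope estimate, assuming an f16 cache; the layer and head counts in the example are hypothetical placeholders, not the real shapes of the models above:

```python
def kv_cache_mb(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size in MB: one key vector and one value
    vector per layer per context position, f16 (2 bytes) by default."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / 1e6

# Hypothetical 1B-class shape: 26 layers, 4 KV heads of dimension 256.
cache_at_max = kv_cache_mb(16384, 26, 4, 256)
cache_at_min = kv_cache_mb(512, 26, 4, 256)
```

For that hypothetical shape, the slider's 16384-token maximum would add roughly 1.7 GB of cache on top of the model file, while the 512-token minimum adds only ~55 MB, which is why small contexts suit 4GB devices.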