Releases: jegly/OfflineLLM
OfflineLLM-v3.0.0
OfflineLLM v2.0.0
A fully offline, private AI chat app for Android. All inference runs on-device via llama.cpp. Zero network permissions.
What's New in v2.0.0
- Advanced Sampling Parameters — Full control over Temperature, Top-P, Top-K, Min-P, and Repeat Penalty with slider UI and plain-English explanations
- Context Size Slider — Adjustable from 512 to 16384 tokens
- Text-to-Speech — Read AI responses aloud (speaker icon on assistant messages)
- Chat Search — Search messages within conversations
- Delete Individual Messages — Long-press any message to delete
- Auto-Title Conversations — Chat titles set automatically from your first message
- Theme Selector — System Default / Light / Dark / AMOLED Black
- Accent Colour Picker — 9 colour options
- Thinking Tag Stripping — Hides thinking-tag blocks emitted by reasoning models
- Empty Response Fix — No more blank message bubbles
- Help Screen — Built-in guide for downloading models from HuggingFace
- About Screen — Version info, license, links
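The sampling sliders correspond to standard decoding filters. The sketch below shows one common way they combine when choosing the next token from raw logits. The function name `sample_token`, its defaults and the filter order are illustrative assumptions. This is not the app's code or llama.cpp's API, and llama.cpp's own sampler chain may order or implement these steps differently.

```python
import math
import random


def sample_token(logits, prev_tokens, temperature=0.8, top_k=40, top_p=0.95,
                 min_p=0.05, repeat_penalty=1.1, rng=random):
    """Pick one token id from raw logits with a repeat-penalty/top-k/top-p/min-p chain.

    Illustrative only: the defaults are hypothetical, and llama.cpp's own
    sampler order and defaults may differ.
    """
    logits = list(logits)
    # Repeat penalty: make tokens that already appeared less likely.
    for t in set(prev_tokens):
        logits[t] = logits[t] / repeat_penalty if logits[t] > 0 else logits[t] * repeat_penalty
    # Temperature 0 is treated as greedy decoding (always the top token).
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits (subtract the max for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    cands = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-K: keep the K most likely tokens, then renormalise.
    if top_k > 0:
        cands = cands[:top_k]
        z = sum(p for p, _ in cands)
        cands = [(p / z, i) for p, i in cands]
    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p, i in cands:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Min-P: drop tokens below min_p times the most likely token's probability.
    floor = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= floor]
    # Draw from what is left, in proportion to the remaining probabilities.
    r = rng.random() * sum(p for p, _ in kept)
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

For example, `sample_token([1.0, 3.0, 2.0], prev_tokens=[], temperature=0)` returns `1`, the index of the largest logit. Lower temperatures sharpen the distribution before the filters run, and Min-P scales its cutoff with the model's confidence, whereas Top-K always keeps a fixed number of candidates.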
Downloads
- OfflineLLM-v2.0.0-release.apk — Install directly on any Android 14+ device
- gemma-3-270m-it-Q4_K_M.gguf — Bundled model, fast on 4GB RAM devices (~300MB)
Install
- Download the APK and (optionally) a model file
- Enable "Install unknown apps" in Android settings
- Install the APK and complete onboarding
- Import the GGUF model from Settings → Import GGUF Model
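A downloaded model can be sanity-checked before importing it. GGUF files start with the four ASCII bytes `GGUF`, followed by a 32-bit format version. The minimal sketch below assumes a little-endian file (the common case). The helper names `gguf_version` and `read_gguf_version` are hypothetical, and this is not the app's own import validation.

```python
import struct


def gguf_version(header: bytes):
    """Return the GGUF format version from the first 8 bytes, or None if not GGUF."""
    # Magic bytes must be the ASCII string "GGUF".
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    # Next comes the format version as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return version


def read_gguf_version(path):
    """Read the 8-byte header of a file on disk and check it."""
    with open(path, "rb") as f:
        return gguf_version(f.read(8))
```

`read_gguf_version("gemma-3-270m-it-Q4_K_M.gguf")` should return an integer for a valid model file and `None` for anything else, such as an interrupted download saved as an HTML error page.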
Recommended Models
| Model | Size | Best For |
|---|---|---|
| Gemma 3 270M Q4_K_M | ~300 MB | 4GB RAM, fast responses |
| Qwen3.5 0.8B Q4_K_M | ~530 MB | 4-6GB RAM, good balance |
| Gemma 3 1B Q4_K_M | ~750 MB | 6-8GB RAM |
| Qwen3.5 4B Q4_K_M | ~2.5 GB | 8GB+ RAM, best quality |
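RAM use depends on the context size as well as the model file, because the key/value (KV) cache grows linearly with the number of tokens. The sketch below estimates that cache, assuming an f16 cache (2 bytes per element) and one K and one V vector per layer, per position, per KV head. The layer and head counts are hypothetical and are not the real configuration of any model in the table.

```python
def kv_cache_bytes(n_ctx, n_layer, n_head_kv, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size: K and V for every layer, position and KV head."""
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elem


# Hypothetical small-model shape, chosen only to illustrate the scaling.
for n_ctx in (512, 4096, 16384):
    mib = kv_cache_bytes(n_ctx, n_layer=24, n_head_kv=2, head_dim=64) / 2**20
    print(f"n_ctx={n_ctx:>5}: {mib:.0f} MiB")
```

With these assumed dimensions the cache grows from 6 MiB at 512 tokens to 192 MiB at 16384 tokens, so on low-RAM devices a smaller context setting leaves more headroom for the model itself.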
OfflineLLM
OfflineLLM v1.0.0 — Initial Release
A fully offline, private AI chat app for Android. All LLM inference runs entirely on-device via llama.cpp. No internet permissions. No cloud. No tracking. Data never leaves the user's device.
Features:
- On-device inference with optimized ARM NEON/SVE/i8mm native libraries
- Streaming token-by-token response display
- Import any GGUF model at runtime via file picker
- Multiple conversations with auto-titling and renaming
- Chat search and individual message deletion
- Theme selector (System/Light/Dark/AMOLED Black)
- Accent colour picker with 9 colour options
- Configurable system prompts (General, Coder, Creative Writer, Tutor, Custom)
- Temperature, max tokens, and context size controls
- Optional thinking tag stripping for reasoning models
- Encrypted settings via Jetpack Security
- Optional biometric lock
- Chat export/import as JSON
- Built-in help guide for downloading models from HuggingFace
- Zero network permissions — verified in manifest
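Thinking tag stripping can be approximated with two regular expressions: one removes complete reasoning blocks, and the other hides a block that is still open while tokens are streaming. The sketch assumes the model wraps its reasoning in `<think>...</think>`, as Qwen3 and DeepSeek-R1 do; other models may use different tags. The name `strip_thinking` is hypothetical and this is not the app's implementation.

```python
import re

# Assumed tag format: <think> ... </think>. DOTALL lets "." span newlines.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
# A block opened but not yet closed (e.g. mid-stream) runs to the end of the text.
UNCLOSED_THINK = re.compile(r"<think>.*\Z", re.DOTALL)


def strip_thinking(text):
    """Remove complete thinking blocks, then any block still open at the end."""
    text = THINK_BLOCK.sub("", text)
    text = UNCLOSED_THINK.sub("", text)
    return text.strip()
```

For example, `strip_thinking("<think>plan the reply</think>\nHello!")` returns `"Hello!"`. If a response consists only of a thinking block, the result is an empty string, so a UI would still need its own handling for that case.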
Recommended models:
- Gemma 3 270M (Q4_K_M) — Fast, works on 4GB RAM devices; included in this APK by default
- Qwen3.5 0.8B (Q4_K_M) — Good balance for 4-6GB RAM
- Gemma 3 1B (Q4_K_M) — Recommended for 6-8GB RAM
- Qwen3.5 4B (Q4_K_M) — Best quality for 8GB+ RAM
Install: Enable Unknown Sources, then install the APK via a file manager or `adb install`.
<3 JEGLY