OfflineLLM v1.0.0 — Initial Release
A fully offline, private AI chat app for Android. All LLM inference runs entirely on-device via llama.cpp. No internet permissions, no cloud, no tracking: data never leaves the user's device.
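The no-network claim is enforceable at the OS level: an Android app that never declares the INTERNET permission is denied all socket access. A sketch of what that looks like in a manifest (illustrative only, with a made-up package name — not the app's actual AndroidManifest.xml):

```xml
<!-- Illustrative sketch: an app with zero network access simply never
     declares android.permission.INTERNET. The package name is hypothetical. -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.offlinellm">
    <!-- Note what is absent: no INTERNET, no ACCESS_NETWORK_STATE.
         Without INTERNET, the OS rejects every socket the app opens. -->
    <application android:label="OfflineLLM">
        <!-- activities, services, etc. -->
    </application>
</manifest>
```

You can check any APK yourself with aapt dump permissions OfflineLLM.apk, which should list no network-related permissions for this app.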
Features:
- On-device inference with optimized ARM NEON/SVE/i8mm native libraries
- Streaming token-by-token response display
- Import any GGUF model at runtime via file picker
- Multiple conversations with auto-titling and rename
- Chat search and individual message deletion
- Theme selector (System/Light/Dark/AMOLED Black)
- Accent colour picker with 9 colour options
- Configurable system prompts (General, Coder, Creative Writer, Tutor, Custom)
- Temperature, max tokens, and context size controls
- Optional thinking tag stripping for reasoning models
- Encrypted settings via Jetpack Security
- Optional biometric lock
- Chat export/import as JSON
- Built-in help guide for downloading models from HuggingFace
- Zero network permissions — verified in manifest
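On the thinking-tag feature: reasoning models typically wrap their chain of thought in tags like <think>...</think> before the final answer, and stripping them is a single regex pass. A minimal sketch of the idea (the tag name and helper function are illustrative assumptions, not the app's actual code):

```python
import re

# Hypothetical helper: remove <think>...</think> blocks that reasoning
# models emit before their final answer. DOTALL lets the block span
# multiple lines; the trailing \s* also eats whitespace after the tag.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Return the model response with any thinking blocks removed."""
    return THINK_RE.sub("", text)
```

Responses without tags pass through unchanged, so the option is safe to leave on for non-reasoning models too.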
Recommended models:
- Gemma 3 270M (Q4_K_M) — Fast; works on 4GB RAM devices; included in this APK by default
- Qwen3.5 0.8B (Q4_K_M) — Good balance for 4-6GB RAM
- Gemma 3 1B (Q4_K_M) — Recommended for 6-8GB RAM
- Qwen3.5 4B (Q4_K_M) — Best quality for 8GB+ RAM
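The RAM guidance above can be sanity-checked with rough arithmetic: a Q4_K_M GGUF averages roughly 4.5 bits per weight (an approximation, not an exact property of the format), and the runtime needs additional headroom for the KV cache and the OS. A back-of-envelope estimator:

```python
def approx_model_file_gb(params_billions: float,
                         bits_per_weight: float = 4.5) -> float:
    """Rough on-disk size in GB of a quantized GGUF model.

    bits_per_weight=4.5 is an approximate average for Q4_K_M;
    actual files vary by architecture and tensor mix.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 1B-parameter model at ~4.5 bits/weight is on the order of 0.56 GB,
# which is why it fits comfortably on 6-8GB devices with room for context.
```

This is a file-size estimate only; peak RAM use is higher once the context buffer is allocated, which is why the tiers above leave a few GB of slack.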
Install: Enable Unknown Sources (or "Install unknown apps" on newer Android), then install the APK via a file manager or adb install.
<3 JEGLY