
@jegly released this 04 Apr 03:49 · 40 commits to main since this release · 2e2fa87

OfflineLLM v1.0.0 — Initial Release

A fully offline, private AI chat app for Android. All LLM inference runs entirely on-device via llama.cpp. No internet permissions, no cloud, no tracking: data never leaves the user's device.

Features:

  • On-device inference with optimized ARM NEON/SVE/i8mm native libraries
  • Streaming token-by-token response display
  • Import any GGUF model at runtime via file picker
  • Multiple conversations with auto-titling and rename
  • Chat search and individual message deletion
  • Theme selector (System/Light/Dark/AMOLED Black)
  • Accent colour picker with 9 colour options
  • Configurable system prompts (General, Coder, Creative Writer, Tutor, Custom)
  • Temperature, max tokens, and context size controls
  • Optional thinking tag stripping for reasoning models
  • Encrypted settings via Jetpack Security
  • Optional biometric lock
  • Chat export/import as JSON
  • Built-in help guide for downloading models from HuggingFace
  • Zero network permissions — verified in manifest
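
The "zero network permissions" claim can be checked against the release APK yourself using the Android SDK's aapt build tool. A minimal sketch, assuming the APK filename below (yours may differ) and that aapt is on your PATH:

```shell
APK=OfflineLLM-v1.0.0.apk   # example filename; use the asset you downloaded
if command -v aapt >/dev/null 2>&1 && [ -f "$APK" ]; then
  # Lists every permission declared in the manifest; an app with no network
  # access will show no android.permission.INTERNET entry.
  aapt dump permissions "$APK"
else
  echo "aapt or $APK not found; install Android SDK build-tools to verify"
fi
```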

Recommended models:

  • Gemma 3 270M (Q4_K_M) — Fast; works on 4GB RAM devices. Included in this APK by default.
  • Qwen3.5 0.8B (Q4_K_M) — Good balance for 4-6GB RAM
  • Gemma 3 1B (Q4_K_M) — Recommended for 6-8GB RAM
  • Qwen3.5 4B (Q4_K_M) — Best quality for 8GB+ RAM
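
GGUF quants like the ones above can be fetched directly from HuggingFace using its resolve URL scheme. A sketch with curl; the repo path and filename below are placeholders, so check the actual model card on huggingface.co for the exact names before downloading:

```shell
REPO="ggml-org/gemma-3-270m-GGUF"   # assumed repo path; verify on HuggingFace
FILE="gemma-3-270m-Q4_K_M.gguf"     # assumed quant filename; verify on the model card
URL="https://huggingface.co/$REPO/resolve/main/$FILE"
echo "$URL"
# curl -L -o "$FILE" "$URL"   # uncomment to actually download the model
```

Once downloaded, the file can be imported into the app at runtime via the file picker.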

Install: Enable installation from unknown sources in Android settings, then install the APK via a file manager or adb install.
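
The adb route can be sketched as follows, assuming the APK filename below and a device connected with USB debugging enabled:

```shell
APK=OfflineLLM-v1.0.0.apk   # example filename; use the asset you downloaded
if command -v adb >/dev/null 2>&1; then
  # -r replaces an existing install while keeping the app's data
  adb install -r "$APK"
else
  echo "adb not found; copy $APK to the device and open it in a file manager"
fi
```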

<3 JEGLY