Releases: jegly/OfflineLLM

OfflineLLM-v3.0.0

08 Apr 12:03
f42923a

OfflineLLM now supports Gemma4 models in GGUF format.
Minor bug fixes and performance optimizations.

OfflineLLM v2.0.0

04 Apr 07:41
84309ed

A fully offline, private AI chat app for Android. All inference runs on-device via llama.cpp. Zero network permissions.

What's New in v2.0.0

  • Advanced Sampling Parameters — Full control over Temperature, Top-P, Top-K, Min-P, and Repeat Penalty with slider UI and plain-English explanations
  • Context Size Slider — Adjustable from 512 to 16384 tokens
  • Text-to-Speech — Read AI responses aloud (speaker icon on assistant messages)
  • Chat Search — Search messages within conversations
  • Delete Individual Messages — Long-press any message to delete
  • Auto-Title Conversations — Chat titles set automatically from your first message
  • Theme Selector — System Default / Light / Dark / AMOLED Black
  • Accent Colour Picker — 9 colour options
  • Thinking Tag Stripping — Hides `<think>` blocks from reasoning models
  • Empty Response Fix — No more blank message bubbles
  • Help Screen — Built-in guide for downloading models from HuggingFace
  • About Screen — Version info, license, links

Downloads

  • OfflineLLM-v2.0.0-release.apk — Install directly on any Android 14+ device
  • gemma-3-270m-it-Q4_K_M.gguf — Bundled model, fast on 4GB RAM devices (~300MB)

Install

  1. Download the APK and (optionally) a model file
  2. Enable "Install unknown apps" in Android settings
  3. Install the APK, complete onboarding
  4. Import the GGUF model from Settings → Import GGUF Model
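The steps above can also be done from a computer over adb. This is a hypothetical sketch, assuming the APK and model file from the Downloads section are already in the current directory, a device with USB debugging is connected, and the Downloads folder is a reachable location for the in-app file picker:

```shell
# Filenames taken from the Downloads section above
APK="OfflineLLM-v2.0.0-release.apk"
MODEL="gemma-3-270m-it-Q4_K_M.gguf"

# Only attempt the device steps if adb is installed on this machine
if command -v adb >/dev/null 2>&1; then
  # -r reinstalls over an existing copy, keeping app data
  adb install -r "$APK"
  # Push the model where the Settings → Import GGUF Model picker can find it
  # (destination path is an assumption)
  adb push "$MODEL" /sdcard/Download/
fi
```

After the push, step 4 (Settings → Import GGUF Model) still happens on the device itself.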

Recommended Models

Model               | Size    | Best For
--------------------|---------|-------------------------
Gemma 3 270M Q4_K_M | ~300 MB | 4GB RAM, fast responses
Qwen3.5 0.8B Q4_K_M | ~530 MB | 4-6GB RAM, good balance
Gemma 3 1B Q4_K_M   | ~750 MB | 6-8GB RAM
Qwen3.5 4B Q4_K_M   | ~2.5 GB | 8GB+ RAM, best quality

OfflineLLM

04 Apr 03:49
2e2fa87

OfflineLLM v1.0.0 — Initial Release

A fully offline, private AI chat app for Android. All LLM inference runs entirely on-device via llama.cpp. No internet permissions. No cloud. No tracking: data never leaves the user's device.

Features:

  • On-device inference with optimized ARM NEON/SVE/i8mm native libraries
  • Streaming token-by-token response display
  • Import any GGUF model at runtime via file picker
  • Multiple conversations with auto-titling and rename
  • Chat search and individual message deletion
  • Theme selector (System/Light/Dark/AMOLED Black)
  • Accent colour picker with 9 colour options
  • Configurable system prompts (General, Coder, Creative Writer, Tutor, Custom)
  • Temperature, max tokens, and context size controls
  • Optional thinking tag stripping for reasoning models
  • Encrypted settings via Jetpack Security
  • Optional biometric lock
  • Chat export/import as JSON
  • Built-in help guide for downloading models from HuggingFace
  • Zero network permissions — verified in manifest
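
The "zero network permissions" claim can be checked independently with the Android SDK build tools. A hypothetical sketch, assuming aapt is installed and the release APK is in the current directory (the filename is an assumption):

```shell
# Assumed local filename for the release APK
APK="OfflineLLM-v1.0.0-release.apk"

# aapt lists every permission the APK's manifest requests;
# android.permission.INTERNET should not appear in the output
if command -v aapt >/dev/null 2>&1; then
  if aapt dump permissions "$APK" | grep -q "android.permission.INTERNET"; then
    echo "INTERNET permission found"
  else
    echo "no INTERNET permission"
  fi
fi
```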

Recommended models:

  • Gemma 3 270M (Q4_K_M) — Fast, works on 4GB RAM devices; included in this APK by default.
  • Qwen3.5 0.8B (Q4_K_M) — Good balance for 4-6GB RAM
  • Gemma 3 1B (Q4_K_M) — Recommended for 6-8GB RAM
  • Qwen3.5 4B (Q4_K_M) — Best quality for 8GB+ RAM

Install: Enable "Install unknown apps" in Android settings, then install the APK via a file manager or adb install.

<3 JEGLY