
OfflineLLM v2.0.0


@jegly jegly released this 04 Apr 07:41
· 29 commits to main since this release
84309ed


A fully offline, private AI chat app for Android. All inference runs on-device via llama.cpp. Zero network permissions.

What's New in v2.0.0

  • Advanced Sampling Parameters — Full control over Temperature, Top-P, Top-K, Min-P, and Repeat Penalty with slider UI and plain-English explanations
  • Context Size Slider — Adjustable from 512 to 16384 tokens
  • Text-to-Speech — Read AI responses aloud (speaker icon on assistant messages)
  • Chat Search — Search messages within conversations
  • Delete Individual Messages — Long-press any message to delete
  • Auto-Title Conversations — Chat titles set automatically from your first message
  • Theme Selector — System Default / Light / Dark / AMOLED Black
  • Accent Colour Picker — 9 colour options
  • Thinking Tag Stripping — Hides `<think>…</think>` blocks emitted by reasoning models
  • Empty Response Fix — No more blank message bubbles
  • Help Screen — Built-in guide for downloading models from HuggingFace
  • About Screen — Version info, license, links
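For readers curious how the sampling parameters above interact, here is a minimal sketch of a llama.cpp-style sampling chain. The parameter names (Temperature, Top-P, Top-K, Min-P, Repeat Penalty) come from the feature list; the chain ordering, default values, and function name are illustrative assumptions, not the app's actual implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=40, top_p=0.95,
                      min_p=0.05, repeat_penalty=1.1, recent_tokens=()):
    """Illustrative sampling chain: repeat penalty -> Top-K ->
    temperature/softmax -> Top-P -> Min-P -> random draw."""
    logits = dict(logits)  # token -> raw logit

    # Repeat penalty: dampen logits of recently generated tokens.
    for t in recent_tokens:
        if t in logits:
            l = logits[t]
            logits[t] = l / repeat_penalty if l > 0 else l * repeat_penalty

    # Top-K: keep only the K highest-logit tokens.
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Temperature scaling, then softmax to probabilities.
    mx = max(l for _, l in items)
    exps = [(t, math.exp((l - mx) / temperature)) for t, l in items]
    z = sum(e for _, e in exps)
    probs = sorted(((t, e / z) for t, e in exps),
                   key=lambda kv: kv[1], reverse=True)

    # Top-P (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        mass += p
        if mass >= top_p:
            break

    # Min-P: drop tokens below min_p times the best token's probability.
    floor = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= floor]

    # Renormalize the surviving tokens and sample one.
    z = sum(p for _, p in kept)
    r = random.random() * z
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]
```

Lower Temperature and smaller Top-K/Top-P make the draw more deterministic; the Repeat Penalty discourages the model from looping on recent tokens.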

Downloads

  • OfflineLLM-v2.0.0-release.apk — Install directly on any Android 14+ device
  • gemma-3-270m-it-Q4_K_M.gguf — Bundled model, fast on 4GB RAM devices (~300MB)

Install

  1. Download the APK and (optionally) a model file
  2. Enable "Install unknown apps" in Android settings
  3. Install the APK, complete onboarding
  4. Import the GGUF model from Settings → Import GGUF Model
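If a computer is available, the same install can be done over USB with adb. This sketch assumes Android platform-tools are installed, USB debugging is enabled, and that `/sdcard/Download/` is a convenient place to stage the model for import (the app's expected import location may differ):

```shell
# Sideload the release APK onto the connected device.
adb install OfflineLLM-v2.0.0-release.apk

# Stage the bundled model in Downloads, then import it in-app
# via Settings -> Import GGUF Model.
adb push gemma-3-270m-it-Q4_K_M.gguf /sdcard/Download/
```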

Recommended Models

| Model | Size | Best For |
| --- | --- | --- |
| Gemma 3 270M Q4_K_M | ~300 MB | 4GB RAM, fast responses |
| Qwen3.5 0.8B Q4_K_M | ~530 MB | 4-6GB RAM, good balance |
| Gemma 3 1B Q4_K_M | ~750 MB | 6-8GB RAM |
| Qwen3.5 4B Q4_K_M | ~2.5 GB | 8GB+ RAM, best quality |