Releases: jegly/OfflineLLM

OfflineLLM-v3.0.0

08 Apr 12:03
f42923a

OfflineLLM now supports Gemma4 models in GGUF format.
Minor bug fixes and performance optimizations.

OfflineLLM v2.0.0

04 Apr 07:41
84309ed

A fully offline, private AI chat app for Android. All inference runs on-device via llama.cpp. Zero network permissions.

What's New in v2.0.0

  • Advanced Sampling Parameters — Full control over Temperature, Top-P, Top-K, Min-P, and Repeat Penalty with slider UI and plain-English explanations
  • Context Size Slider — Adjustable from 512 to 16384 tokens
  • Text-to-Speech — Read AI responses aloud (speaker icon on assistant messages)
  • Chat Search — Search messages within conversations
  • Delete Individual Messages — Long-press any message to delete
  • Auto-Title Conversations — Chat titles set automatically from your first message
  • Theme Selector — System Default / Light / Dark / AMOLED Black
  • Accent Colour Picker — 9 colour options
  • Thinking Tag Stripping — Hides `<think>` blocks from reasoning models
  • Empty Response Fix — No more blank message bubbles
  • Help Screen — Built-in guide for downloading models from HuggingFace
  • About Screen — Version info, license, links

Downloads

  • OfflineLLM-v2.0.0-release.apk — Install directly on any Android 14+ device
  • gemma-3-270m-it-Q4_K_M.gguf — Bundled model, fast on 4GB RAM devices (~300MB)

Install

  1. Download the APK and (optionally) a model file
  2. Enable "Install unknown apps" in Android settings
  3. Install the APK, complete onboarding
  4. Import the GGUF model from Settings → Import GGUF Model
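The steps above can also be done from a computer over adb. This is a hypothetical sketch, assuming the APK and model file from the Downloads section are already in the current directory, a device with USB debugging is connected, and the Downloads folder is a reachable location for the in-app file picker:

```shell
# Filenames taken from the Downloads section above
APK="OfflineLLM-v2.0.0-release.apk"
MODEL="gemma-3-270m-it-Q4_K_M.gguf"

# Only attempt the device steps if adb is installed on this machine
if command -v adb >/dev/null 2>&1; then
  # -r reinstalls over an existing copy, keeping app data
  adb install -r "$APK"
  # Push the model where the Settings → Import GGUF Model picker can find it
  # (destination path is an assumption)
  adb push "$MODEL" /sdcard/Download/
fi
```

After the push, step 4 (Settings → Import GGUF Model) still happens on the device itself.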

Recommended Models

Model               | Size    | Best For
--------------------|---------|-------------------------
Gemma 3 270M Q4_K_M | ~300 MB | 4GB RAM, fast responses
Qwen3.5 0.8B Q4_K_M | ~530 MB | 4-6GB RAM, good balance
Gemma 3 1B Q4_K_M   | ~750 MB | 6-8GB RAM
Qwen3.5 4B Q4_K_M   | ~2.5 GB | 8GB+ RAM, best quality

OfflineLLM

04 Apr 03:49
2e2fa87

OfflineLLM v1.0.0 — Initial Release

A fully offline, private AI chat app for Android. All LLM inference runs entirely on-device via llama.cpp. No internet permissions. No cloud. No tracking: data never leaves the user's device.

Features:

  • On-device inference with optimized ARM NEON/SVE/i8mm native libraries
  • Streaming token-by-token response display
  • Import any GGUF model at runtime via file picker
  • Multiple conversations with auto-titling and rename
  • Chat search and individual message deletion
  • Theme selector (System/Light/Dark/AMOLED Black)
  • Accent colour picker with 9 colour options
  • Configurable system prompts (General, Coder, Creative Writer, Tutor, Custom)
  • Temperature, max tokens, and context size controls
  • Optional thinking tag stripping for reasoning models
  • Encrypted settings via Jetpack Security
  • Optional biometric lock
  • Chat export/import as JSON
  • Built-in help guide for downloading models from HuggingFace
  • Zero network permissions — verified in manifest
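
The "zero network permissions" claim can be checked independently with the Android SDK build tools. A hypothetical sketch, assuming aapt is installed and the release APK is in the current directory (the filename is an assumption):

```shell
# Assumed local filename for the release APK
APK="OfflineLLM-v1.0.0-release.apk"

# aapt lists every permission the APK's manifest requests;
# android.permission.INTERNET should not appear in the output
if command -v aapt >/dev/null 2>&1; then
  if aapt dump permissions "$APK" | grep -q "android.permission.INTERNET"; then
    echo "INTERNET permission found"
  else
    echo "no INTERNET permission"
  fi
fi
```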

Recommended models:

  • Gemma 3 270M (Q4_K_M) — Fast, works on 4GB RAM devices; included in this APK by default.
  • Qwen3.5 0.8B (Q4_K_M) — Good balance for 4-6GB RAM
  • Gemma 3 1B (Q4_K_M) — Recommended for 6-8GB RAM
  • Qwen3.5 4B (Q4_K_M) — Best quality for 8GB+ RAM

Install: Enable "Install unknown apps" in Android settings, then install the APK via a file manager or adb install.

<3 JEGLY