Kotlin-LlamaCpp

Implementing GGUF Local Inference into Android Devices with EASE

Native AI inference for Android devices

Run GGUF models directly on your Android device with optimized performance and zero cloud dependency!

This is an Android binding for llama.cpp written in Kotlin, designed specifically for native Android applications. Built to leverage modern hardware capabilities, this library brings efficient large language model inference to mobile devices. The project is inspired by cui-llama.rn and llama.cpp: Inference of LLaMA and multimodal models in pure C/C++, specifically tailored for native Android development in Kotlin.

Changelog

v0.4.0 (Latest)

Modernized Core: Native codebase synchronized with the latest llama.cpp upstream (via cui-llama.rn).
Multimodal Support: Full support for vision/multimodal models (e.g., LLaVA) using mmproj files.
Improved File Handling: Migrated to a robust File Descriptor (FD) passing mechanism, bypassing Android's scoped storage restrictions and /proc/self/fd limitations.
Architecture Support: Expanded support beyond ARM-only; optimized for 64-bit platforms (arm64-v8a and x86_64).
Real-time Streaming: Enhanced JNI streaming logic with robust UTF-8 buffering to prevent crashes during token generation.
UI State Feedback: Improved LlamaHelper to provide immediate feedback during long-running image analysis/projection phases.

Why On-Device AI?

Modern Android devices (Snapdragon, MediaTek, Exynos, Tensor) now possess the power to run sophisticated AI models locally. Kotlin-LlamaCpp enables:

True On-Device AI: Run large language models entirely on your phone or tablet—no internet required, complete privacy.
Hardware-Accelerated Inference: Automatic detection and utilization of CPU features (i8mm, dotprod) for hardware-accelerated inference on ARM and x86.
Mobile-First Design: Context management and batch interruption designed for the constraints of mobile processors.
Multimodal Capabilities: Analyze images locally using multimodal projectors (mmproj).

Features

Multi-Architecture Support: Built for arm64-v8a and x86_64 with automatic CPU feature detection.
Multimodal Inference: Support for LLaVA and other vision models with per-prompt image injection.
Efficient Mobile Inference: Context Shift support from kobold.cpp enables longer conversations without memory overflow.
Kotlin-First Design: Helper class to handle initialization and context management seamlessly.
Seamless Android Integration: Uses ContentResolver and File Descriptors to work naturally with Android 11+ scoped storage.

Demo App

You can find a complete, ready-to-build demo application in the /app directory. The demo showcases model loading, multimodal image selection, handling inference in a ViewModel, and displaying generated text in a Jetpack Compose UI.

Installation

Add the following to your project's build.gradle:

dependencies {
    implementation 'io.github.ljcamargo:llamacpp-kotlin:0.4.0'
}

Usage

Basic Text Completion

// Initialize and load
llamaHelper.load(
    path = modelUriString, // e.g. from a file picker
    contextLength = 2048,
) {
    // Model loaded!
}

// Generate
llamaHelper.predict("Why is the sky blue?")

Multimodal (Image Processing)

To use multimodal features, you need a base GGUF model and its corresponding mmproj file.

// Load with multimodal projector
llamaHelper.load(
    path = baseModelUri,
    contextLength = 4096,
    mmprojPath = mmprojUri // Pass the projector file here
) {
    // Multimodal context ready!
}

// Generate with an image
llamaHelper.predict(
    prompt = "Describe this image in detail:",
    imagePath = selectedImageUri // Pass the image URI here
)

Native Code Maintenance

The native C++ core is synchronized with the latest llama.cpp developments. For instructions on how to update or rebuild the native components, see the llamaCpp README.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Acknowledgments

This project builds upon the work of:

llama.cpp by Georgi Gerganov
cui-llama.rn by Vali-98
llama.rn

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
.github		.github
.idea		.idea
app		app
gradle		gradle
llamaCpp		llamaCpp
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle.kts		settings.gradle.kts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Kotlin-LlamaCpp

Implementing GGUF Local Inference into Android Devices with EASE

Changelog

v0.4.0 (Latest)

Why On-Device AI?

Features

Demo App

Installation

Usage

Basic Text Completion

Multimodal (Image Processing)

Native Code Maintenance

Contributing

License

Acknowledgments

About

Uh oh!

Releases

Sponsor this project

Uh oh!

Packages

Uh oh!

Contributors 1

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Kotlin-LlamaCpp

Implementing GGUF Local Inference into Android Devices with EASE

Changelog

v0.4.0 (Latest)

Why On-Device AI?

Features

Demo App

Installation

Usage

Basic Text Completion

Multimodal (Image Processing)

Native Code Maintenance

Contributing

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors 1

Languages

Packages