Conversation
@ztang2370 Thanks for the great work! The direction this PR is heading looks good to me. To show the benefits of kvcached, I think the test needs to run at least two models concurrently using ollama. The README has some repeated wording generated by AI, and the setup script has some AI-added symbols that should be cleaned up. We also need a cool example to show this off. For example, with webui https://github.com/open-webui/open-webui, we could have two models running together in the model list. Just some quick thoughts---you could think about the most reasonable and easiest way to show this.
The setup script has changed a lot for vllm and sglang. Maybe we can have a separate script just for ollama.
ivanium left a comment
Good job! In general, I also like the direction of this PR. A key thing to add is a running example of co-running two models on the same GPU, along with performance numbers for their throughput, P99 TTFT, and P99 ITL.
I left some comments, but they are minor.
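For reference, a minimal sketch of how P99 TTFT and P99 ITL could be computed from per-request timing samples (the `p99` helper, nearest-rank method, and all sample values below are illustrative, not taken from this PR's benchmark code):

```python
def p99(samples):
    """Nearest-rank 99th percentile of a list of latency samples (seconds)."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return ordered[idx]

# TTFT: time to first token per request; ITL: latency between consecutive tokens.
# Made-up numbers purely to show the shape of the computation.
ttft_samples = [0.12, 0.15, 0.11, 0.42, 0.13]
itl_samples = [0.021, 0.019, 0.025, 0.080]

print(f"P99 TTFT: {p99(ttft_samples):.3f}s")  # dominated by the slowest request
print(f"P99 ITL:  {p99(itl_samples):.3f}s")
```

In a real co-running benchmark, each model's client would log these timestamps separately so the two models' tail latencies can be compared.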
Force-pushed from 7d692f9 to f4031de
Signed-off-by: zt2370 <ztang2370@gmail.com>
Force-pushed from 8a32284 to 90db4fe
Issue #81
Marked as WIP. Feedback on the design and direction is welcome.
9.16 update:
https://docs.google.com/document/d/1mDTKBoCZslLcSu2OsgCNVzl-J6HeY-Vl7s19V938PHY/edit?tab=t.0
9.17 update:
Test branch:
https://github.com/ztang2370/kvcached/tree/ztang/test-ollama-integration
https://github.com/ztang2370/ollama/tree/my-v0.11.8
9.21 update:
webui: https://drive.google.com/file/d/1ZUGWDK3JleCciizZyTybe33inmGvAmVS/view?usp=sharing
TODO: