
[Feat] [WIP] Ollama integration #119

Draft
ztang2370 wants to merge 8 commits into ovg-project:main from ztang2370:feature/ollama-integration

Conversation

@ztang2370 (Contributor) commented on Sep 14, 2025

Issue #81

  • Adds initial support for integrating Ollama with kvcached.
  • Verified workflow locally on a single CUDA GPU (RTX 3090).
  • Current implementation runs end-to-end but requires:
    • Additional testing (multi-GPU, different environments)
    • Review of integration approach (may not be best practice)
    • Potential optimizations for performance and maintainability

Marked as WIP. Feedback welcome on design and direction.

9.16 update:
https://docs.google.com/document/d/1mDTKBoCZslLcSu2OsgCNVzl-J6HeY-Vl7s19V938PHY/edit?tab=t.0

9.17 update:
Test branch:
https://github.com/ztang2370/kvcached/tree/ztang/test-ollama-integration
https://github.com/ztang2370/ollama/tree/my-v0.11.8

git clone git@github.com:ztang2370/kvcached.git
cd kvcached
git switch ztang/test-ollama-integration
git submodule update --init
cd engine_integration/ollama-v0.11.8 && git switch my-v0.11.8

Then set up, build, and run Ollama.

9.21 update:
webui: https://drive.google.com/file/d/1ZUGWDK3JleCciizZyTybe33inmGvAmVS/view?usp=sharing

TODO:

  • Complete bug-free end-to-end workflow
  • Running example
  • Benchmark performance

@ztang2370 changed the title [Feature] Ollama integration → [Feat] [WIP] Ollama integration on Sep 14, 2025
@jiarong0907 (Collaborator) commented:

@ztang2370 Thanks for the great work!

The direction this PR is heading looks good to me. To show the benefits of kvcached, I think the test needs to run at least two models concurrently with Ollama.

The README has some repeated AI-generated wording. Also, please clean up the AI-added symbols in the setup script.

We also need a cool example to show this off. For example, with a web UI such as https://github.com/open-webui/open-webui, we could have two models running together in the model list. Just some quick thoughts; you could think about the most reasonable and easiest way to show this.
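As a starting point for the concurrent two-model test suggested above, here is a hedged sketch (not from the PR) that fires one generate request at each of two models on the same Ollama server at the same time. The model tags and prompt are placeholders; the endpoint and request shape follow Ollama's documented `/api/generate` API (default port 11434).

```python
# Sketch: query two models served by one Ollama instance concurrently.
# Assumes a server on the default endpoint and that both models are pulled.
import json
import threading
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def make_generate_payload(model, prompt):
    """Build a non-streaming /api/generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}

def query(model, prompt, results):
    """POST one generate request and record the model's response text."""
    body = json.dumps(make_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        results[model] = json.load(resp)["response"]

def run_concurrently(models, prompt):
    """Fire one request per model in parallel; requires a running server."""
    results = {}
    threads = [
        threading.Thread(target=query, args=(m, prompt, results))
        for m in models
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Usage (placeholder model tags):
# run_concurrently(["llama3", "qwen2.5:0.5b"], "Say hello in one word.")
```

With kvcached in the loop, both models should be able to share GPU KV-cache capacity while these requests are in flight.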

Collaborator:

The setup script has changed a lot for vLLM and SGLang. Maybe we can have a separate script just for Ollama.

@ivanium (Collaborator) left a comment:

Good job! In general I also love the direction of this PR. I think a key thing to add is a running example of co-running two models on the same GPU, along with performance numbers for their throughput, P99 TTFT, and P99 ITL.

I left some comments, but they are minor.

@ztang2370 force-pushed the feature/ollama-integration branch from 7d692f9 to f4031de on September 21, 2025 10:08
@ztang2370 force-pushed the feature/ollama-integration branch from 8a32284 to 90db4fe on September 21, 2025 16:08