Releases: ayinedjimi/KVortex

KVortex v1.0 - Production Release

16 Feb 17:48

VRAM to RAM Offloader for AI and vLLM

What's New

First production release of KVortex, a high-performance C++23 KV cache engine optimized for vLLM 0.15.

🎯 Key Features

  • Multi-stream GPU transfers (3+ CUDA streams, 20+ GB/s bandwidth)
  • NUMA-aware memory management (pinned + async allocation)
  • SHA256 content-addressable caching (thread-safe)
  • LRU eviction policy (O(1) operations)
  • CPU backend (pinned memory, 16-128 GB)
  • Modern C++23 (std::expected, std::format)
  • 100% test pass rate (10/10 tests passing)
  • Production-ready (0 memory leaks)
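
To illustrate the O(1) LRU eviction listed above, here is a minimal sketch of the usual technique: a doubly linked list ordered by recency plus a hash map from key to list node. It is not KVortex's implementation. The names `LruCache`, `get`, and `put` are hypothetical, keys are plain strings standing in for SHA256 content digests, capacity is counted in entries rather than bytes, and the sketch is single-threaded and sticks to C++17 (`std::optional`) rather than the C++23 facilities the project uses.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache: most recently used entry sits at the list front.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the value and marks the entry as most recently used.
    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // splice relinks the node in O(1) and keeps iterators valid.
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used one
    // when the cache is full.
    void put(const std::string& key, std::string value) {
        if (capacity_ == 0) return;
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (index_.size() == capacity_) {
            index_.erase(order_.back().first);  // drop the LRU key
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recent, back = least recent
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Lookups, updates, and evictions each cost one hash-map operation (O(1) on average) plus a constant-time list relink. A thread-safe variant would additionally need a lock or sharding around both containers.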

📊 Performance

Metric                          Result
TTFT Improvement (Cache Hit)    6x faster
GPU→CPU Bandwidth               20+ GB/s
Cache Miss Overhead             <5%
Memory Leaks                    0 bytes

📦 Assets

  • kvortex-v1.0-linux-x86_64-cuda13.1.tar.gz - Compiled static library (1.3 MB)
  • kvortex-v1.0-headers.tar.gz - C++ headers for integration

🔧 Requirements

  • NVIDIA RTX 3090+ (Compute Capability 8.6+)
  • CUDA 13.1+
  • GCC 13.3+ with C++23 support
  • Ubuntu 24.04+ recommended

📚 Documentation

Full documentation is available in the repository.

👨‍💻 Author

Ayi NEDJIMI

📄 License

Apache 2.0 (based on LMCache)