16 Feb 17:48

ayinedjimi

KVortex v1.0 - Production Release

🚀 KVortex v1.0 - Production Release

VRAM to RAM Offloader for AI and vLLM

What's New

First production release of KVortex, a high-performance C++23 KV cache engine optimized for vLLM 0.15.

🎯 Key Features

✅ Multi-stream GPU transfers (3+ CUDA streams, 20+ GB/s bandwidth)
✅ NUMA-aware memory management (pinned + async allocation)
✅ SHA256 content-addressable caching (thread-safe)
✅ LRU eviction policy (O(1) operations)
✅ CPU backend (pinned memory, 16-128 GB)
✅ Modern C++23 (std::expected, std::format)
✅ 100% test pass rate (10/10 tests passing)
✅ Production-ready (0 memory leaks)
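
To illustrate the O(1) LRU eviction policy listed above, here is a minimal sketch of the usual technique: a hash map for lookup plus a doubly linked list for recency order. This is not KVortex's actual implementation or API. The class name `LruCache`, the string keys (standing in for SHA256 digests), and the string values (standing in for KV blocks) are all assumptions made for illustration, and the sketch is single-threaded with no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; not KVortex's real class or interface.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0 && "capacity must be positive");
    }

    // Insert or overwrite an entry. When the cache is full, the least
    // recently used entry is evicted first.
    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            // splice relinks the node in O(1) and keeps iterators valid.
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (index_.size() == capacity_) {
            index_.erase(order_.back().first);  // oldest entry sits at the back
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    // Look up an entry. A hit moves it to the front of the recency list.
    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Both `put` and `get` do one hash lookup (O(1) on average) and one constant-time list splice, which is what makes the eviction policy O(1). A thread-safe variant would need at least a mutex around both operations.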

📊 Performance

Metric	Result
TTFT Improvement (Cache Hit)	6x faster
GPU→CPU Bandwidth	20+ GB/s
Cache Miss Overhead	<5%
Memory Leaks	0 bytes
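
To put the bandwidth figure in context, here is a rough back-of-the-envelope sketch of how long an ideal GPU→CPU copy of a KV cache would take. The per-token size formula (one K and one V vector per layer) is the standard one for transformer KV caches. The function names and the model shape used in the usage note are hypothetical and are not taken from KVortex or from any benchmark in this release.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of KV cache per token: one K and one V vector for every layer.
constexpr std::uint64_t kv_bytes_per_token(std::uint64_t layers,
                                           std::uint64_t kv_heads,
                                           std::uint64_t head_dim,
                                           std::uint64_t bytes_per_elem) {
    return 2 * layers * kv_heads * head_dim * bytes_per_elem;
}

// Ideal transfer time in milliseconds at a given bandwidth in GB/s
// (1 GB = 1e9 bytes). Ignores launch latency and any overlap effects.
constexpr double transfer_ms(std::uint64_t bytes, double gb_per_s) {
    assert(gb_per_s > 0.0);
    return static_cast<double>(bytes) / (gb_per_s * 1e9) * 1e3;
}
```

For an assumed model with 32 layers, 8 KV heads, a head dimension of 128, and fp16 values, `kv_bytes_per_token(32, 8, 128, 2)` gives 131072 bytes (128 KiB) per token. A 4096-token context is then 512 MiB, and `transfer_ms` at 20 GB/s comes to roughly 26.8 ms. This is an idealized estimate, not a measured result.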

📦 Assets

kvortex-v1.0-linux-x86_64-cuda13.1.tar.gz - Compiled static library (1.3 MB)
kvortex-v1.0-headers.tar.gz - C++ headers for integration

🔧 Requirements

NVIDIA RTX 3090+ (Compute Capability 8.6+)
CUDA 13.1+
GCC 13.3+ with C++23 support
Ubuntu 24.04+ recommended

📚 Documentation

Full documentation is available in the repository.

👨‍💻 Author

Ayi NEDJIMI

Website: ayinedjimi-consultants.fr
Cybersecurity & AI Expert (20+ years)
OSCP Certified | RAG Systems Specialist

📄 License

Apache 2.0 (based on LMCache)
