docs: Add comprehensive documentation and semantic router example #229

Open
yurekami wants to merge 1 commit into ovg-project:main from yurekami:feat/multiple-improvements-batch

@yurekami
Summary

This PR addresses multiple documentation and feature requests to improve the kvcached user experience:

Documentation (docs/)

  • API.md - Comprehensive API reference including:

    • Environment variables documentation
    • CLI tools reference (kvctl commands)
    • Python API with code examples for vLLM and SGLang
    • Integration APIs and manual integration guide
    • Controller REST API endpoints
  • COMPATIBILITY.md - Version compatibility guide:

    • PyTorch version support matrix (2.4.x - 2.8.x)
    • vLLM version compatibility (0.8.4 - 0.11.x)
    • SGLang version support
    • Known issues (PyTorch 2.8.0 undefined symbol error)
    • GPU architecture support table
    • Container/Kubernetes compatibility notes
  • ARCHITECTURE.md - System architecture documentation:

    • System overview with ASCII diagrams
    • Engine decoupling explanation
    • IPC mechanism details
    • Memory lifecycle (allocation/deallocation flow)
    • Configuration guidance for multi-engine setups

Examples (examples/)

  • 09_semantic_router/ - Content-based request routing:
    • FastAPI-based semantic router
    • Sentence-transformers integration for query classification
    • Fallback keyword matching when embeddings unavailable
    • Statistics and monitoring endpoints
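The routing idea described above (embedding-based classification with a keyword fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in `examples/09_semantic_router/`: the function names `pick_route` and `cosine_similarity`, the route dictionaries, and the `embed` callable are all assumptions made for this sketch. In a real setup `embed` might wrap a sentence-transformers model's `encode` method; here it is left as an optional parameter so the fallback path runs with the standard library alone.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_route(
    query: str,
    route_examples: Dict[str, List[str]],
    route_keywords: Dict[str, List[str]],
    default_route: str,
    embed: Optional[Callable[[str], Vector]] = None,
) -> str:
    """Hypothetical router: use embeddings when available, else keywords."""
    if embed is not None:
        # Embedding path: pick the route whose example is closest to the query.
        query_vec = embed(query)
        best_route, best_score = default_route, 0.0
        for name, examples in route_examples.items():
            for example in examples:
                score = cosine_similarity(query_vec, embed(example))
                if score > best_score:
                    best_route, best_score = name, score
        return best_route

    # Fallback path: count case-insensitive keyword hits per route.
    text = query.lower()
    best_route, best_hits = default_route, 0
    for name, keywords in route_keywords.items():
        hits = sum(1 for kw in keywords if kw.lower() in text)
        if hits > best_hits:
            best_route, best_hits = name, hits
    return best_route
```

With no `embed` callable supplied, a query such as "Write a Python function" would be routed by keyword hits alone, and a query that matches nothing falls through to `default_route`.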

Bug Fixes

  • Improved error messages in ElasticBlockPool.get_new_blocks():
    • Shows available vs requested blocks
    • Displays current memory usage percentage
    • Provides actionable suggestions
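As a rough illustration of the three properties listed above, an allocation-failure message could be assembled like this. The helper name `format_allocation_error`, its parameters, and the suggestion text are hypothetical; the actual wording and the data available inside `ElasticBlockPool.get_new_blocks()` are defined by the diff, not by this sketch.

```python
def format_allocation_error(requested: int, available: int, total: int) -> str:
    """Hypothetical helper: build an out-of-blocks message with context."""
    used = total - available
    # Guard against an empty pool so the percentage is always defined.
    usage_pct = 100.0 * used / total if total > 0 else 0.0
    return (
        f"Cannot allocate {requested} blocks: only {available} of {total} "
        f"available ({usage_pct:.1f}% in use). "
        # Illustrative suggestions only; not taken from the PR.
        "Consider reducing the batch size or sequence length, "
        "or freeing memory held by other models."
    )
```

A caller would pass the message to whatever exception the allocator already raises, so existing error handling keeps working while the text becomes more informative.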

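Issues Addressed

  • #48 - API reference (docs/API.md)
  • #87 - Container/Kubernetes compatibility notes
  • #91 - Semantic router example
  • #117 - Architecture documentation (docs/ARCHITECTURE.md)
  • #197 - Error messages in ElasticBlockPool
  • #222 - Version compatibility guide (docs/COMPATIBILITY.md)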

Test Plan

  • Verify docs render correctly on GitHub
  • Run semantic router example with test models
  • Verify error message improvements in block allocation

🤖 Generated with Claude Code

This PR addresses multiple documentation and feature requests:

- Add docs/API.md with comprehensive API reference (ovg-project#48)
  - Environment variables documentation
  - CLI tools reference (kvctl)
  - Python API with code examples
  - Integration APIs for vLLM and SGLang
  - Controller REST API endpoints

- Add docs/COMPATIBILITY.md for version compatibility (ovg-project#222)
  - PyTorch version support (2.4.x - 2.8.x)
  - vLLM version matrix (0.8.4 - 0.11.x)
  - SGLang version support
  - Known issues including PyTorch 2.8.0 undefined symbol error
  - GPU architecture support table
  - Container/Kubernetes compatibility notes (ovg-project#87)

- Add docs/ARCHITECTURE.md for system architecture (ovg-project#117)
  - System overview diagram
  - Engine decoupling explanation
  - IPC mechanism details
  - Memory lifecycle documentation
  - Configuration guidance for decoupled operation

- Add examples/09_semantic_router/ for content-based routing (ovg-project#91)
  - FastAPI-based semantic router
  - Sentence-transformers integration for query classification
  - Fallback keyword matching
  - Statistics and monitoring endpoints

- Improve error messages in ElasticBlockPool (ovg-project#197)
  - Show available vs requested blocks
  - Display current usage percentage
  - Provide actionable suggestions

Closes ovg-project#48, ovg-project#87, ovg-project#91, ovg-project#117, ovg-project#197, ovg-project#222

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>