High-performance, microservice-based AI inference server with Unity integration support.
- 🎯 Unity Ready: seamless integration with Unity projects
- 📈 Scalable: Redis queue-based worker architecture
- 🐳 Easy Deploy: Docker Compose setup for the inference workers, API wrapper, nginx & monitoring
- 📊 Monitoring: Grafana templates for system, GPU & application observability
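The deployment described above can be sketched as a Compose file along these lines. This is an illustrative fragment only: the service names, build paths, images, and ports here are assumptions, not the repository's actual `docker-compose.yml`.

```yaml
# Hypothetical sketch of the stack — check the repo's docker-compose.yml
# for the real service definitions.
services:
  api:                        # FastAPI wrapper
    build: ./api
    ports: ["8000:8000"]
    depends_on: [redis]
  redis:                      # job queue
    image: redis:7-alpine
  worker:                     # llama.cpp + Stable Diffusion inference
    build: ./worker
    depends_on: [redis]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  nginx:                      # reverse proxy
    image: nginx:alpine
    ports: ["80:80"]
  grafana:                    # dashboards
    image: grafana/grafana
    ports: ["3000:3000"]
```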
🚀 RunPod Quickstart - Get started quickly with RunPod! New to RunPod? Use my referral link to get a $5 bonus and support the project!
🚀 Self Hosted Quickstart - Get started with your own infrastructure!
Built specifically for Unity developers:
- 📖 Complete Documentation - Full guides, API reference, and examples
- 🏗️ Model Templates - Ready-to-use templates for common model stacks
- 🚢 Deployment Strategies - Both distributed and standalone
- 🛠️ Components - Individual service configuration
- 🎮 Discord - For Support & Discussion
Distributed microservice design for maximum flexibility:
```
┌─────────────┐     ┌─────────┐     ┌─────────────┐
│     API     │─────│  Redis  │─────│ GPU Workers │
│  (FastAPI)  │     │  Queue  │     │ (LLM + SD)  │
└─────────────┘     └─────────┘     └─────────────┘
       │                                   │
       │        ┌──────────────┐           │
       └────────│  Monitoring  │───────────┘
                │(Grafana+Prom)│
                └──────────────┘
```
- API Service: FastAPI with token auth and job queuing
- GPU Workers: Custom llama.cpp + Stable Diffusion inference engines
- Redis Queue: Decoupled job processing for scalability
- Monitoring: Pre-configured Grafana dashboards
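The decoupled queue pattern above can be sketched in a few lines of Python. This is an in-process stand-in for illustration only — the real services talk to an actual Redis instance, and the function names here (`submit_job`, `worker_step`) are hypothetical, not the project's API.

```python
import json
import uuid
from collections import deque

# In-process stand-ins for the Redis structures (illustrative only;
# the real API and workers use a shared Redis server).
queue = deque()   # plays the role of a Redis list (LPUSH / BRPOP)
results = {}      # plays the role of Redis keys holding job results

def submit_job(prompt: str) -> str:
    """API side: enqueue a job and return its id immediately.

    The caller gets the id right away and polls for the result later,
    so slow GPU inference never blocks the HTTP request path.
    """
    job_id = str(uuid.uuid4())
    queue.append(json.dumps({"id": job_id, "prompt": prompt}))
    return job_id

def worker_step() -> None:
    """Worker side: pop one job, run inference, store the result."""
    job = json.loads(queue.popleft())
    # A real worker would invoke llama.cpp or Stable Diffusion here.
    results[job["id"]] = f"echo: {job['prompt']}"

job_id = submit_job("Hello from Unity")
worker_step()
print(results[job_id])  # echo: Hello from Unity
```

Because producers and consumers only share the queue, you can scale GPU workers independently of the API service — the core reason for the Redis-based design.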
📖 Learn more: Architecture Documentation
Built on the Automatic1111 Stable Diffusion web server and llama.cpp.
Questions? Check the Documentation or open an issue!