NUMA-aware GPU provisioning and orchestration for stateless MoE workloads of all sizes - *Claude Code native*
kubernetes terraform moe numa ray multi-cloud gitops mlops mixture-of-experts huggingface-spaces runpod vllm ollama qwen litellm sglang claude-code glm5 gpu-provisioning disaggregated-inference
Updated Mar 3, 2026 - Python