What you would like to be added?
When the KAI scheduler moves a running container from one node to another, I would like an option to keep the original container running until the new one is up and ready.
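To make the requested ordering concrete, here is a minimal sketch of the "start new, wait for ready, then stop old" sequence. Everything in it (`Pod`, `move_make_before_break`, `wait_ready`) is a hypothetical name used for illustration only. None of it is KAI scheduler API, and it says nothing about how the scheduler moves workloads today.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical stand-in for a running container; not a KAI scheduler type."""
    name: str
    node: str
    ready: bool = False
    running: bool = True


def move_make_before_break(old, target_node, wait_ready, events):
    """Start the replacement first; stop the original only once the replacement is ready.

    `wait_ready` is an assumed callback that returns True when the new pod is ready
    and False if it gives up (for example, after a timeout).
    """
    new = Pod(name=old.name + "-new", node=target_node)
    events.append(f"started {new.name} on {new.node}")

    if not wait_ready(new):
        # The replacement never became ready: discard it and keep serving from the original.
        new.running = False
        events.append(f"aborted {new.name}; {old.name} keeps running")
        return old

    # Only now is it safe to shut down the original.
    old.running = False
    events.append(f"stopped {old.name} on {old.node}")
    return new
```

The important property is that the original is still running at every point before the replacement reports ready, so a failed move leaves serving capacity unchanged.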
Why is this needed?
Inference workloads are often serving requests and can't be shut off without repercussions. They are never idle, so there is no safe window in which to stop them. We need to defragment cluster resources without impacting our users.