Summary
Support L3 cache-aware CPU allocation in CPU DRA.
Users should be able to request an entire L3 cache domain as a resource. When such a request is allocated, all CPUs that belong to that L3 cache should be treated as isolated for that workload.
Motivation
For latency-sensitive and packet-processing workloads, CPU isolation alone is not always sufficient. Two workloads can still interfere with each other if they share the same L3 cache, even when they do not share the exact same CPUs.
We need a way to express cache-domain exclusivity, not just CPU-count exclusivity.
Requested behavior
Two related behaviors are needed.
1. Full L3 cache request
A user can request one full L3 cache domain.
When that request is allocated:
- the allocation returns the CPUs that belong to that L3 cache domain
- all CPUs in that domain are treated as isolated for that workload
- other workloads cannot allocate CPUs from that same L3 cache domain
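As a rough illustration of how "the CPUs that belong to that L3 cache domain" could be derived, the sketch below groups CPUs by their shared L3 cache. It assumes the input strings use the Linux cpulist format (for example `0-3,8-11`), such as what the kernel exposes in `/sys/devices/system/cpu/cpu<N>/cache/index<M>/shared_cpu_list` for the index whose `level` file reads 3. The function names `parse_cpu_list` and `group_by_l3` are hypothetical and not part of any existing driver.

```python
def parse_cpu_list(text):
    """Parse a cpulist string such as "0-3,8-11" into a sorted list of CPU IDs."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_l3(shared_lists):
    """Group CPUs into L3 cache domains.

    shared_lists maps a CPU ID to the cpulist string of all CPUs that
    share that CPU's L3 cache. CPUs reporting the same list belong to
    the same domain. Domain IDs are assigned here for illustration only,
    in order of each domain's lowest CPU ID.
    """
    unique = {tuple(parse_cpu_list(s)) for s in shared_lists.values()}
    return {i: list(members) for i, members in enumerate(sorted(unique))}


if __name__ == "__main__":
    # Hypothetical 8-CPU machine with two L3 domains and SMT siblings.
    example = {
        0: "0-1,4-5", 1: "0-1,4-5", 4: "0-1,4-5", 5: "0-1,4-5",
        2: "2-3,6-7", 3: "2-3,6-7", 6: "2-3,6-7", 7: "2-3,6-7",
    }
    print(group_by_l3(example))
```

Note that SMT siblings naturally land in the same domain, since they report the same shared list.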
2. Partial CPU request blocks future full-cache allocation
If a workload allocates even a single CPU from a given L3 cache domain, that L3 cache domain should no longer be eligible for a later "full L3 cache" allocation for another workload.
The inverse should also be true:
- once a full L3 cache domain is allocated, later per-CPU allocations from that domain must be blocked for other workloads
This is needed to avoid placing unrelated applications on CPUs that still share the same L3 cache, which would degrade latency and determinism.
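The two rules above can be sketched as a small in-memory allocator. This is only a model of the requested semantics, not a proposal for the driver's actual data structures: the class `L3Allocator` and its methods are hypothetical, and for simplicity a partially used domain is refused for every full-domain request, including one from the workload that already holds CPUs in it.

```python
class L3Allocator:
    """Toy model of mutual exclusion between full-L3 and per-CPU allocations."""

    def __init__(self, domains):
        # domains: dict of domain ID -> iterable of CPU IDs in that L3 domain
        self.domains = {d: set(cpus) for d, cpus in domains.items()}
        self.cpu_owner = {}   # CPU ID -> workload name
        self.full_owner = {}  # domain ID -> workload name

    def _domain_of(self, cpu):
        for d, cpus in self.domains.items():
            if cpu in cpus:
                return d
        raise KeyError(f"unknown CPU {cpu}")

    def allocate_full(self, workload):
        """Allocate the first L3 domain with no CPU in use, or return None."""
        for d in sorted(self.domains):
            cpus = self.domains[d]
            if d in self.full_owner:
                continue
            if any(c in self.cpu_owner for c in cpus):
                continue  # rule 2: a partially used domain is not eligible
            self.full_owner[d] = workload
            for c in cpus:
                self.cpu_owner[c] = workload
            return d, sorted(cpus)
        return None

    def allocate_cpu(self, workload, cpu):
        """Allocate a single CPU; fail if it is taken or its domain is fully owned."""
        d = self._domain_of(cpu)
        if d in self.full_owner:
            return False  # inverse rule: the whole domain is already exclusive
        if cpu in self.cpu_owner:
            return False
        self.cpu_owner[cpu] = workload
        return True


if __name__ == "__main__":
    alloc = L3Allocator({0: [0, 1, 2, 3], 1: [4, 5, 6, 7]})
    print(alloc.allocate_cpu("app-a", 0))   # takes one CPU from domain 0
    print(alloc.allocate_full("app-b"))     # domain 0 is skipped, domain 1 is chosen
    print(alloc.allocate_cpu("app-c", 4))   # refused: domain 1 is exclusive
```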
Why the new shared consumable capacity KEP looks relevant
KEP-5075: DRA Consumable Capacity looks like a strong planning reference for this feature.
It seems like a good fit for modeling each L3 cache domain as a DRA allocation domain with shared capacity:
- a full L3 request can consume the entire capacity of that cache domain
- an individual CPU allocation can consume part of the same domain capacity
- once part of the domain is consumed, a later request for the full domain becomes unschedulable
- once the full domain is consumed, later per-CPU allocations from that domain are also blocked
That is very close to the behavior we want.
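To make the capacity idea concrete, here is a minimal sketch that treats each L3 domain as a single counter whose total equals its CPU count. It does not use or imitate the KEP-5075 API; `DomainCapacity` and its methods are hypothetical names, and the model deliberately ignores which specific CPUs are consumed.

```python
class DomainCapacity:
    """One shared capacity counter per L3 cache domain (illustrative only)."""

    def __init__(self, total_cpus):
        self.total = total_cpus
        self.consumed = 0

    def try_consume(self, amount):
        """Consume `amount` CPUs if enough capacity remains."""
        if amount <= 0 or self.consumed + amount > self.total:
            return False
        self.consumed += amount
        return True

    def try_consume_full(self):
        """A full-L3 request consumes the entire capacity.

        It can only succeed when nothing has been consumed yet, because
        any earlier consumption makes consumed + total exceed total.
        """
        return self.try_consume(self.total)


if __name__ == "__main__":
    partly_used = DomainCapacity(8)
    partly_used.try_consume(1)              # one per-CPU allocation
    print(partly_used.try_consume_full())   # False: full request is now blocked

    exclusive = DomainCapacity(8)
    exclusive.try_consume_full()            # full-domain allocation
    print(exclusive.try_consume(1))         # False: per-CPU request is now blocked
```

A single counter is enough to enforce both directions of the exclusion, which is why the shared-capacity model looks like a natural fit.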
Acceptance criteria
- a workload can request a full L3 cache domain as a resource
- the allocation result exposes the selected L3 cache domain and the CPUs that belong to it
- all CPUs in that L3 cache domain are treated as isolated for that workload
- if any CPU in an L3 domain is already allocated, that domain is excluded from future full-domain claims
- if a full L3 domain is allocated, future per-CPU allocations from that domain are excluded
- the behavior is clearly defined across NUMA nodes, sockets, and SMT topologies