e2e: Add housekeeping IRQ load balancing tests #1456
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SargunNarula. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/retest

1 similar comment

/retest
Force-pushed from ff2d905 to bdafc69
By("Verifying housekeeping CPUs equal the container's assigned CPU")
Expect(housekeepingCPUSet.Equals(containerCpuSet)).To(BeTrue(),
	"Housekeeping CPUs %v should match container CPUs %v for single hyperthread allocation",
Hmm, this will pass, but only because the container itself has just one cpu available. The same should actually happen with 2 cpus when HT is enabled.
The edge case test is fine, but the description and logs do not accurately represent what is tested here. full-pcpus-only is actually not important for the logic, it just allows you to allocate a single cpu only when HT is enabled.
Thanks for pointing this out. I have updated the descriptions and logs to reflect hyperthread allocation rather than full-pcpus-only. Let me know if you think any other changes are needed.
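For context, a minimal sketch of the comparison this thread is about, using k8s.io/utils/cpuset (the same Equals API the quoted assertion relies on). The function and variable names are illustrative, not the PR's actual test code:

```go
package example

import (
	"fmt"

	"k8s.io/utils/cpuset"
)

// verifyHousekeepingMatchesContainer checks set equality rather than assuming
// the container holds exactly one CPU: with SMT enabled the container may be
// assigned a sibling pair, and the housekeeping set should still match it.
func verifyHousekeepingMatchesContainer(housekeepingCPUs, containerCPUs string) error {
	housekeepingSet, err := cpuset.Parse(housekeepingCPUs)
	if err != nil {
		return fmt.Errorf("parsing housekeeping cpuset %q: %w", housekeepingCPUs, err)
	}
	containerSet, err := cpuset.Parse(containerCPUs)
	if err != nil {
		return fmt.Errorf("parsing container cpuset %q: %w", containerCPUs, err)
	}
	if !housekeepingSet.Equals(containerSet) {
		return fmt.Errorf("housekeeping CPUs %v do not match container CPUs %v",
			housekeepingSet, containerSet)
	}
	return nil
}
```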
	})
})

Context("Verify TuneD restart preserves IRQ configuration for multiple housekeeping pods", Label(string(label.Tier2)), func() {
I believe we have a similar test for the non-housekeeping case, right? Would it make sense to coalesce the code into a single test with a parameter, to avoid code duplication?
Thanks for the insight. Yes, we already have a very similar test case. Refactored the test as a DescribeTable in the latest commit.
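A minimal Ginkgo sketch of this kind of DescribeTable refactor; the entry labels, the boolean parameter, and the omitted body are assumptions, not the PR's actual code:

```go
package example

import (
	. "github.com/onsi/ginkgo/v2"
)

// A single table covers both variants; the real test body would create the
// pods, restart TuneD on the node, and assert that the irqbalance banned-CPU
// configuration and IRQ affinities are preserved.
var _ = DescribeTable("TuneD restart preserves IRQ configuration",
	func(useHousekeepingAnnotation bool) {
		// create workload pod(s) with or without the housekeeping annotation,
		// restart TuneD, then verify the IRQ configuration is unchanged
		// (omitted here).
		_ = useHousekeepingAnnotation
	},
	Entry("for pods that disable IRQ load balancing", false),
	Entry("for housekeeping pods", true),
)
```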
smtActive := strings.TrimSpace(testutils.ToString(smtOutput))
Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

By("Creating a housekeeping pod with 1 CPU")
Use more cpus, otherwise the test never uncovers anything even if the logic breaks.
Yes, indeed. Updated the test.
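For illustration, a hedged sketch of a guaranteed-QoS pod that asks for more than one exclusive CPU (requests equal to limits, so the static CPU manager pins the CPUs). The image and metadata are placeholders; the real functests build pods through their own helpers and set the IRQ load-balancing annotation there:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// housekeepingTestPod returns a guaranteed-QoS pod requesting cpuCount
// exclusive CPUs: requests equal limits, so kubelet's static CPU manager
// assigns dedicated CPUs.
func housekeepingTestPod(cpuCount int64) *corev1.Pod {
	cpu := *resource.NewQuantity(cpuCount, resource.DecimalSI)
	mem := resource.MustParse("256Mi")
	res := corev1.ResourceList{
		corev1.ResourceCPU:    cpu,
		corev1.ResourceMemory: mem,
	}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "hk-irq-test-"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "test",
				Image: "registry.example.com/pause:latest", // placeholder image
				Resources: corev1.ResourceRequirements{
					Requests: res,
					Limits:   res,
				},
			}},
		},
	}
}
```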
Force-pushed from bdafc69 to f663ea0
Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

cpuRequest := 2
if cpuRequest > newIsolatedCPUs.Size() {
This will most likely fail to schedule if you have exactly 2 isolated cpus. There will be some burstable pods as well. You need to check the currently available resources or "assume" some burstable and compare with newIsolatedCPUs - 1.
+1
You can add a helper similar to the one in NROP:
https://github.com/openshift-kni/numaresources-operator/blob/main/internal/baseload/baseload.go#L37
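A hedged sketch loosely following the linked NROP baseload idea: sum the CPU already requested by pods on the node before deciding how many exclusive CPUs the test can still ask for. The client and node-name parameters are assumptions; the real functests have their own client helpers:

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeBaseloadCPU returns the total CPU (in millicores) requested by all
// non-terminated pods currently scheduled on nodeName.
func nodeBaseloadCPU(ctx context.Context, cs kubernetes.Interface, nodeName string) (int64, error) {
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return 0, err
	}
	total := resource.NewMilliQuantity(0, resource.DecimalSI)
	for _, pod := range pods.Items {
		// Skip pods that no longer consume resources.
		if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
			continue
		}
		for _, c := range pod.Spec.Containers {
			if req, ok := c.Resources.Requests[corev1.ResourceCPU]; ok {
				total.Add(req)
			}
		}
	}
	return total.MilliValue(), nil
}
```

With a helper like this, the test could derive its request from newIsolatedCPUs.Size() minus the rounded-up baseload instead of hard-coding cpuRequest := 2.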
if currentReserved != originalReserved {
	currentProfile.Spec.CPU = originalProfile.Spec.CPU
	profiles.UpdateWithRetry(currentProfile)
same here regarding the profile update
Since removing nosmt needs a two-step profile update (removing the kernel arg first and then updating the cpusets), this has been resolved by introducing a context-level profile flag at the initial update itself - here
	})
})

Context("Verify housekeeping CPU assignment when SMT is disabled", Label(string(label.Tier2)), func() {
The profile update comment applies to this context as well. There are two ways I can think of ATM to handle the profile update revert in AfterEach:
- the first update is done in BeforeEach
- the first update is done in the test spec, but with a flag to indicate that the profile was updated and needs to be reverted (see the sketch below)
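A minimal sketch of the flag-based option; the profile type and the get/apply helpers are stand-ins, not the real functests profiles utilities:

```go
package example

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// Stand-ins (hypothetical) for the real PerformanceProfile helpers.
type profileSpec struct{ Reserved, Isolated string }

var currentSpec = profileSpec{Reserved: "0-3", Isolated: "4-15"}

func getProfile() profileSpec          { return currentSpec }
func applyProfile(s profileSpec) error { currentSpec = s; return nil }

var _ = Describe("profile mutation with revert flag", func() {
	var (
		profileUpdated bool
		originalSpec   profileSpec
	)

	BeforeEach(func() {
		profileUpdated = false
		originalSpec = getProfile()
	})

	AfterEach(func() {
		// Revert only if the spec actually changed the profile; this keeps a
		// single restore path even when the first update happens inside the
		// It block rather than in BeforeEach.
		if !profileUpdated {
			return
		}
		Expect(applyProfile(originalSpec)).To(Succeed())
	})

	It("updates the profile and flags it for revert", func() {
		Expect(applyProfile(profileSpec{Reserved: "0-1", Isolated: "2-15"})).To(Succeed())
		profileUpdated = true // signal AfterEach to restore
		// ... assertions against the updated profile ...
	})
})
```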
Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

cpuRequest := 2
if cpuRequest > newIsolatedCPUs.Size() {
+1
Force-pushed from f663ea0 to 0ce6e8c
| By("Restoring original CPU configuration") | ||
| currentProfile, err = profiles.GetByNodeLabels(testutils.NodeSelectorLabels) | ||
| Expect(err).ToNot(HaveOccurred()) | ||
| currentReserved := string(*currentProfile.Spec.CPU.Reserved) |
why updating PP twice?
While restoring, we update the profile twice to handle the CPU topology transition safely. We cannot update the PP in a single phase because removing nosmt changes the online CPU topology only after a reboot, while kubelet immediately validates reservedSystemCPUs/isolatedCPUs against the current online CPU set during CPU Manager initialization. A single update would apply cpusets that reference offline SMT siblings, causing kubelet config validation to fail.
See the kubelet CPU manager validation:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L251-L270
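A self-contained sketch of that two-phase ordering; the profile type, field names, and updateAndWait helper are simplified stand-ins for the PerformanceProfile API and the functests helpers:

```go
package example

type cpuSpec struct{ Reserved, Isolated string }

type profile struct {
	AdditionalKernelArgs []string
	CPU                  cpuSpec
}

// updateAndWait would apply the profile and block until the MachineConfigPool
// rollout (including any reboot) has finished. Body omitted in this sketch.
func updateAndWait(p *profile) {}

// removeArg returns args without any occurrence of arg.
func removeArg(args []string, arg string) []string {
	out := args[:0:0]
	for _, a := range args {
		if a != arg {
			out = append(out, a)
		}
	}
	return out
}

func restoreSMTAndCPUSets(p *profile, original profile) {
	// Phase 1: drop "nosmt" only; after the reboot the SMT siblings are
	// online again, so the original cpusets become valid for kubelet.
	p.AdditionalKernelArgs = removeArg(p.AdditionalKernelArgs, "nosmt")
	updateAndWait(p)

	// Phase 2: restore the original reserved/isolated cpusets, which may
	// reference sibling threads that were offline while nosmt was set.
	p.CPU = original.CPU
	updateAndWait(p)
}
```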
// CPURequestedCores returns the total CPU requested in whole cores (rounded up)
func (l Load) CPURequestedCores() int {
	millis := l.Resources.Cpu().MilliValue()
	return int((millis + 999) / 1000)
perhaps use a roundup:
retCpu := *resource.NewQuantity(roundUp(cpu.Value(), 2), resource.DecimalSI)
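A hedged sketch comparing the two roundings discussed here: millicores rounded up to whole cores, and whole cores rounded up to a multiple of 2 (in the spirit of the resource.NewQuantity suggestion above, so baseload always consumes full physical cores on SMT systems). roundUp is a local helper, not an existing library function:

```go
package example

import (
	"k8s.io/apimachinery/pkg/api/resource"
)

// roundUp rounds n up to the nearest multiple of multiple.
func roundUp(n, multiple int64) int64 {
	if multiple <= 0 {
		return n
	}
	return ((n + multiple - 1) / multiple) * multiple
}

// cpuRequestedCores rounds a CPU quantity up to whole cores.
func cpuRequestedCores(cpu resource.Quantity) int64 {
	return roundUp(cpu.MilliValue(), 1000) / 1000
}

// cpuRequestedFullPCPUs additionally rounds the core count up to an even
// number, similar to the review suggestion.
func cpuRequestedFullPCPUs(cpu resource.Quantity) resource.Quantity {
	cores := cpuRequestedCores(cpu)
	return *resource.NewQuantity(roundUp(cores, 2), resource.DecimalSI)
}
```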
Added e2e tests for IRQ load balancing with housekeeping pods:
- [86346] Verify housekeeping works correctly with single hyperthread allocation
- [86348] Verify irqbalance does not overwrite on TuneD restart (housekeeping annotation)
- [86347] Verify housekeeping selects single CPU when SMT is disabled

Added baseload calculation functions for determining available pod capacity on nodes

Signed-off-by: Sargun Narula <snarula@redhat.com>
Force-pushed from 0ce6e8c to bf2f688
@SargunNarula: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.