
Conversation

@SargunNarula (Contributor) commented Jan 15, 2026

Added e2e tests for IRQ load balancing with housekeeping pods:

  • [86346] Verify housekeeping works correctly with single hyperthread allocation
  • [86348] Verify irqbalance does not overwrite on TuneD restart
    (housekeeping annotation)
  • [86347] Verify housekeeping selects single CPU when SMT is disabled
  • Added baseload calculation functions for determining available pod capacity on nodes

openshift-ci bot (Contributor) commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SargunNarula
Once this PR has been reviewed and has the lgtm label, please assign jmencak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from jmencak and yanirq January 15, 2026 13:25
@SargunNarula (Contributor, Author):

/retest

1 similar comment
@yanirq (Contributor) commented Jan 25, 2026

/retest


By("Verifying housekeeping CPUs equal the container's assigned CPU")
Expect(housekeepingCPUSet.Equals(containerCpuSet)).To(BeTrue(),
"Housekeeping CPUs %v should match container CPUs %v for single hyperthread allocation",
Contributor:

Hmm, this will pass, but only because the container itself has just one CPU available. The same should actually happen with 2 CPUs when HT is enabled.

The edge-case test is fine, but the description and logs do not accurately represent what is tested here. full-pcpus-only is actually not important for the logic; it just allows you to allocate a single CPU when HT is enabled.
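A minimal sketch of an assertion capturing this point, reusing the `housekeepingCPUSet` and `containerCpuSet` variables from the excerpt above (everything else is illustrative, not the PR's actual code):

```go
// Sketch only: the housekeeping cpuset should be exactly one CPU drawn from
// the container's assigned CPUs - this holds for a 1-CPU container (the edge
// case above) and equally for a 2-CPU container when HT is enabled.
Expect(housekeepingCPUSet.Size()).To(Equal(1),
	"expected a single housekeeping CPU, got %v", housekeepingCPUSet)
Expect(housekeepingCPUSet.IsSubsetOf(containerCpuSet)).To(BeTrue(),
	"housekeeping CPU %v should be taken from container CPUs %v",
	housekeepingCPUSet, containerCpuSet)
```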

SargunNarula (Author):

Thanks for pointing this out. I have updated the descriptions and logs to reflect hyperthread allocation rather than full-pcpus-only. Let me know if you think any other changes are needed.

```go
	})
})

Context("Verify TuneD restart preserves IRQ configuration for multiple housekeeping pods", Label(string(label.Tier2)), func() {
```
Contributor:

I believe we have a similar test for the non-housekeeping case, right? Would it make sense to coalesce the code into a single test with a parameter, to avoid code duplication?

SargunNarula (Author):

Thanks for the insight; yes, we already have a very similar test case. Refactored the test as a DescribeTable in the latest commit.
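A minimal sketch of the DescribeTable shape such a refactor might take; `newTestPod` and the commented steps are placeholders, not the PR's actual helpers:

```go
DescribeTable("TuneD restart preserves IRQ configuration",
	func(withHousekeepingAnnotation bool) {
		pod := newTestPod(withHousekeepingAnnotation) // hypothetical helper
		// create the pod, capture the irqbalance banned-CPU mask,
		// restart TuneD, then assert the captured mask is unchanged
		_ = pod
	},
	Entry("with housekeeping annotation", true),
	Entry("without housekeeping annotation", false),
)
```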

```go
smtActive := strings.TrimSpace(testutils.ToString(smtOutput))
Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

By("Creating a housekeeping pod with 1 CPU")
```
Contributor:

Use more CPUs; otherwise the test never uncovers anything, even if the logic breaks.

SargunNarula (Author):

Yes, indeed. Updated the test.

```go
Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

cpuRequest := 2
if cpuRequest > newIsolatedCPUs.Size() {
```
Contributor:

This will most likely fail to schedule if you have exactly 2 isolated CPUs; there will be some burstable pods as well. You need to check the currently available resources, or "assume" some burstable load and compare with newIsolatedCPUs - 1.

Contributor:

+1

SargunNarula (Author):

@shajmakh @MarSik Thanks for pointing this out; have a look at the baseload helper implementation introduced with the latest commit.
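For reference, a rough sketch of what a baseload helper along these lines can look like; this is an illustration based on the review discussion (all names here are assumptions), not the PR's actual implementation:

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeBaseloadMilliCPU sums the CPU requests of all non-guaranteed pods
// already running on a node. A test can then size its own pod request as
// roughly allocatable - reserved - baseload, instead of assuming every
// isolated CPU is free.
func nodeBaseloadMilliCPU(ctx context.Context, c kubernetes.Interface, nodeName string) (int64, error) {
	pods, err := c.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return 0, err
	}
	var millis int64
	for _, p := range pods.Items {
		if p.Status.QOSClass == corev1.PodQOSGuaranteed {
			continue // guaranteed pods hold exclusive CPUs, not shared baseload
		}
		for _, ctr := range p.Spec.Containers {
			millis += ctr.Resources.Requests.Cpu().MilliValue()
		}
	}
	return millis, nil
}
```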


```go
if currentReserved != originalReserved {
	currentProfile.Spec.CPU = originalProfile.Spec.CPU
	profiles.UpdateWithRetry(currentProfile)
```
Contributor:

Same here regarding the profile update.

SargunNarula (Author) commented Jan 28, 2026:

Since removing nosmt needs a two-step profile update (removing the kernel arg initially and then updating the cpusets), it has been resolved by introducing a context-level profile flag at the initial update itself - here.

```go
	})
})

Context("Verify housekeeping CPU assignment when SMT is disabled", Label(string(label.Tier2)), func() {
```
Contributor:

The profile update comment applies to this context as well. There are two ways I can think of ATM to do the profile update in AfterEach:

  1. the first update is done in BeforeEach
  2. the first update is done in the test spec, but with a flag to indicate that an update was made to the profile and needs to be reverted (a sketch of this option follows below)
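A minimal sketch of option 2, reusing the `profiles` helpers visible in the excerpts above and assuming the performancev2.PerformanceProfile type; the flag handling itself is illustrative:

```go
var (
	originalProfile *performancev2.PerformanceProfile
	profileUpdated  bool // set by any spec that mutates the profile
)

BeforeEach(func() {
	// snapshot the profile so AfterEach can restore it unconditionally
	var err error
	originalProfile, err = profiles.GetByNodeLabels(testutils.NodeSelectorLabels)
	Expect(err).ToNot(HaveOccurred())
	profileUpdated = false
})

AfterEach(func() {
	if !profileUpdated {
		return // this spec never changed the profile, nothing to revert
	}
	profiles.UpdateWithRetry(originalProfile)
})
```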

```go
Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

cpuRequest := 2
if cpuRequest > newIsolatedCPUs.Size() {
```
Contributor:

+1

Comment on lines 611 to 614
By("Restoring original CPU configuration")
currentProfile, err = profiles.GetByNodeLabels(testutils.NodeSelectorLabels)
Expect(err).ToNot(HaveOccurred())
currentReserved := string(*currentProfile.Spec.CPU.Reserved)
Contributor:

Why update the PP twice?

SargunNarula (Author):

While restoring, we update the profile twice to handle the CPU topology transition safely.

We cannot update PP in a single phase because removing nosmt changes the online CPU
topology only after reboot, while kubelet immediately validates reservedSystemCPUs/isolatedCPUs
against the current online CPU set during CPU Manager initialization. A single update would
apply cpusets that reference offline SMT siblings, causing kubelet config validation to fail.

See kubelet CPU manager validation:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L251-L270
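Condensed, the restore sequence being described looks roughly like this; `removeArg` and the wait step are placeholders, while the profile fields and helpers follow the excerpts above:

```go
// Phase 1: drop nosmt so the SMT siblings come back online after reboot,
// while keeping the current (SMT-off) cpusets that kubelet validates against.
currentProfile.Spec.AdditionalKernelArgs = removeArg(currentProfile.Spec.AdditionalKernelArgs, "nosmt")
profiles.UpdateWithRetry(currentProfile)
// ... wait for the MachineConfigPool to update and the node to reboot ...

// Phase 2: with all siblings online, restore the original reserved/isolated
// cpusets; kubelet's CPU manager validation now sees every referenced CPU.
currentProfile.Spec.CPU = originalProfile.Spec.CPU
profiles.UpdateWithRetry(currentProfile)
```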

```go
// CPURequestedCores returns the total CPU requested in whole cores (rounded up).
func (l Load) CPURequestedCores() int {
	millis := l.Resources.Cpu().MilliValue()
	return int((millis + 999) / 1000)
}
```
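For example, a request of 1500m gives (1500 + 999) / 1000 = 2 cores, while an exact 1000m gives 1; the +999 makes the integer division a ceiling over millicores.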
Contributor:

Perhaps use a roundup:

```go
retCpu := *resource.NewQuantity(roundUp(cpu.Value(), 2), resource.DecimalSI)
```
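`roundUp` is not shown in the diff; a minimal version of the helper this suggestion assumes could be:

```go
// roundUp rounds num up to the nearest multiple of `multiple`
// (hypothetical helper implied by the suggestion above).
func roundUp(num, multiple int64) int64 {
	if multiple == 0 {
		return num
	}
	return ((num + multiple - 1) / multiple) * multiple
}
```

Note that the two variants round differently: CPURequestedCores rounds millicores up to whole cores, while the suggestion rounds whole cores up to a multiple of 2, presumably to keep allocations aligned to full physical cores on SMT systems.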

Added e2e tests for IRQ load balancing with housekeeping pods:
- [86346] Verify housekeeping works correctly with single hyperthread allocation
- [86348] Verify irqbalance does not overwrite on TuneD restart
  (housekeeping annotation)
- [86347] Verify housekeeping selects single CPU when SMT is disabled

Added baseload calculation functions for determining available pod capacity on nodes

Signed-off-by: Sargun Narula <snarula@redhat.com>
openshift-ci bot (Contributor) commented Jan 29, 2026

@SargunNarula: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift-pao | bf2f688 | link | true | /test e2e-hypershift-pao |
| ci/prow/e2e-gcp-pao | bf2f688 | link | true | /test e2e-gcp-pao |
| ci/prow/e2e-hypershift | bf2f688 | link | true | /test e2e-hypershift |
| ci/prow/e2e-aws-ovn | bf2f688 | link | true | /test e2e-aws-ovn |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
