Skip to content

compute-domain-daemon: skip SIGUSR1 when DNS mapping update only removes nodes#919

Open
AkshatDudeja77 wants to merge 8 commits intokubernetes-sigs:mainfrom
AkshatDudeja77:fix-skip-sigusr1-on-node-removal
Open

compute-domain-daemon: skip SIGUSR1 when DNS mapping update only removes nodes#919
AkshatDudeja77 wants to merge 8 commits intokubernetes-sigs:mainfrom
AkshatDudeja77:fix-skip-sigusr1-on-node-removal

Conversation

@AkshatDudeja77
Copy link
Copy Markdown

Fixes #913

The IMEXDaemonUpdateLoopWithDNSNames currently sends SIGUSR1 to the IMEX
daemon whenever the DNS/IP mapping changes.

However, when the update only removes nodes (and does not add new ones),
forcing the daemon to re-resolve and reconnect is unnecessary.

This change tracks the previous mapping size and skips sending SIGUSR1 when
the updated mapping strictly removes nodes.

Before:
SIGUSR1 triggered whenever the mapping changed.

After:
SIGUSR1 triggered only when nodes are added or mappings otherwise change,
but not when nodes are only removed.

This implements the TODO in the code suggesting that reconnects can be
skipped when the new IP set only removes addresses compared to the old set.

Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@AkshatDudeja77 AkshatDudeja77 force-pushed the fix-skip-sigusr1-on-node-removal branch from a620c5d to f8ad4c0 Compare March 6, 2026 07:20
@jgehrcke
Copy link
Copy Markdown
Contributor

jgehrcke commented Mar 7, 2026

when the update only removes nodes (and does not add new ones)

That's a great goal. But how do we detect this situation with sets being our central data structures?

Write two sets down, on a piece of paper. This could look like this:

Set A:  item1, item2, item3
Set B:  item1, item2, item4

What do you need to do in order to find out if set A has an item removed compared to set B?

Do you need to look at more than just set size?

I genuinely wonder what your goals and incentives are. This is I think your sixth pull request of this style against this repository, and I believe we need to think about providing candid feedback soon :). These are wild times!

Copy link
Copy Markdown
Contributor

@jgehrcke jgehrcke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logically incorrect, as expressed in regular comment

@jgehrcke
Copy link
Copy Markdown
Contributor

jgehrcke commented Mar 7, 2026

For context, I still find it quite likely that this is AI-generated noise. Last time I looked into this, I noted:
#863 (comment)

@AkshatDudeja77
Copy link
Copy Markdown
Author

AkshatDudeja77 commented Mar 8, 2026

@jgehrcke thank you for your point about comparing sets rather than just sizes, I finally do understand what you're saying.

My intention with the current change was only to avoid sending SIGUSR1 when the update strictly removes nodes (i.e. when the mapping shrinks). In cases where
the size stays the same but the elements differ (like old set: A,B,C and new set: A,B,D), we would still send SIGUSR1 because a new peer appeared.

For transparency: I used AI to draft the initial idea here, but I’ve reviewed and adjusted the code and tests myself.

…ves nodes (fixes kubernetes-sigs#913)

Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
@AkshatDudeja77 AkshatDudeja77 force-pushed the fix-skip-sigusr1-on-node-removal branch from f8ad4c0 to a7ddddf Compare March 8, 2026 19:05
@AkshatDudeja77 AkshatDudeja77 requested a review from jgehrcke March 8, 2026 19:18
@rajatchopra
Copy link
Copy Markdown

@AkshatDudeja77 Can you write some unit testcases around it

Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
@AkshatDudeja77 AkshatDudeja77 force-pushed the fix-skip-sigusr1-on-node-removal branch from 13461a6 to e40c7ed Compare March 24, 2026 06:21
@AkshatDudeja77
Copy link
Copy Markdown
Author

@AkshatDudeja77 Can you write some unit testcases around it

@rajatchopra Refactored the SIGUSR1 decision into a helper and added unit tests covering no-change, pure-removal, additions, replacements, remove+add, and fresh-process cases.

This now explicitly checks for newly added peers rather than relying on size or subset assumptions.

Please let me know if there are additional edge cases you'd like covered

Comment thread internal/common/util.go Outdated
Comment thread cmd/compute-domain-daemon/main.go Outdated
…ge, update comment)

Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
Comment thread cmd/compute-domain-daemon/main.go Outdated
Comment thread cmd/compute-domain-daemon/main.go Outdated
Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
Comment thread cmd/compute-domain-daemon/main.go Outdated
// - the mapping was updated,
// - the process is not fresh, and
// - at least one new IP (peer) was added.
func shouldSendSIGUSR1(oldIPs, newIPs IPSet, updated, fresh bool) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need updated as an argument into this function? Is it directly implied by oldIPs and newIPs?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, updated is handled at the call site now, so i removed it from the helper and aligned the tests accordingly

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring to reflect the current function behavior after removing updated.

Also fixed the IPSet usage at the call site by explicitly constructing the new IP set from the DNS mapping to ensure correct types

Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
@AkshatDudeja77 AkshatDudeja77 requested a review from jgehrcke April 2, 2026 05:56
@k8s-triage-robot
Copy link
Copy Markdown

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid sending SIGUSR1 when ComputeDomain update only removes nodes

6 participants