Description
There is a ~10% chance that the primary IPv4 address is removed from the eth0 network interface during the initial boot of a newly provisioned VM.
NetworkManager is what removes it. In fact, it appears to randomly remove (and sometimes re-add) any of the IP addresses on eth0 (we have one primary, one secondary, and one site-local IPv6 address). The changes always happen right after waagent detects that the hostname has changed to the Azure VM name, i.e. right after a log line such as "Detected hostname change: pkrvmoq00ytv5qk -> h12077".
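To triage a boot, it helps to check two things together: which expected addresses are missing from eth0, and whether the agent logged the hostname change. The following is a minimal sketch, not part of the agent. It assumes iproute2's JSON output (`ip -j addr show dev eth0`), where each address appears as a `local` field under `addr_info`. It also assumes the agent log is read from `/var/log/waagent.log` (the usual default). The expected addresses you pass in are placeholders for your own primary, secondary, and IPv6 addresses.

```python
import json
import re
import subprocess
from typing import Iterable, Optional, Set, Tuple

# Matches the agent log line quoted above, e.g.
# "Detected hostname change: pkrvmoq00ytv5qk -> h12077"
HOSTNAME_CHANGE_RE = re.compile(r"Detected hostname change: (\S+) -> (\S+)")


def read_ip_json(ifname: str = "eth0") -> str:
    """Return `ip -j addr show dev <ifname>` output (needs iproute2 with -j)."""
    return subprocess.run(
        ["ip", "-j", "addr", "show", "dev", ifname],
        capture_output=True, text=True, check=True,
    ).stdout


def addresses_on_interface(ip_json: str, ifname: str = "eth0") -> Set[str]:
    """Collect every address ("local") that ip reports for ifname."""
    found: Set[str] = set()
    for link in json.loads(ip_json):
        if link.get("ifname") != ifname:
            continue
        for info in link.get("addr_info", []):
            if "local" in info:
                found.add(info["local"])
    return found


def missing_addresses(ip_json: str, expected: Iterable[str],
                      ifname: str = "eth0") -> Set[str]:
    """Expected addresses not currently configured on ifname.

    IPv6 strings must be written in the same canonical form ip prints.
    """
    return set(expected) - addresses_on_interface(ip_json, ifname)


def find_hostname_change(log_lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (old, new) from the first hostname-change line, or None."""
    for line in log_lines:
        match = HOSTNAME_CHANGE_RE.search(line)
        if match:
            return match.group(1), match.group(2)
    return None
```

On an affected boot, `missing_addresses(read_ip_json(), expected)` would be non-empty, and `find_hostname_change(open("/var/log/waagent.log"))` gives the point in the log to compare against the time of the address removal.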
This matches the issue referred to in #3008: "When the agent publishes an updated hostname to DNS, it restarts the NM and then restarts the interface configuration manually. This can lead to a race condition...".
The fix for #3008 is for waagent not to restart NetworkManager just before the hostname is published. We have verified that this also fixes our issue.
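Because the failure is intermittent, a run of clean boots supports a fix only statistically. The following is a minimal sketch that assumes each boot independently hits the race with the ~10% probability observed above. It computes how many consecutive clean boots are needed before the chance that an unfixed agent simply got lucky falls below a chosen threshold.

```python
import math


def boots_needed(failure_rate: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - failure_rate) ** n >= confidence.

    Assumes boots are independent and each fails with probability failure_rate.
    """
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def chance_missed(failure_rate: float, clean_boots: int) -> float:
    """Probability that an unfixed race produced no failure in clean_boots boots."""
    return (1.0 - failure_rate) ** clean_boots
```

At a 10% failure rate, `boots_needed(0.1, 0.95)` is 29, and `boots_needed(0.1, 0.99)` is 44.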
Unfortunately, #3032 limited the fix for #3008 to RHEL [7, 8.6) (see factory.py#L119) because the changes for 8.6 and later "have not been stress tested on the distros which use RedhatOSModernUtil, and we have not reproduced the race condition using RedhatOSModernUtil". We have now reproduced the issue on AlmaLinux 8.10.
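The following sketch illustrates why AlmaLinux 8.10 is not covered by the half-open range [7, 8.6). It is not the agent's actual `factory.py` logic. It assumes versions are compared component by component. Note that parsing "8.10" as a float would give 8.1, which sorts below 8.6; this sketch compares integer tuples instead to avoid that.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "8.10" into (8, 10) so each component compares numerically."""
    return tuple(int(part) for part in version.split("."))


def in_half_open_range(version: str, low: str, high: str) -> bool:
    """True when low <= version < high, i.e. version lies in [low, high)."""
    return parse_version(low) <= parse_version(version) < parse_version(high)
```

With this comparison, `in_half_open_range("8.5", "7", "8.6")` is true, while both "8.6" and "8.10" fall outside the range.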
Therefore, to verify that the fix worked for us, we had to patch the waagent egg that runs at boot as follows:
```diff
--- a/azurelinuxagent/common/osutil/redhat.py 2025-04-09 14:45:11.430807287 +0000
+++ b/azurelinuxagent/common/osutil/redhat.py 2025-04-09 14:46:15.065608134 +0000
@@ -270,5 +270,6 @@
         # RedhatOSUtil was updated to conditionally run NetworkManager restart in response to a race condition between
         # NetworkManager restart and the agent restarting the network interface during publish_hostname. Keeping the
         # NetworkManager restart in RedhatOSModernUtil because the issue was not reproduced on these versions.
-        shellutil.run("service NetworkManager restart")
+        logger.warn("patch: not restarting NetworkManager")
+        #shellutil.run("service NetworkManager restart")
         DefaultOSUtil.publish_hostname(self, hostname)
```
Distro and WALinuxAgent details:
- Distro and Version: AlmaLinux 8.10
- WALinuxAgent version: 2.13.1.1