Password randomization failure blocks ssh key install #50

@andrewhamon

If I build an AMI on which I have already set a password, subsequent runs of ec2-macos-init fail before they ever get a chance to install the default SSH key.

Here is an example run:

2024/08/22 23:47:30.984552 Fetching instance ID from IMDS...
2024/08/22 23:47:30.987089 Running on instance i-0bf7783cd5199dc8d
2024/08/22 23:47:30.987130 Reading init config...
2024/08/22 23:47:30.989063 Successfully read init config
2024/08/22 23:47:30.989097 Validating config...
2024/08/22 23:47:30.989257 Successfully validated config
2024/08/22 23:47:30.989268 Prioritizing modules...
2024/08/22 23:47:30.989290 Successfully prioritized modules
2024/08/22 23:47:30.989299 Creating instance history directories for current instance...
2024/08/22 23:47:30.989585 Successfully created directories
2024/08/22 23:47:30.989598 Getting instance history...
2024/08/22 23:47:30.989782 Successfully gathered instance history
2024/08/22 23:47:30.989793 Processing priority level 1 (2 modules)...
2024/08/22 23:47:30.989819 Running module [UnmountLocalSSD] (type: command, group: 1)
2024/08/22 23:47:30.989834 Running module [DisableEthernet] (type: command, group: 1)
2024/08/22 23:47:31.037840 Successfully completed module [DisableEthernet] (type: command, group: 1) with message: successfully ran command [[/usr/sbin/networksetup -setnetworkserviceenabled Ethernet off]] with stdout [] and stderr []
2024/08/22 23:47:31.697122 Successfully completed module [UnmountLocalSSD] (type: command, group: 1) with message: successfully ran command [[/bin/zsh -c diskutil list internal physical | egrep -o '^/dev/disk\d+' | xargs diskutil eject || true]] with stdout [] and stderr [Volume failed to eject]
2024/08/22 23:47:31.698805 Successfully completed processing of priority level 1
2024/08/22 23:47:31.698834 Processing priority level 2 (1 modules)...
2024/08/22 23:47:31.698893 Running module [CheckNetworkIsUp] (type: networkcheck, group: 2)
2024/08/22 23:47:31.738534 Successfully completed module [CheckNetworkIsUp] (type: networkcheck, group: 2) with message: successfully pinged default gateway with a RTT of 266.667µs
2024/08/22 23:47:31.738626 Successfully completed processing of priority level 2
2024/08/22 23:47:31.738647 Processing priority level 3 (12 modules)...
2024/08/22 23:47:31.738696 Running module [GrowRootAPFSVolume] (type: command, group: 3)
2024/08/22 23:47:31.738839 Running module [NeverSleep] (type: command, group: 3)
2024/08/22 23:47:31.738848 Running module [ManageEC2User] (type: usermanagement, group: 3)
2024/08/22 23:47:31.738884 Running module [UpdateMOTD] (type: motd, group: 3)
2024/08/22 23:47:31.738950 Running module [SetDefaultTimezone] (type: command, group: 3)
2024/08/22 23:47:31.739207 Running module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3)
2024/08/22 23:47:31.739451 Running module [SetAmazonTimeSync] (type: command, group: 3)
2024/08/22 23:47:31.739488 Running module [NeverSleepDisplay] (type: command, group: 3)
2024/08/22 23:47:31.739657 Running module [DisableSleep] (type: command, group: 3)
2024/08/22 23:47:31.739639 Running module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3)
2024/08/22 23:47:31.739851 Running module [RemoveSSHGroup] (type: command, group: 3)
2024/08/22 23:47:31.740006 Running module [DisableWiFi] (type: command, group: 3)
2024/08/22 23:47:31.753394 Error while running module [GrowRootAPFSVolume] (type: command, group: 3) with message:  and err: ec2macosinit: error executing command [[/bin/zsh -c ec2-macos-utils grow --id root]] with stdout [] and stderr [zsh:1: command not found: ec2-macos-utils]: exit status 127
2024/08/22 23:47:31.762102 Did not modify sysctl property [kern.aioprocmax=256]
2024/08/22 23:47:31.762206 Did not modify sysctl property [net.inet.tcp.autorcvbufmax=33554432]
2024/08/22 23:47:31.766793 Did not modify sysctl property [kern.aiomax=900]
2024/08/22 23:47:31.766774 Did not modify sysctl property [net.inet.tcp.win_scale_factor=8]
2024/08/22 23:47:31.767872 Did not modify sysctl property [kern.aiothreads=64]
2024/08/22 23:47:31.768859 Did not modify sysctl property [net.inet.tcp.recvspace=1048576]
2024/08/22 23:47:31.769714 Did not modify sysctl property [net.inet.tcp.autosndbufmax=33554432]
2024/08/22 23:47:31.769886 Did not modify sysctl property [net.inet.tcp.sendspace=1048576]
2024/08/22 23:47:31.774912 Did not modify sysctl property [net.link.generic.system.rcvq_maxlen=1024]
2024/08/22 23:47:31.792731 Did not modify SSHD configuration
2024/08/22 23:47:31.848976 Did not modify default [ConfigDataInstall]
2024/08/22 23:47:31.849036 Did not modify default [AutomaticallyInstallMacOSUpdates]
2024/08/22 23:47:31.849161 Did not modify default [AutomaticDownload]
2024/08/22 23:47:31.849292 Did not modify default [AutomaticCheckEnabled]
2024/08/22 23:47:31.884556 Successfully completed module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3) with message: system configuration completed with [0 changed / 1 unchanged /0 error(s)] out of 1 requested changes
2024/08/22 23:47:31.890440 Successfully completed module [UpdateMOTD] (type: motd, group: 3) with message: successfully updated motd file [/etc/motd] with version string [macOS Sonoma 14.5]
2024/08/22 23:47:31.898016 Did not modify default [CriticalUpdateInstall]
2024/08/22 23:47:31.898050 Successfully completed module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3) with message: system configuration completed with [0 changed / 14 unchanged / 0 error(s)] out of 14 requested changes
2024/08/22 23:47:31.928354 Successfully completed module [SetDefaultTimezone] (type: command, group: 3) with message: successfully ran command [[systemsetup -settimezone GMT]] with stdout [Set TimeZone: GMT] and stderr [2024-08-22 23:47:31.927 systemsetup[10242:88077] ### Error:-99 File:/AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/Admin/InternetServices.m Line:379]
2024/08/22 23:47:31.934127 Successfully completed module [RemoveSSHGroup] (type: command, group: 3) with message: successfully ran command [[/bin/zsh -c dscl /Local/Default delete /Groups/com.apple.access_ssh || true]] with stdout [delete: Invalid Path] and stderr [<dscl_cmd> DS Error: -14009 (eDSUnknownNodeName)]
2024/08/22 23:47:31.961197 Successfully completed module [DisableWiFi] (type: command, group: 3) with message: successfully ran command [[/bin/zsh -c wifidevice="$(networksetup -listallhardwareports |grep -A 1 "Wi-Fi" | tail -n 1 | cut -d " " -f2)"; if [[ ! -z $wifidevice ]]; then networksetup -setairportpower $wifidevice off; fi]] with stdout [] and stderr []
2024/08/22 23:47:31.978330 Successfully completed module [NeverSleepDisplay] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a displaysleep 0]] with stdout [] and stderr[]
2024/08/22 23:47:31.981452 Successfully completed module [DisableSleep] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a disablesleep 1]] with stdout [] and stderr []
2024/08/22 23:47:31.997461 Successfully completed module [NeverSleep] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a sleep 0]] with stdout [] and stderr []
2024/08/22 23:47:32.034087 Successfully completed module [SetAmazonTimeSync] (type: command, group: 3) with message: successfully ran command [[systemsetup -setusingnetworktime on -setnetworktimeserver 169.254.169.123]] with stdout [Network Time is already on.
setNetworkTimeServer: 169.254.169.123] and stderr [2024-08-22 23:47:32.033 systemsetup[10259:88082] ### Error:-99 File:/AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/Admin/InternetServices.m Line:379]
2024/08/22 23:47:32.111164 Error while running module [ManageEC2User] (type: usermanagement, group: 3) with message:  and err: ec2macosinit: failed to randomize password: ec2macosinit: unable to set secure password: ec2macosinit: failed to set ec2-user's password: exit status 67
2024/08/22 23:47:32.111209 Successfully completed processing of priority level 3
2024/08/22 23:47:32.111216 Writing instance history for instance i-0bf7783cd5199dc8d...
2024/08/22 23:47:32.133068 Successfully wrote instance history
2024/08/22 23:47:32.140094 Number of fatal retries (101) exceeded, exiting 0 to avoid infinite runs
2024/08/22 23:47:32.140113 Exiting after 1.152981375s due to failure in module [ManageEC2User] with FatalOnError set

It would be great if this failure were a soft failure, so that other modules had a chance to complete.

I think the "bug" here is that a failure to set a password is treated as a retryable error, so the run keeps retrying until it hits the 100-retry limit, and then the program exits. I think it's mainly bad luck or a race condition that this usually happens before the default SSH key can be installed.
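
To make the request concrete, here is a minimal Go sketch of the difference between a fatal and a soft module failure. It is not the actual ec2-macos-init code: the `Module` type, `RunGroups`, `demo`, the `InstallSSHKey` module name, and the placement of the key install in a later priority group are all assumptions made for illustration. Only `ManageEC2User`, the notion of priority groups, and the `FatalOnError` setting come from the log above. The sketch also runs modules sequentially, whereas the log suggests the real tool runs a group's modules concurrently.

```go
package main

import (
	"errors"
	"fmt"
)

// Module is a hypothetical stand-in for one init module.
// It is NOT the real ec2-macos-init type.
type Module struct {
	Name         string
	FatalOnError bool // if true, a failure stops the run after its group
	Run          func() error
}

// RunGroups runs each priority group in order. Every module in a group is
// attempted. A failing module with FatalOnError set ends the run once its
// group has finished; a failing module without it is skipped (soft failure).
// It returns the names of modules that succeeded and the first fatal error.
func RunGroups(groups [][]Module) ([]string, error) {
	var completed []string
	for _, group := range groups {
		var fatal error
		for _, m := range group {
			if err := m.Run(); err != nil {
				if m.FatalOnError && fatal == nil {
					fatal = fmt.Errorf("module [%s]: %w", m.Name, err)
				}
				continue // soft failure: move on to the next module
			}
			completed = append(completed, m.Name)
		}
		if fatal != nil {
			return completed, fatal // later groups never run
		}
	}
	return completed, nil
}

// demo builds two groups: a failing ManageEC2User module, followed by a
// hypothetical InstallSSHKey module in a later group (an assumption).
func demo(fatalOnError bool) ([]string, error) {
	groups := [][]Module{
		{{
			Name:         "ManageEC2User",
			FatalOnError: fatalOnError,
			Run:          func() error { return errors.New("failed to randomize password") },
		}},
		{{
			Name: "InstallSSHKey", // hypothetical module name
			Run:  func() error { return nil },
		}},
	}
	return RunGroups(groups)
}

func main() {
	for _, fatalOnError := range []bool{true, false} {
		done, err := demo(fatalOnError)
		fmt.Println("FatalOnError:", fatalOnError, "completed:", done, "err:", err)
	}
}
```

With `FatalOnError` set on the failing module, the later group never runs and the key is never installed. With it unset, the password failure is skipped and the key module still completes, which is the behavior asked for here.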

Labels: question (Further information is requested)
