Automatic Updates

Overview

The auto_update role manages automatic system updates via a maintenance chain: a single cron job that runs all steps sequentially.

03:00 Maintenance Chain (daily, completes by ~03:30)
  ├── Phase 1: ansible-pull (sync config from Git)
  ├── Pre-Check: updates available?
  │     NO → exit (nothing to do)
  │     YES ↓
  ├── Status Monitor: open maintenance window (optional)
  ├── Phase 2: install updates
  ├── Cleanup: autoremove unused packages + old kernel removal
  ├── Phase 3: conditional reboot (weekday check + health checks)
  │     ├── Reboot → maintenance window closed after boot
  │     └── No reboot → maintenance window closed immediately
  └── Done

No race conditions, no lock files, guaranteed sequential execution.
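That guarantee follows from the chain being one script run by one cron entry. A minimal skeleton for illustration (NOT the deployed auto-update.sh; phase bodies are stubbed with `true` and the helper names are invented):

```shell
#!/usr/bin/env bash
# One process, strictly sequential phases, early exit when idle.
set -euo pipefail

run_phase() {
    local name="$1"; shift
    echo "[chain] ${name}"
    "$@"                 # a failing phase aborts the chain (set -e)
}

updates_available() {
    true                 # real script asks apt/dnf; stubbed here
}

run_phase "Phase 1: ansible-pull" true
if ! updates_available; then
    echo "[chain] no updates, exiting"
    exit 0
fi
run_phase "Phase 2: install updates" true
run_phase "Cleanup: autoremove + old kernels" true
run_phase "Phase 3: conditional reboot" true
```

Because every step runs in the same process, a later phase can never start before an earlier one has finished.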

Configuration

Global Settings

Edit inventory/production/group_vars/all/update_settings.yml:

# Update type: "all" or "security"
auto_update_type: "all"

# Maintenance chain schedule (cron format): Daily at 03:00
auto_update_cron_minute: "0"
auto_update_cron_hour: "3"
auto_update_cron_day: "*"
auto_update_cron_month: "*"
auto_update_cron_weekday: "*"

# Reboot: only on Sunday (at the end of the chain, not a separate cron)
auto_update_reboot_enabled: true
auto_update_reboot_cron_weekday: "0"

# Include ansible-pull as Phase 1 of the chain
auto_update_chain_ansible_pull: true

Per-Host Overrides

Create/edit inventory/production/host_vars/<hostname>.yml:

# Only security updates for this host
auto_update_type: "security"

# Reboot on Saturday instead of Sunday
auto_update_reboot_cron_weekday: "6"

Update Types

All updates (default):

auto_update_type: "all"

Security updates only:

auto_update_type: "security"

On Debian/Ubuntu:

  • all: runs apt-get upgrade + apt-get dist-upgrade
  • security: uses unattended-upgrade (only -security repository)

On RedHat/Rocky/Alma:

  • all: runs dnf update
  • security: runs dnf update --security

Existing configuration files are always preserved (--force-confold).
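As a sketch, the per-distribution mapping above can be captured in one helper. The function name is illustrative and any apt options beyond --force-confold are assumptions, not the role's actual code:

```shell
# Map (os family, update type) to the command named in the docs above.
update_cmd() {
    # $1 = debian|redhat, $2 = all|security
    case "$1:$2" in
        debian:all)      echo 'apt-get -y -o Dpkg::Options::="--force-confold" dist-upgrade' ;;
        debian:security) echo 'unattended-upgrade' ;;
        redhat:all)      echo 'dnf -y update' ;;
        redhat:security) echo 'dnf -y update --security' ;;
        *)               return 1 ;;
    esac
}
```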

Maintenance Chain

Flow

  1. Cron fires at scheduled time (default: daily 03:00)
  2. Phase 1: ansible-pull syncs latest config from Git (optional)
  3. Package lists are updated (apt-get update / dnf check-update)
  4. If no updates available → exit early (no maintenance window opened)
  5. If updates available → open status monitor maintenance window (optional)
  6. Phase 2: Updates are installed
  7. Unused dependencies are removed (apt-get autoremove / dnf autoremove)
  8. Old kernels are cleaned up (keep current + one previous)
  9. Phase 3: If reboot is needed AND today is the configured reboot day:
    • Pre-reboot health checks (dpkg/rpm audit, lock check, system load)
    • If all checks pass → shutdown -r +1
    • Maintenance window closed automatically after boot
  10. If no reboot → maintenance window closed immediately
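Step 4's early exit hinges on detecting pending updates cheaply. On the RedHat path, `dnf check-update` signals this via its exit status; a sketch (the helper name is illustrative, not part of the deployed script):

```shell
# dnf check-update exit status: 0 = no updates, 100 = updates
# available, anything else = error.
has_pending_updates() {
    case "$1" in
        100) return 0 ;;   # updates pending -> chain continues
        0)   return 1 ;;   # nothing to do   -> chain exits early
        *)   echo "check-update error (status $1)" >&2; return 2 ;;
    esac
}

# Call-site sketch:
#   dnf -q check-update >/dev/null; has_pending_updates $?
# Debian analogue: count `apt-get -s dist-upgrade` lines starting "Inst"
```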

Pre-Reboot Health Checks

Before any reboot, the script verifies:

Check                          Debian/Ubuntu              RedHat
Package manager consistency    dpkg --audit               rpm lock check
No ongoing package operations  fuser dpkg/lock-frontend   fuser rpm/.rpm.lock
System load not critical       loadavg < 3 * nproc        loadavg < 3 * nproc

If any check fails, the reboot is aborted and logged.
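The load check needs nothing but /proc/loadavg and nproc. A sketch using the threshold from the table above (the helper name is illustrative):

```shell
# "System load not critical": 1-minute load average below 3 * nproc.
load_ok() {
    # $1 = integer 1-min load average, $2 = CPU count
    [ "$1" -lt $(( $2 * 3 )) ]
}

load_int=$(awk '{printf "%d", $1}' /proc/loadavg)
cores=$(nproc)

if load_ok "$load_int" "$cores"; then
    echo "load OK, reboot may proceed"
else
    echo "load too high, reboot aborted" >&2
fi
```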

Disabled OS Timers

The maintenance chain handles all update operations. To prevent conflicts (double updates, race conditions), the role disables the native OS update timers:

Debian/Ubuntu:

  • apt-daily.timer — stopped and disabled (chain runs apt-get update)
  • apt-daily-upgrade.timer — stopped and disabled (chain runs upgrades)
  • APT::Periodic::* values set to "0" in /etc/apt/apt.conf.d/20auto-upgrades

RedHat/Rocky/Alma:

  • dnf-automatic.timer — stopped and disabled
  • dnf-automatic-install.timer — stopped and disabled

This ensures all updates happen exclusively through the maintenance chain at the configured time.
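On Debian/Ubuntu the resulting /etc/apt/apt.conf.d/20auto-upgrades looks roughly like this (the exact set of APT::Periodic keys the role zeroes is an assumption):

```
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::Unattended-Upgrade "0";
APT::Periodic::AutocleanInterval "0";
```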

needrestart (Debian/Ubuntu)

The needrestart package is installed and configured for non-interactive mode, so automated updates are never blocked by service-restart prompts:

/etc/needrestart/conf.d/iac-auto-restart.conf:
  $nrconf{restart} = 'a';

Services that need restarting after library updates are restarted automatically.

Files Deployed

All distributions:

File                            Purpose
/usr/local/sbin/auto-update.sh  Maintenance chain script
/var/log/auto-update.log        Maintenance chain log

Debian/Ubuntu:

File                                           Purpose
/etc/apt/apt.conf.d/50unattended-upgrades      Unattended-upgrades config
/etc/apt/apt.conf.d/20auto-upgrades            APT periodic config (all "0")
/etc/needrestart/conf.d/iac-auto-restart.conf  Auto-restart services after updates

RedHat/Rocky/Alma:

No additional config files — dnf-automatic and dnf-utils packages are installed, but their timers are disabled (chain handles everything).

Cron Job

# Maintenance chain: daily at 03:00
0 3 * * * /usr/local/sbin/auto-update.sh >> /var/log/auto-update.log 2>&1

Status Monitor Integration

Automatic maintenance window management for status monitoring tools like Uptime Kuma. Opens a window before updates and closes it after, preventing false alerts.

Architecture

/etc/iac-ansible/
  status-monitor.env              ← Credentials (mode 0600, NOT in Git)
  status-monitor.env.example      ← Template with instructions

/usr/local/sbin/
  iac-status-monitor.sh           ← Generic wrapper (provider-agnostic)

/usr/local/lib/iac-ansible/
  providers/
    uptime-kuma.py                ← Uptime Kuma provider (replaceable)

/opt/iac-ansible/
  statusmon-venv/                 ← Isolated Python venv for provider deps

/var/lib/iac-ansible/
  maintenance-state               ← Persistent state (survives reboot)

Setup

Step 1: Enable in Ansible

In inventory/production/group_vars/all/update_settings.yml:

auto_update_statusmon_enabled: true
auto_update_statusmon_provider: "uptime-kuma"

Step 2: Configure Credentials

Option A: Via Ansible Vault (recommended for fleet management)

Set all status-monitor values in inventory/production/group_vars/all/secrets.yml (encrypted via make vault-edit):

secrets_statusmon_url: "https://status.bauer-group.com"
secrets_statusmon_username: "iac-ansible"
secrets_statusmon_password: "your-password"

The auto_update_statusmon_* variables automatically pull their values from secrets_statusmon_* via the secrets role (Phase 0). The .env file is deployed to each host.

Per-value inline encryption is also supported:

ansible-vault encrypt_string 'your-password' --name 'secrets_statusmon_password'

Option B: Manual .env per host

Leave auto_update_statusmon_url, auto_update_statusmon_username, and auto_update_statusmon_password empty. Then on each host:

cp /etc/iac-ansible/status-monitor.env.example /etc/iac-ansible/status-monitor.env
vim /etc/iac-ansible/status-monitor.env    # set URL, username, password
chmod 600 /etc/iac-ansible/status-monitor.env

Step 3: Authentication

Uptime Kuma uses a single admin account. The Socket.IO API requires these admin credentials (username/password) for maintenance window management.

Use the Uptime Kuma admin credentials in the .env file. The password should always be stored encrypted via Ansible Vault.

Monitor Discovery

Monitors are discovered automatically by the host's FQDN; no manual ID mapping is needed.

The provider searches all monitors in Uptime Kuma and matches against:

  • Monitor name (e.g., 0046-20.cloud.bauer-group.com HTTP)
  • Monitor URL (e.g., https://0046-20.cloud.bauer-group.com)
  • Monitor hostname field (e.g., 0046-20.cloud.bauer-group.com)

Both the FQDN and short hostname are checked. If no matching monitor is found, the maintenance window step is silently skipped.

To override the hostname used for discovery, add to the .env file:

STATUS_MONITOR_HOSTNAME="custom-name.example.com"
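The discovery candidates reduce to two strings per host. A sketch of how the wrapper might derive them (variable names are illustrative, not taken from the script):

```shell
# Explicit override wins; otherwise the FQDN. The short name is the
# FQDN's first label. Both are matched against name, URL, and
# hostname fields of each monitor.
host_fqdn="${STATUS_MONITOR_HOSTNAME:-$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo localhost)}"
host_short="${host_fqdn%%.*}"
echo "matching against: ${host_fqdn} and ${host_short}"
```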

Graceful Behavior

Situation                     Behavior
No .env file                  Skipped silently
URL or credentials empty      Skipped silently
Status monitor unreachable    Warning logged, updates proceed
Server not found in monitors  Skipped silently
No provider script found      Skipped silently
No updates available          No maintenance window opened

Updates are never blocked by status monitor issues.

Post-Reboot

If a reboot is initiated, the maintenance window stays open. A systemd oneshot service (iac-maintenance-close.service) runs after boot and closes the window automatically.

# Check if post-reboot service is enabled
systemctl is-enabled iac-maintenance-close.service

# Check if a maintenance window is still open
cat /var/lib/iac-ansible/maintenance-state

# Manually close a stuck maintenance window
/usr/local/sbin/iac-status-monitor.sh stop
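A sketch of what such a oneshot unit typically looks like (the deployed iac-maintenance-close.service may differ; the ConditionPathExists guard is an assumption):

```
[Unit]
Description=Close status-monitor maintenance window after reboot
After=network-online.target
Wants=network-online.target
ConditionPathExists=/var/lib/iac-ansible/maintenance-state

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/iac-status-monitor.sh stop

[Install]
WantedBy=multi-user.target
```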

Adding a New Provider

  1. Create a Python script: roles/auto_update/files/providers/my-provider.py
  2. The script must:
    • Read STATUS_MONITOR_URL, STATUS_MONITOR_USERNAME, STATUS_MONITOR_PASSWORD, STATUS_MONITOR_HOSTNAME, STATUS_MONITOR_STATE_FILE from environment
    • Accept start or stop as first argument
    • Write the maintenance ID to STATUS_MONITOR_STATE_FILE on start
    • Delete STATUS_MONITOR_STATE_FILE on stop
    • Exit 0 on success, non-zero on failure
  3. Set the provider: auto_update_statusmon_provider: "my-provider"
  4. If the provider needs Python packages, add an install task to roles/auto_update/tasks/main.yml.
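The state-file contract from step 2 can be exercised in isolation. A shell stub for illustration only (real providers are Python scripts; `provider_stub` and `demo-123` are invented names):

```shell
# Contract: write the maintenance ID to the state file on start,
# delete it on stop, return non-zero for anything else.
provider_stub() {
    local action="$1" state_file="$2"
    case "$action" in
        start) echo "demo-123" > "$state_file" ;;  # ID from monitor API
        stop)  rm -f "$state_file" ;;
        *)     return 1 ;;
    esac
}
```

The wrapper calls the configured provider with start before updates and stop afterwards, passing credentials and paths via the STATUS_MONITOR_* environment variables.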

Schedule Examples

Change Update to Weekly (Saturday 01:00)

auto_update_cron_minute: "0"
auto_update_cron_hour: "1"
auto_update_cron_weekday: "6"

Reboot Every Day (if needed)

auto_update_reboot_cron_weekday: "*"

Disable Automatic Reboot

auto_update_reboot_enabled: false

Disable ansible-pull in Chain

auto_update_chain_ansible_pull: false

Package Blacklisting

Exclude specific packages from updates:

auto_update_blacklist:
  - docker-ce
  - docker-ce-cli
  - kubelet
  - kubeadm
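How the blacklist reaches the package manager is an implementation detail; one plausible translation for the dnf path (the helper name is illustrative; on Debian the analogue would be `apt-mark hold`):

```shell
# Build one --exclude=<pkg> flag per blacklisted package.
build_exclude_args() {
    for pkg in "$@"; do
        printf -- '--exclude=%s ' "$pkg"
    done
}

# e.g.:  dnf -y update $(build_exclude_args docker-ce kubelet)
echo "dnf -y update $(build_exclude_args docker-ce kubelet)"
```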

Monitoring

Check maintenance chain logs:

tail -f /var/log/auto-update.log

Check if reboot is pending:

# Debian/Ubuntu
test -f /var/run/reboot-required && echo 'Reboot needed' || echo 'OK'

# RedHat
needs-restarting -r

Check cron jobs:

crontab -l | grep IAC

Check status monitor state:

# Is a maintenance window currently open?
cat /var/lib/iac-ansible/maintenance-state 2>/dev/null || echo "No active window"

# Manually test the status monitor integration
/usr/local/sbin/iac-status-monitor.sh start
/usr/local/sbin/iac-status-monitor.sh stop