enhance flow for custom machineconfigs for specific machinesets #1619

@cgwalters

The CI team is trying to use new AWS m5d.xlarge instances which have two NVMe disks attached. We crafted a custom RAID partition machineconfig to enable that.
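For readers unfamiliar with this kind of config, here is a rough sketch of what a RAID machineconfig for the worker pool could look like. It is not the config the CI team actually used: the object name, array name, RAID level, device paths, and filesystem format are all assumptions, and it uses Ignition spec 3.x syntax (older clusters use spec 2.x, where the filesystem stanza is laid out differently). Mounting the filesystem is omitted.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-nvme-raid          # hypothetical name
  labels:
    # Targets the main worker pool, as described above
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0                 # assumption; depends on the cluster version
    storage:
      raid:
        - name: nvme-raid            # hypothetical array name
          level: raid0               # assumption: stripe across both disks
          devices:
            - /dev/nvme1n1           # assumed device paths for the two NVMe disks
            - /dev/nvme2n1
      filesystems:
        - device: /dev/md/nvme-raid
          format: xfs                # assumption
          label: nvme-raid
          wipeFilesystem: true
```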

We added this to the main worker pool. The MCD will fail to roll out the partitioning on the existing workers, but that's fine because the plan was to "roll" the worker pool: get the new MC into the pool, have new workers come online with that config, then scale down the old workers.

However, there are a few issues here.

First, this whole thing would obviously be a lot better if we had machineset-specific machineconfigs. That would solve a bunch of races and be much more elegant.
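The closest thing available today, as far as I know, is a custom machineconfigpool selected by node label. A minimal sketch follows; the pool name `m5d` and the node label are hypothetical, and the machineset would need to apply that label to its nodes. Note this is only an approximation of machineset-specific machineconfigs: it does not necessarily change which config a new machine boots with, so it may not remove the races described here.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: m5d                          # hypothetical pool name
spec:
  machineConfigSelector:
    matchExpressions:
      # Inherit all worker MCs, plus any MCs labeled for this pool
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, m5d]
  nodeSelector:
    matchLabels:
      # Hypothetical label; the m5d machineset would set it on its nodes
      node-role.kubernetes.io/m5d: ""
```

With a pool like this, the RAID machineconfig would carry the label `machineconfiguration.openshift.io/role: m5d` instead of `worker`, so existing workers never see it.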

What we're seeing right now is that one new m5d node went OutOfDisk=true because it was booted with just a 16G root volume from the old config. That unschedulable node then blocks rollout of further changes.

I think we can unstick ourselves here by deleting that node and getting the MCO to roll out the new config.

Labels: lifecycle/frozen (indicates that an issue or PR should not be auto-closed due to staleness)