Conversation
// during the cycling process from being removed by Cluster Autoscaler before
// the corresponding old nodes are fully terminated.
// See: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node
const clusterAutoscalerScaleDownDisabledAnnotation = "cluster-autoscaler.kubernetes.io/scale-down-disabled"
Could we generalise this behaviour and give the annotation via the CNR instead?
A few reasons why I didn't:
- The annotation is tied to a node lifecycle, not a CNR lifecycle. If the annotations were in the CNR spec, deleting a CNR before cleanup could leave stale annotations.
- Autoscaler reads annotations from a node object, not the CNR. If we put them in the CNR spec, we would have to sync them to nodes.
- CNRs can affect overlapping nodegroups; having annotations in the CNR spec can create race conditions and ownership issues.
> If we put them in the CNR spec, we would have to sync them to nodes.
This is exactly what I'm suggesting. We are already doing this in the code, I am just proposing that the annotation(s) get defined in the CNR rather than hardcoded.
> CNRs can affect overlapping nodegroups
Each CNR is generated by the observer for a single nodegroup. If a CNR affects nodes across nodegroups, that would be a misconfiguration somewhere.
I don't think we need to be generic here; the annotations are very specific to cluster-autoscaler. If we need another label, we probably want a code change anyway, because it will need to be used under specific conditions.
What this comment has caused me to re-evaluate is that we should add a field on the CNR to enable/disable this behaviour and default it to true. That way if any users of the software have an issue with this particular annotation/workflow, they can disable the feature and keep their cluster working.
If we have the annotation defined in the CNR, then that achieves it. It wouldn't be enabled by default, because the annotations need to be added for the behaviour to take effect. I'd argue this feature is just about adding annotations to new instances and then removing them later. We are doing it for cluster-autoscaler, of course, but it can apply more widely.
I added the option to opt out of it in the nodegroup (not the CNR); see 67d6d0f#commitcomment-175885640 for the reasoning.
I like putting it on the nodegroup; it feels like something that should be associated with a base nodegroup config.
Cluster Autoscaler Annotation Management - Feature Summary
Adds annotation management to coordinate between Cyclops and Cluster Autoscaler during node cycling, preventing Cluster Autoscaler from removing new nodes before old nodes are drained.
Problem
During node cycling, Cluster Autoscaler can remove new nodes before Cyclops finishes draining old nodes, causing cycling to fail or nodes to be terminated prematurely.
Solution
- Adds the cluster-autoscaler.kubernetes.io/scale-down-disabled: "true" annotation to new nodes during the ScalingUp phase
- Adds cyclops.atlassian.com/annotation-managed to track ownership (distinguishes Cyclops-managed vs pre-existing annotations)
- cyclops.atlassian.com/disable-annotation-management: "true" disables management (default: enabled)

Key Changes
pkg/controller/cyclenoderequest/transitioner/util.go:
- addScaleDownDisabledAnnotation() - Adds annotation + marker (preserves pre-existing)
- cleanupScaleDownDisabledAnnotations() - Removes annotations only if marker present
- shouldManageAnnotations() - Checks NodeGroup opt-out annotation
- getNodeGroup() - Finds matching NodeGroup resource

pkg/controller/cyclenoderequest/transitioner/transitions.go:
- transitionScalingUp() - Adds annotations (if enabled)
- transitionSuccessful() / transitionHealing() - Removes annotations (if added)

pkg/controller/cyclenoderequest/transitioner/transitioner.go:
- New constants: nodeGroupAnnotationKey, cyclopsManagedAnnotation

Opt-Out Configuration
Why NodeGroup? Persistent configuration (survives CNR deletion), aligned with node lifecycle, prevents conflicts with overlapping CNRs.
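The marker-based ownership logic described above can be sketched as plain functions over an annotation map. This is a minimal illustration, not the actual Cyclops code: the real implementation operates on Node and NodeGroup objects through the Kubernetes API, and only the annotation keys and function names are taken from the summary; the bodies are assumptions about the behaviour it describes.

```go
package main

import "fmt"

const (
	scaleDownDisabledAnnotation = "cluster-autoscaler.kubernetes.io/scale-down-disabled"
	cyclopsManagedAnnotation    = "cyclops.atlassian.com/annotation-managed"
	disableAnnotationManagement = "cyclops.atlassian.com/disable-annotation-management"
)

// addScaleDownDisabledAnnotation sets the scale-down-disabled annotation and,
// when it was not already present, records a marker so cleanup knows Cyclops
// owns it. A pre-existing annotation is left unmarked and thus preserved.
func addScaleDownDisabledAnnotation(annotations map[string]string) {
	if _, exists := annotations[scaleDownDisabledAnnotation]; !exists {
		annotations[cyclopsManagedAnnotation] = "true"
	}
	annotations[scaleDownDisabledAnnotation] = "true"
}

// cleanupScaleDownDisabledAnnotations removes the annotation only when the
// marker shows Cyclops added it, so user-set annotations survive cycling.
func cleanupScaleDownDisabledAnnotations(annotations map[string]string) {
	if annotations[cyclopsManagedAnnotation] == "true" {
		delete(annotations, scaleDownDisabledAnnotation)
		delete(annotations, cyclopsManagedAnnotation)
	}
}

// shouldManageAnnotations checks the NodeGroup opt-out annotation;
// management defaults to enabled when the annotation is absent.
func shouldManageAnnotations(nodeGroupAnnotations map[string]string) bool {
	return nodeGroupAnnotations[disableAnnotationManagement] != "true"
}

func main() {
	// Node with no prior annotation: Cyclops adds it, then fully cleans up.
	a := map[string]string{}
	addScaleDownDisabledAnnotation(a)
	cleanupScaleDownDisabledAnnotations(a)
	fmt.Println(len(a)) // 0

	// Node where the user had already set the annotation: it is preserved.
	b := map[string]string{scaleDownDisabledAnnotation: "true"}
	addScaleDownDisabledAnnotation(b)
	cleanupScaleDownDisabledAnnotations(b)
	fmt.Println(b[scaleDownDisabledAnnotation]) // true
}
```

The marker is what lets cleanup distinguish "Cyclops put this here" from "the operator put this here", which is the ownership concern raised in the review thread.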
Design Decisions
Verification