migrations: add migrations for prefer-closest-numa-nodes and max-allowable-numa-nodes #4778

Draft
piyush-jena wants to merge 3 commits into bottlerocket-os:develop from piyush-jena:add-k8s-settings

@piyush-jena commented Mar 3, 2026

Issue number:

Closes #4750

Related to:

Description of changes:
Add 2 topology manager policy options:

  1. max-allowable-numa-nodes - GA in k8s 1.35+
  2. prefer-closest-numa-nodes - GA in k8s 1.32+

Testing done:
Migration testing:

  1. Before upgrade
[ssm-user@control]$ apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "36151b8b",
    "pretty_name": "Bottlerocket OS 1.56.0 (aws-k8s-1.35)",
    "variant_id": "aws-k8s-1.35",
    "version_id": "1.56.0"
  }
}
[ssm-user@control]$ apiclient set \
  kubernetes.cpu-manager-policy=static \
  kubernetes.topology-manager-policy="best-effort" \
  kubernetes.topology-manager-policy-options.prefer-closest-numa-nodes="true"
Failed to change settings: Failed PATCH request to '/settings/keypair?tx=apiclient-set-KXBvywfwgeVYcZdS': Status 400 when PATCHing /settings/keypair?tx=apiclient-set-KXBvywfwgeVYcZdS: Unable to match your input to the data model.  We may not have enough type information.  Please try the --json input form.  Cause: Error during deserialization: unknown field `topology-manager-policy-options`, expected one of `cluster-name`, `cluster-certificate`, `api-server`, `node-labels`, `node-taints`, `static-pods`, `authentication-mode`, `bootstrap-token`, `standalone-mode`, `eviction-hard`, `eviction-soft`, `eviction-soft-grace-period`, `eviction-max-pod-grace-period`, `kube-reserved`, `system-reserved`, `allowed-unsafe-sysctls`, `server-tls-bootstrap`, `cloud-provider`, `registry-qps`, `registry-burst`, `event-qps`, `event-burst`, `kube-api-qps`, `kube-api-burst`, `container-log-max-size`, `container-log-max-files`, `container-log-max-workers`, `container-log-monitor-interval`, `cpu-cfs-quota-enforced`, `cpu-manager-policy`, `cpu-manager-reconcile-period`, `cpu-manager-policy-options`, `topology-manager-scope`, `topology-manager-policy`, `pod-pids-limit`, `image-gc-high-threshold-percent`, `image-gc-low-threshold-percent`, `image-minimum-gc-age`, `image-maximum-gc-age`, `provider-id`, `log-level`, `credential-providers`, `server-certificate`, `server-key`, `shutdown-grace-period`, `shutdown-grace-period-for-critical-pods`, `memory-manager-reserved-memory`, `memory-manager-policy`, `reserved-cpus`, `memory-swap-behavior`, `hostname-override-source`, `seccomp-default`, `device-ownership-from-security-context`, `single-process-oom-kill`, `static-pods-enabled`, `max-pods`, `cluster-dns-ip`, `cluster-domain`, `node-ip`, `pod-infra-container-image`, `hostname-override`, `ids-per-pod`, `max-parallel-image-pulls` at line 1 column 118

bash-5.2# updog check-update -a --json
[
  {
    "variant": "aws-k8s-1.35",
    "arch": "x86_64",
    "version": "1.57.0",
    "max_version": "1.57.0",
    "waves": {
      "0": "2026-03-09T23:16:35.592575499Z",
      "20": "2026-03-10T02:16:35.592575499Z",
      "102": "2026-03-10T22:16:35.592575499Z",
      "307": "2026-03-11T22:16:35.592575499Z",
      "819": "2026-03-13T22:16:35.592575499Z",
      "1228": "2026-03-14T22:16:35.592575499Z",
      "1843": "2026-03-15T22:16:35.592575499Z"
    },
    "images": {
      "boot": "bottlerocket-aws-k8s-1.35-x86_64-1.57.0-54e01036-boot.ext4.lz4",
      "root": "bottlerocket-aws-k8s-1.35-x86_64-1.57.0-54e01036-root.ext4.lz4",
      "hash": "bottlerocket-aws-k8s-1.35-x86_64-1.57.0-54e01036-root.verity.lz4"
    }
  }
]
  2. After upgrading to v1.57.0
[ssm-user@control]$ apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "54e01036",
    "pretty_name": "Bottlerocket OS 1.57.0 (aws-k8s-1.35)",
    "variant_id": "aws-k8s-1.35",
    "version_id": "1.57.0"
  }
}
[ssm-user@control]$ apiclient set \
  kubernetes.cpu-manager-policy=static \
  kubernetes.topology-manager-policy="best-effort" \
  kubernetes.topology-manager-policy-options.prefer-closest-numa-nodes="true"
[ssm-user@control]$ apiclient get settings.kubernetes
{
  "settings": {
    "kubernetes": {
      "authentication-mode": "aws",
      "cloud-provider": "external",
      "cluster-dns-ip": "10.100.0.10",
      "cluster-domain": "cluster.local",
      "cpu-manager-policy": "static",
      "credential-providers": {
        "ecr-credential-provider": {
          "cache-duration": "12h",
          "enabled": true,
          "image-patterns": [
            "*.dkr.ecr.*.amazonaws.com",
            "*.dkr.ecr.*.amazonaws.com.cn",
            "*.dkr.ecr.*.amazonaws.eu",
            "*.dkr-ecr.*.on.aws",
            "*.dkr-ecr.*.on.amazonwebservices.com.cn",
            "*.dkr.ecr-fips.*.amazonaws.com",
            "*.dkr.ecr-fips.*.amazonaws.eu",
            "*.dkr.ecr.*.cloud.adc-e.uk",
            "*.dkr.ecr-fips.*.cloud.adc-e.uk",
            "*.dkr.ecr.*.c2s.ic.gov",
            "*.dkr.ecr-fips.*.c2s.ic.gov",
            "*.dkr.ecr.*.sc2s.sgov.gov",
            "*.dkr.ecr-fips.*.sc2s.sgov.gov",
            "*.dkr.ecr.*.csp.hci.ic.gov",
            "*.dkr.ecr-fips.*.csp.hci.ic.gov",
            "public.ecr.aws"
          ]
        }
      },
      "device-ownership-from-security-context": true,
      "hostname-override": "ip-172-31-10-220.us-west-2.compute.internal",
      "hostname-override-source": "private-dns-name",
      "max-pods": 29,
      "node-ip": "172.31.10.220",
      "provider-id": "aws:///us-west-2c/i-0fce061d5c684b9a8",
      "seccomp-default": true,
      "server-tls-bootstrap": true,
      "shutdown-grace-period": "150s",
      "shutdown-grace-period-for-critical-pods": "30s",
      "standalone-mode": false,
      "topology-manager-policy": "best-effort",
      "topology-manager-policy-options": {
        "prefer-closest-numa-nodes": true
      }
    }
  }
}

bash-5.2# cat /etc/kubernetes/kubelet/config
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: "/etc/kubernetes/pki/ca.crt"
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
clusterDomain: cluster.local
clusterDNS:
- 10.100.0.10
kubeReserved:
  cpu: "70m"
  memory: "574Mi"
  ephemeral-storage: "1Gi"
kubeReservedCgroup: "/runtime"
cpuCFSQuota: true
cpuManagerPolicy: static
topologyManagerPolicy: best-effort
topologyManagerPolicyOptions:
  prefer-closest-numa-nodes: "true"
podPidsLimit: 1048576
providerID: aws:///us-west-2c/i-0fce061d5c684b9a8
resolvConf: "/run/netdog/resolv.conf"
hairpinMode: hairpin-veth
readOnlyPort: 0
cgroupDriver: systemd
cgroupRoot: "/"
runtimeRequestTimeout: 15m
protectKernelDefaults: true
serializeImagePulls: false
seccompDefault: true
serverTLSBootstrap: true
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
volumePluginDir: "/var/lib/kubelet/plugins/volume/exec"
maxPods: 29
staticPodPath: "/etc/kubernetes/static-pods/"
shutdownGracePeriod: 150s
shutdownGracePeriodCriticalPods: 30s
failSwapOn: false
failCgroupV1: false
featureGates:
  DynamicResourceAllocation: true
  MutableCSINodeAllocatableCount: true
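
In the kubelet config above, the boolean setting shows up as the quoted string "true" under topologyManagerPolicyOptions. A minimal sketch of that mapping follows, assuming option values are booleans or integers. The helper name `render_policy_options` is hypothetical, and this is not Bottlerocket's actual template logic.

```python
def render_policy_options(options):
    """Render option values as quoted strings, as seen in the kubelet config.

    Hypothetical helper for illustration only.
    """
    lines = ["topologyManagerPolicyOptions:"]
    for name, value in sorted(options.items()):
        # Check bool before anything else: str(True) would give "True",
        # while the rendered config uses lowercase "true".
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        lines.append(f'  {name}: "{text}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_policy_options({"prefer-closest-numa-nodes": True}))
```

Run on the setting used in this test, it should reproduce the two topologyManagerPolicyOptions lines shown in the config above.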
  3. After downgrading back to v1.56.0
[ssm-user@control]$ apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "36151b8b",
    "pretty_name": "Bottlerocket OS 1.56.0 (aws-k8s-1.35)",
    "variant_id": "aws-k8s-1.35",
    "version_id": "1.56.0"
  }
}
[ssm-user@control]$ apiclient get settings.kubernetes
{
  "settings": {
    "kubernetes": {
      "authentication-mode": "aws",
      "cloud-provider": "external",
      "cluster-dns-ip": "10.100.0.10",
      "cluster-domain": "cluster.local",
      "cpu-manager-policy": "static",
      "credential-providers": {
        "ecr-credential-provider": {
          "cache-duration": "12h",
          "enabled": true,
          "image-patterns": [
            "*.dkr.ecr.*.amazonaws.com",
            "*.dkr.ecr.*.amazonaws.com.cn",
            "*.dkr.ecr.*.amazonaws.eu",
            "*.dkr-ecr.*.on.aws",
            "*.dkr-ecr.*.on.amazonwebservices.com.cn",
            "*.dkr.ecr-fips.*.amazonaws.com",
            "*.dkr.ecr-fips.*.amazonaws.eu",
            "*.dkr.ecr.*.cloud.adc-e.uk",
            "*.dkr.ecr-fips.*.cloud.adc-e.uk",
            "*.dkr.ecr.*.c2s.ic.gov",
            "*.dkr.ecr-fips.*.c2s.ic.gov",
            "*.dkr.ecr.*.sc2s.sgov.gov",
            "*.dkr.ecr-fips.*.sc2s.sgov.gov",
            "*.dkr.ecr.*.csp.hci.ic.gov",
            "*.dkr.ecr-fips.*.csp.hci.ic.gov",
            "public.ecr.aws"
          ]
        }
      },
      "device-ownership-from-security-context": true,
      "hostname-override": "ip-172-31-10-220.us-west-2.compute.internal",
      "hostname-override-source": "private-dns-name",
      "max-pods": 29,
      "node-ip": "172.31.10.220",
      "provider-id": "aws:///us-west-2c/i-0fce061d5c684b9a8",
      "seccomp-default": true,
      "server-tls-bootstrap": true,
      "shutdown-grace-period": "150s",
      "shutdown-grace-period-for-critical-pods": "30s",
      "standalone-mode": false,
      "topology-manager-policy": "best-effort"
    }
  }
}

bash-5.2# cat /etc/kubernetes/kubelet/config
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: "/etc/kubernetes/pki/ca.crt"
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
clusterDomain: cluster.local
clusterDNS:
- 10.100.0.10
kubeReserved:
  cpu: "70m"
  memory: "574Mi"
  ephemeral-storage: "1Gi"
kubeReservedCgroup: "/runtime"
cpuCFSQuota: true
cpuManagerPolicy: static
topologyManagerPolicy: best-effort
podPidsLimit: 1048576
providerID: aws:///us-west-2c/i-0fce061d5c684b9a8
resolvConf: "/run/netdog/resolv.conf"
hairpinMode: hairpin-veth
readOnlyPort: 0
cgroupDriver: systemd
cgroupRoot: "/"
runtimeRequestTimeout: 15m
protectKernelDefaults: true
serializeImagePulls: false
seccompDefault: true
serverTLSBootstrap: true
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
volumePluginDir: "/var/lib/kubelet/plugins/volume/exec"
maxPods: 29
staticPodPath: "/etc/kubernetes/static-pods/"
shutdownGracePeriod: 150s
shutdownGracePeriodCriticalPods: 30s
failSwapOn: false
failCgroupV1: false
featureGates:
  DynamicResourceAllocation: true
  MutableCSINodeAllocatableCount: true
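
One way to confirm the downgrade result without comparing the two `apiclient get settings.kubernetes` outputs by eye is to flatten both JSON documents into dotted keys and diff the key sets. The sketch below assumes the outputs have been parsed into dictionaries. The function names `flatten` and `removed_keys` are hypothetical, and the sample data is trimmed from the outputs above.

```python
def flatten(settings, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. settings.kubernetes.max-pods."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def removed_keys(before, after):
    """Return the dotted keys present in `before` but absent from `after`."""
    return sorted(set(flatten(before)) - set(flatten(after)))


if __name__ == "__main__":
    # Trimmed versions of the v1.57.0 and v1.56.0 outputs shown above.
    upgraded = {"settings": {"kubernetes": {
        "topology-manager-policy": "best-effort",
        "topology-manager-policy-options": {"prefer-closest-numa-nodes": True},
    }}}
    downgraded = {"settings": {"kubernetes": {
        "topology-manager-policy": "best-effort",
    }}}
    for key in removed_keys(upgraded, downgraded):
        print(key)
```

With the trimmed data, the only key reported as removed is the new policy option, which matches the downgrade output above.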

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Update all five settings-sdk dependency blocks to use the new
v0.22.0 tag which includes topology-manager-policy-options.
Add AddSettingsMigration for:
- settings.kubernetes.topology-manager-policy-options
- settings.kubernetes.topology-manager-policy-options.prefer-closest-numa-nodes
- settings.kubernetes.topology-manager-policy-options.max-allowable-numa-nodes
Signed-off-by: Piyush Jena <jepiyush@amazon.com>
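
The behavior seen in the migration testing (nothing changes on upgrade, the new keys disappear on downgrade) can be modeled roughly as below. This is a simplified Python model under stated assumptions, not the `migration_helpers` implementation: it assumes the data store is a flat map of dotted keys to values and that the migration removes exactly the listed keys when going backward. The names `ADDED_SETTINGS`, `migrate_forward`, and `migrate_backward` are hypothetical.

```python
# Keys listed in the commit message above.
ADDED_SETTINGS = [
    "settings.kubernetes.topology-manager-policy-options",
    "settings.kubernetes.topology-manager-policy-options.prefer-closest-numa-nodes",
    "settings.kubernetes.topology-manager-policy-options.max-allowable-numa-nodes",
]


def migrate_forward(data):
    """Upgrade: the new version understands the keys, so nothing changes."""
    return dict(data)


def migrate_backward(data, added=ADDED_SETTINGS):
    """Downgrade: drop keys the old version's data model does not know."""
    return {key: value for key, value in data.items() if key not in added}


if __name__ == "__main__":
    store = {
        "settings.kubernetes.topology-manager-policy": "best-effort",
        "settings.kubernetes.topology-manager-policy-options.prefer-closest-numa-nodes": True,
    }
    print(migrate_backward(store))
```

In this model, a downgraded store keeps `topology-manager-policy` and loses the policy option, consistent with the v1.56.0 output in the testing section.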
git = "https://github.com/bottlerocket-os/bottlerocket-settings-sdk"
tag = "bottlerocket-settings-models-v0.21.0"
version = "0.14.0"
git = "https://github.com/piyush-jena/bottlerocket-settings-sdk"

This should point to https://github.com/bottlerocket-os/bottlerocket-settings-sdk, not the personal fork. The same applies to the other places that reference the fork.

This is because the settings SDK is not released yet. For that reason, let's move the PR to draft first. @piyush-jena
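
Before the PR leaves draft, the remaining fork references could be located with a small scan over each manifest's text. This is only a sketch: it assumes dependencies are declared with `git = "..."` lines like the ones in the diff above, and the names `UPSTREAM`, `GIT_LINE`, and `non_upstream_git_sources` are hypothetical.

```python
import re

# Assumed upstream organization prefix, taken from the review comment above.
UPSTREAM = "https://github.com/bottlerocket-os/"
GIT_LINE = re.compile(r'^\s*git\s*=\s*"([^"]+)"')


def non_upstream_git_sources(manifest_text):
    """Return (line number, URL) pairs for git sources outside UPSTREAM."""
    hits = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        match = GIT_LINE.match(line)
        if match and not match.group(1).startswith(UPSTREAM):
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        'git = "https://github.com/bottlerocket-os/bottlerocket-settings-sdk"\n'
        'git = "https://github.com/piyush-jena/bottlerocket-settings-sdk"\n'
    )
    for number, url in non_upstream_git_sources(sample):
        print(f"line {number}: {url}")
```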

source = "public.ecr.aws/bottlerocket/bottlerocket-core-kit:v13.1.0"
digest = "oNFE4+rBh2Js4koxIyxk05hRG9tetYjQc3FQaRWJET8="
version = "13.1.1"
vendor = "piyush"

These changes look like they're from local testing and should be removed.

@piyush-jena marked this pull request as draft March 23, 2026 21:57

Development

Successfully merging this pull request may close these issues.

Implement Bottlerocket support for kubelet "Topology manager policy options"

3 participants