Releases: ctrox/zeropod
v0.11.2
What's Changed
- fix: metrics socket connection leak by @ctrox in #167
- fix: add new annotations to containerd config by @ctrox in #168
- fix: lastActivity when container has empty ports by @ctrox in #169
- fix: do not scale down when execs are running by @ctrox in #170
- chore: update image versions by @ctrox in #171
Full Changelog: v0.11.1...v0.11.2
v0.11.1
v0.11.0
What's Changed
- added annotations for configuring proxy timeouts:

      zeropod.ctrox.dev/connect-timeout: "5s"
      zeropod.ctrox.dev/proxy-timeout: "5s"

- much improved checkpoint/restore failure handling:
  - the last 20 lines of the criu log are now visible in the zeropod-node log
  - falls back to disabling checkpoint after failure
  - events are created on checkpoint/restore error
  - metrics have been added to record the error counts
- local checkpoint images are now deleted on restore, not just on pod deletion
Changelog
- feat: update resource requests earlier on migration by @ctrox in #146
- fix: allow resizing cpu/mem only by @ctrox in #147
- feat: cleanup image on restore by @ctrox in #148
- feat: fail migration of running container early by @ctrox in #149
- fix: unlock CR during evac by @ctrox in #150
- feat: improve checkpoint/restore fail handling by @ctrox in #151
- feat: add checkpoint/restore error metrics by @ctrox in #152
- Add configurable proxy timeout via annotations by @PhoenixSolutionsGroup in #154
- fix: return early if already restored by @ctrox in #155
- fix: do not migrate when checkpointing is disabled by @ctrox in #156
- feat: move runtimeclass install to config by @ctrox in #157
- fix: version information in manager/installer by @ctrox in #158
- fix: ensure eventTime is never empty by @ctrox in #159
- feat: add loop detection to activator by @ctrox in #160
- docs: document timeout annotations by @ctrox in #161
- chore: update dependencies by @ctrox in #162
- chore: update image versions by @ctrox in #163
New Contributors
- @PhoenixSolutionsGroup made their first contribution in #154
Full Changelog: v0.10.0...v0.11.0
v0.10.0
What's Changed
- cleanup of old migration resources: migration resources are now owned by the source/destination pod, resulting in automatic cleanup by the API Server on deletion of a pod. If this is undesired, for example while debugging, the cleanup can be disabled by setting the flag `-auto-gc-migrations=false` on the manager.
- cleanup of old checkpoint images: checkpoint images were not being cleaned up in all cases, especially when migration was disabled.
- various fixes and improvements for migration scalability.
Full Changelog
- fix: use net.JoinHostPort for ipv6 compatibility by @ctrox in #132
- fix: skip unix sockets as they can't be archived by @ctrox in #133
- chore: run modernize by @ctrox in #134
- feat: make migration timeouts and reconcile concurrency configurable by @ctrox in #136
- fix: remove image on deletion by @ctrox in #138
- fix: cleanup image of finished migrations by @ctrox in #135
- feat: garbage collect migrations by default by @ctrox in #139
- test: disable pinning by @ctrox in #140
- feat: remove global checkpoint lock by @ctrox in #141
- fix: do not claim previously unclaimed migrations by @ctrox in #142
- feat: adjust checkpoint/restore duration buckets by @ctrox in #144
- chore: update image versions by @ctrox in #143
Full Changelog: v0.9.2...v0.10.0
v0.9.2
v0.9.1
What's Changed
While validating the latest release on different systems, an issue was identified where the socket_tracker map would sporadically be emptied by the kernel. This release includes a hotfix for the issue that changes the BPF map type.
Upgrade notes
If you are coming from release v0.8.0 or earlier and make use of scale to zero, make sure to re-schedule all pods using zeropod after updating the zeropod-node DaemonSet. This is due to an architectural change in the socket tracker. Pods running pre-v0.9.0 zeropod shims might not be able to detect the last TCP connection and will simply scale down after the configured duration is up.
Changelog
- fix: use non-LRU map type for socket tracker by @ctrox in #119
- chore: update image versions by @ctrox in #120
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's Changed
- merged socket tracker into redirector for more reliability in detecting TCP activity.
- fixed stats/metrics reporting of the shim to the container runtime. This fixes use-cases such as combining zeropod with HPA and will also make `kubectl top pod` report 0 usage instead of nothing at all.
- makes use of TCX to attach the eBPF program if available (kernel 6.6+).
- upgraded several dependencies such as Go and criu.
Upgrade notes
If you are coming from an earlier release of zeropod and make use of scale to zero, make sure to re-schedule all pods using zeropod after updating the zeropod-node DaemonSet. This is due to an architectural change in the socket tracker. Pods running pre-v0.9.0 zeropod shims might not be able to detect the last TCP connection and will simply scale down after the configured duration is up.
Changelog
- fix: notify node when starting in scaled-down state by @ctrox in #103
- fix: abort restore when node connection fails by @ctrox in #104
- docs: split up single README.md into multiple docs by @ctrox in #105
- feat: merge socket tracker into redirector by @ctrox in #102
- fix: typos in comments and messages by @ctrox in #106
- docs: adjust point about connection tracking by @ctrox in #107
- fix: stats in scaled down state by @ctrox in #112
- chore: upgrade Go to 1.25 by @ctrox in #113
- feat: use TCX instead of qdisc when available by @ctrox in #116
- chore: upgrade criu to v4.2 by @ctrox in #114
- fix: simplify poll error handling by @ctrox in #117
- chore: update image versions by @ctrox in #118
Full Changelog: v0.8.0...v0.9.0
v0.8.0
What's Changed
- shim memory usage reduced by ~26% and binary size by ~35%.

      # v0.7.0
      $ ps -aux
      RSS    COMMAND
      18556  /opt/zeropod/bin/containerd-shim-zeropod-v2
      $ ls -s /opt/zeropod/bin/containerd-shim-zeropod-v2
      14896 /opt/zeropod/bin/containerd-shim-zeropod-v2

      # v0.8.0
      $ ps -aux
      RSS    COMMAND
      13564  /opt/zeropod/bin/containerd-shim-zeropod-v2
      $ ls -s /opt/zeropod/bin/containerd-shim-zeropod-v2
      9648 /opt/zeropod/bin/containerd-shim-zeropod-v2

- migration is now claimed immediately by the controller, meaning it should no longer time out when an image pull takes longer than 10s.
- CRIU has been upgraded to include fixes when running on kernel 6.16+
- Fix for using runc 1.3.0+
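
The rounded percentages for the shim can be reproduced from the `ps` and `ls -s` figures quoted above. A small sketch of the arithmetic; `reductionPercent` is a name chosen here for illustration:

```go
package main

import "fmt"

// reductionPercent returns how much smaller after is than before, in percent.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// RSS values reported by ps for v0.7.0 and v0.8.0.
	fmt.Printf("shim RSS:    %.1f%%\n", reductionPercent(18556, 13564)) // 26.9%
	// Sizes reported by ls -s for v0.7.0 and v0.8.0.
	fmt.Printf("binary size: %.1f%%\n", reductionPercent(14896, 9648)) // 35.2%
}
```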
Full Changelog
- config: enable status events and in-place scaling by default by @ctrox in #79
- feat: remove unused io code by @ctrox in #81
- chore: upgrade to containerd to v2.1.4 by @ctrox in #82
- feat: add version info to manager/shim by @ctrox in #83
- feat: upgrade CRIU by @ctrox in #88
- fix: detect v6 mapped v4 addresses by @ctrox in #89
- fix: skip in-flight TCP for runc 1.3.0+ by @ctrox in #90
- chore: upgrade dependencies by @ctrox in #91
- ci: run on pull_request instead of push by @ctrox in #98
- some makefile dependency fixes by @oOraph in #94
- fix: workaround CRIU not dumping TCP_LISTEN socket by @ctrox in #99
- claim migration as soon as possible by @ctrox in #100
- Update image versions by @github-actions[bot] in #101
Full Changelog: v0.7.0...v0.8.0
v0.7.0
What's Changed
- Added handling for HTTP/TCP probes: zeropod is now able to intercept probes while the container process is scaled down to ensure the application is not restored for probes. You can read more about the feature in the docs.
- Create status events on scale down/restore events: these reference the relevant pod so the events will show up in `kubectl describe pod`.

      Type    Reason       Age    From                       Message
      ----    ------       ----   ----                       -------
      Normal  Started      3m10s  kubelet                    Started container nginx
      Normal  Scaled down  2m59s  zeropod.ctrox.dev/manager  Scaled down container nginx in 254.524163ms
      Normal  Running      26s    zeropod.ctrox.dev/manager  Restored container nginx in 95.508441ms

- Copy "scratch space" container data during migration: when migrating a pod (regardless of live or scaled down), data that has been written to the container's file system will be copied over to the new node in addition to the memory contents.
Full changelog
- feat: emit warnings when eBPF support is missing by @ctrox in #71
- feat: handle kubelet TCP/HTTP probes by @ctrox in #72
- fix: non-live migration status by @ctrox in #73
- feat: upgrade containerd by @ctrox in #74
- feat: create status events by @ctrox in #75
- feat: implement migrating upper container layer by @ctrox in #76
- feat: track multiple sandbox IPs by @ctrox in #77
- Update image versions by @github-actions[bot] in #78
Full Changelog: v0.6.4...v0.7.0
v0.6.4
What's Changed
- fix: tracker cleanup by @ctrox in #58
- fix: check activator exists before calling stop by @ctrox in #60
- feat: chunked image transfer by @ctrox in #61
- feat: upgrade CRIU to v4.1 by @ctrox in #62
- feat: add log message when lazy page check fails by @ctrox in #68
- fix: store container create options for restore by @ctrox in #69
- Update image versions by @github-actions[bot] in #70
Full Changelog: v0.6.3...v0.6.4