Kubernetes 1.35 Adds In-Place Resizing
Kubernetes 1.35 brings in-place pod resize to general availability, and the Vertical Pod Autoscaler (VPA) can now build on it. The feature lets stateful workloads, such as persistent LLM inference servers, have their CPU and memory allocations adjusted dynamically without a pod restart. The update is expected to reduce the operational complexity and cost of managing fluctuating AI workloads.
- Before this feature, a running pod's CPU and memory requests were immutable, so changing them forced a full pod replacement. This was a significant issue for stateful workloads because replacement resets TCP connections, clears in-memory caches, and disrupts long-running training jobs.
- The in-place resize feature has been in development for over six years. It was first introduced as alpha in Kubernetes v1.27, graduated to beta in v1.33, and became generally available in v1.35.
- The underlying mechanism relies on cgroups v2 on the Linux node, which allows the kubelet to update the cgroup settings of a running container directly, without a restart. Kubernetes 1.35 also deprecates support for the older cgroups v1.
- CPU changes are applied "hot", without a container restart. The behavior for memory changes is configurable, and they may still require a container restart. A container restart is not a full pod recreation, so network connections and mounted volumes are preserved.
- The Vertical Pod Autoscaler (VPA) uses this feature through a new `InPlaceOrRecreate` update mode, which has graduated to beta. In this mode the VPA's "Updater" component attempts a non-disruptive resize first.
- The status of a resize is now observable through the pod's conditions. Operators can see whether a requested resource change is pending or has been applied, which improves visibility during incidents.
- The feature currently applies only to CPU and memory resources, and it is not compatible with the static CPU Manager and static Memory Manager policies.
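
The resize request and the status check described in the list above can be sketched in Python. This is a minimal illustration, not an official client: the helpers `build_resize_patch` and `resize_state` are hypothetical names, the container name `inference` is made up, and the condition types `PodResizePending` and `PodResizeInProgress` are assumptions that should be checked against the documentation for your cluster version. The code only builds and inspects plain dictionaries; it does not talk to a cluster.

```python
import json

# Condition types assumed to be reported in pod.status.conditions
# while a resize is outstanding (verify for your Kubernetes version).
PENDING = "PodResizePending"
IN_PROGRESS = "PodResizeInProgress"


def build_resize_patch(container, requests, limits=None):
    """Build a patch body that changes one container's resources.

    Only the fields passed in are included, so limits are left
    untouched unless the caller supplies them.
    """
    resources = {"requests": dict(requests)}
    if limits is not None:
        resources["limits"] = dict(limits)
    return {"spec": {"containers": [{"name": container, "resources": resources}]}}


def resize_state(pod):
    """Classify a pod (as a dict) by its resize-related conditions."""
    conditions = pod.get("status", {}).get("conditions", [])
    # Keep only the conditions that are currently true, indexed by type.
    active = {c["type"]: c for c in conditions if c.get("status") == "True"}
    if PENDING in active:
        reason = active[PENDING].get("reason", "unknown")
        return f"pending ({reason})"
    if IN_PROGRESS in active:
        return "in-progress"
    return "none"


if __name__ == "__main__":
    # Hypothetical request: give the "inference" container more CPU and memory.
    patch = build_resize_patch("inference", {"cpu": "2", "memory": "8Gi"})
    print(json.dumps(patch))
```

The printed JSON is the kind of body that could be sent to the pod's `resize` subresource, for example with `kubectl patch pod <pod-name> --subresource resize --patch '<json>'`, assuming a kubectl version that supports that subresource. Feeding the pod object returned by the API into `resize_state` then gives a quick view of whether the change is still waiting on the node.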