Description
Currently, Envoy Gateway's outlier detection configuration doesn't expose the always_eject_one_host field that exists in Envoy's native outlier detection configuration. When enabled, this field lets Envoy eject at least one host regardless of the value of max_ejection_percent, which is useful in load balancing scenarios where the percentage limit alone would prevent any ejection.
Motivation for alwaysEjectOneHost
- Percentage-based ejection limits (max_ejection_percent) can round down to zero for small clusters (e.g. 2–9 replicas at Envoy's default of 10%), making outlier detection ineffective even when a host is clearly unhealthy. Many real-world deployments use dynamic autoscaling, where the replica count changes frequently and cannot be reliably covered by a static percentage threshold. In practice, failures often affect a single instance at a time (e.g. node failure, bad pod, transient network issue), so ensuring that at least one host can be ejected significantly reduces user-facing impact in these common scenarios. Increasing max_ejection_percent globally (e.g. to 50%) to compensate is undesirable, as it increases the risk of excessive ejections in larger clusters and widens the blast radius during real incidents. Upstream issue: "Outlier_detection: enforce max_ejection_percentage", envoy#27909 (comment).
- Historically, Envoy's outlier detection guaranteed that at least one host would be ejected regardless of max_ejection_percent (see "outlier_detection: add always_eject_one_host", envoy#34796). Removing this behavior can cause silent regressions.
- Introducing an opt-in field preserves backward compatibility.
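
To make the rounding concern concrete, here is a minimal Python sketch of the arithmetic. It is a simplified model, not Envoy's actual implementation: the function name `max_ejectable_hosts` is hypothetical, and it assumes the percentage limit is applied by rounding down, as described in the motivation above.

```python
def max_ejectable_hosts(total_hosts: int,
                        max_ejection_percent: int = 10,
                        always_eject_one_host: bool = False) -> int:
    """Simplified model of the ejection cap (assumption: floor rounding)."""
    # Percentage-based cap, rounded down to a whole number of hosts.
    limit = total_hosts * max_ejection_percent // 100
    # With the opt-in flag, a non-empty cluster can always eject one host.
    if always_eject_one_host and total_hosts > 0:
        limit = max(limit, 1)
    return limit


if __name__ == "__main__":
    # Compare the cap with and without the flag for several cluster sizes.
    for hosts in (2, 5, 9, 10, 100):
        without_flag = max_ejectable_hosts(hosts)
        with_flag = max_ejectable_hosts(hosts, always_eject_one_host=True)
        print(f"hosts={hosts} cap={without_flag} cap_with_flag={with_flag}")
```

Under this model, clusters of 2 to 9 hosts get a cap of 0 at 10% and a cap of 1 with the flag enabled, while raising the percentage to 50% instead would allow 50 of 100 hosts to be ejected in a large cluster.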
Relevant Links:
https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/outlier_detection.proto#config-cluster-v3-outlierdetection