
Commit 3ea2bc4

Fix least request lb not fair (#29873)
* Add new idea for selecting hosts among those not selected yet.
* Change how we choose full table scan.
* Remove cout.
* Fix tests for load_balancer_impl_test.
* Fix format and make sure full scan happens only when selected or the number of choices is larger than the size.
* Enable new option on extensions API only.
* Fix integration tests.
* Add release notes for full scan in least request LB.
* Fix ref for release note.
* Fix release notes.
* Update release note.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Signed-off-by: Leonardo da Mata <barroca@gmail.com>
Co-authored-by: Leonardo da Mata <ldamata@spotify.com>
1 parent 41fa676 commit 3ea2bc4

7 files changed

Lines changed: 133 additions & 21 deletions


api/envoy/extensions/load_balancing_policies/least_request/v3/least_request.proto

Lines changed: 6 additions & 0 deletions
@@ -22,6 +22,7 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;
 // This configuration allows the built-in LEAST_REQUEST LB policy to be configured via the LB policy
 // extension point. See the :ref:`load balancing architecture overview
 // <arch_overview_load_balancing_types>` for more information.
+// [#next-free-field: 6]
 message LeastRequest {
   // The number of random healthy hosts from which the host with the fewest active requests will
   // be chosen. Defaults to 2 so that we perform two-choice selection if the field is not set.
@@ -58,4 +59,9 @@ message LeastRequest {

   // Configuration for local zone aware load balancing or locality weighted load balancing.
   common.v3.LocalityLbConfig locality_lb_config = 4;
+
+  // Configuration for performing a full scan on the list of hosts.
+  // If this configuration is set, when selecting a host a full scan of the list of hosts will be
+  // used to select the one with the least requests instead of using random choices.
+  google.protobuf.BoolValue enable_full_scan = 5;
 }
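For context, a cluster using the new field might look like the following minimal sketch. The field names come from the proto above; the surrounding `load_balancing_policy` structure and extension name are the standard ones for the least-request policy, and the comment reflects the behavior this commit adds:

```yaml
load_balancing_policy:
  policies:
  - typed_extension_config:
      name: envoy.load_balancing_policies.least_request
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.least_request.v3.LeastRequest
        # When enable_full_scan is true, every host is inspected on each pick
        # and choice_count is not used.
        enable_full_scan: true
```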

changelogs/current.yaml

Lines changed: 14 additions & 0 deletions
@@ -11,6 +11,12 @@ behavior_changes:
     issue https://github.com/envoyproxy/envoy/issues/29461 for more specific detail and examples.

 minor_behavior_changes:
+- area: upstream
+  change: |
+    Fixed a reported issue (https://github.com/envoyproxy/envoy/issues/11004) that caused the Least
+    Request load balancer policy to be unfair when the number of hosts is very small: when the number
+    of hosts is smaller than the choice_count, instead of randomly selecting hosts from the list, we
+    perform a full scan on it to choose the host with the least requests.
 # *Changes that may cause incompatibilities for some users, but should not for most*
 - area: local_rate_limit
   change: |
@@ -56,10 +62,18 @@ removed_config_or_runtime:
     runtime flag and legacy code path.

 new_features:
+- area: upstream
+  change: |
+    Added :ref:`enable_full_scan <envoy_v3_api_msg_extensions.load_balancing_policies.least_request.v3.LeastRequest>`
+    option to the least request load balancer. If set to true, Envoy will perform a full scan on the list of hosts
+    instead of using :ref:`choice_count
+    <envoy_v3_api_msg_extensions.load_balancing_policies.least_request.v3.LeastRequest>`
+    to select the hosts.
 - area: stats
   change: |
     added :ref:`per_endpoint_stats <envoy_v3_api_field_config.cluster.v3.TrackClusterStats.per_endpoint_stats>` to get some metrics
     for each endpoint in a cluster.
+
 - area: jwt
   change: |
     The jwt filter can now serialize non-primitive custom claims when mapping claims to headers.

docs/root/intro/arch_overview/upstream/load_balancing/load_balancers.rst

Lines changed: 3 additions & 1 deletion
@@ -38,7 +38,9 @@ same or different weights.
   approach is nearly as good as an O(N) full scan). This is also known as P2C (power of two
   choices). The P2C load balancer has the property that a host with the highest number of active
   requests in the cluster will never receive new requests. It will be allowed to drain until it is
-  less than or equal to all of the other hosts.
+  less than or equal to all of the other hosts. The number of hosts chosen can be changed by setting
+  ``choice_count``.
+
 * *all weights not equal*: If two or more hosts in the cluster have different load balancing
   weights, the load balancer shifts into a mode where it uses a weighted round robin schedule in
   which weights are dynamically adjusted based on the host's request load at the time of selection.

source/common/upstream/load_balancer_impl.cc

Lines changed: 19 additions & 0 deletions
@@ -1299,6 +1299,25 @@ HostConstSharedPtr LeastRequestLoadBalancer::unweightedHostPick(const HostVector
                                                                 const HostsSource&) {
   HostSharedPtr candidate_host = nullptr;

+  // Do full scan if it's required explicitly or the number of choices is equal to or larger than
+  // the hosts size.
+  if ((hosts_to_use.size() <= choice_count_) || enable_full_scan_) {
+    for (const auto& sampled_host : hosts_to_use) {
+      if (candidate_host == nullptr) {
+        // Make a first choice to start the comparisons.
+        candidate_host = sampled_host;
+        continue;
+      }
+
+      const auto candidate_active_rq = candidate_host->stats().rq_active_.value();
+      const auto sampled_active_rq = sampled_host->stats().rq_active_.value();
+      if (sampled_active_rq < candidate_active_rq) {
+        candidate_host = sampled_host;
+      }
+    }
+    return candidate_host;
+  }
+
   for (uint32_t choice_idx = 0; choice_idx < choice_count_; ++choice_idx) {
     const int rand_idx = random_.random() % hosts_to_use.size();
     const HostSharedPtr& sampled_host = hosts_to_use[rand_idx];

source/common/upstream/load_balancer_impl.h

Lines changed: 4 additions & 1 deletion
@@ -710,7 +710,9 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
             least_request_config.has_active_request_bias()
                 ? absl::optional<Runtime::Double>(
                       {least_request_config.active_request_bias(), runtime})
-                : absl::nullopt) {
+                : absl::nullopt),
+        enable_full_scan_(
+            PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config, enable_full_scan, false)) {
     initialize();
   }

@@ -746,6 +748,7 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
   double active_request_bias_{};

   const absl::optional<Runtime::Double> active_request_bias_runtime_;
+  const bool enable_full_scan_{};
 };

 /**

test/common/upstream/load_balancer_impl_test.cc

Lines changed: 83 additions & 18 deletions
@@ -2787,20 +2787,20 @@ TEST_P(LeastRequestLoadBalancerTest, SingleHost) {

   // Host weight is 1.
   {
-    EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+    EXPECT_CALL(random_, random()).WillOnce(Return(0));
     EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
   }

   // Host weight is 100.
   {
-    EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+    EXPECT_CALL(random_, random()).WillOnce(Return(0));
     EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
   }

   HostVector empty;
   {
     hostSet().runCallbacks(empty, empty);
-    EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+    EXPECT_CALL(random_, random()).WillOnce(Return(0));
     EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
   }

@@ -2823,37 +2823,44 @@

   hostSet().healthy_hosts_[0]->stats().rq_active_.set(1);
   hostSet().healthy_hosts_[1]->stats().rq_active_.set(2);
-  EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+  EXPECT_CALL(random_, random()).WillOnce(Return(0));
   EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));

   hostSet().healthy_hosts_[0]->stats().rq_active_.set(2);
   hostSet().healthy_hosts_[1]->stats().rq_active_.set(1);
-  EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+  EXPECT_CALL(random_, random()).WillOnce(Return(0));
   EXPECT_EQ(hostSet().healthy_hosts_[1], lb_.chooseHost(nullptr));
 }

 TEST_P(LeastRequestLoadBalancerTest, PNC) {
   hostSet().healthy_hosts_ = {makeTestHost(info_, "tcp://127.0.0.1:80", simTime()),
                               makeTestHost(info_, "tcp://127.0.0.1:81", simTime()),
                               makeTestHost(info_, "tcp://127.0.0.1:82", simTime()),
-                              makeTestHost(info_, "tcp://127.0.0.1:83", simTime())};
+                              makeTestHost(info_, "tcp://127.0.0.1:83", simTime()),
+                              makeTestHost(info_, "tcp://127.0.0.1:84", simTime())};
   hostSet().hosts_ = hostSet().healthy_hosts_;
   hostSet().runCallbacks({}, {}); // Trigger callbacks. The added/removed lists are not relevant.

   hostSet().healthy_hosts_[0]->stats().rq_active_.set(4);
   hostSet().healthy_hosts_[1]->stats().rq_active_.set(3);
   hostSet().healthy_hosts_[2]->stats().rq_active_.set(2);
   hostSet().healthy_hosts_[3]->stats().rq_active_.set(1);
+  hostSet().healthy_hosts_[4]->stats().rq_active_.set(5);

   // Creating various load balancer objects with different choice configs.
   envoy::config::cluster::v3::Cluster::LeastRequestLbConfig lr_lb_config;
   lr_lb_config.mutable_choice_count()->set_value(2);
   LeastRequestLoadBalancer lb_2{priority_set_, nullptr, stats_, runtime_,
                                 random_, common_config_, lr_lb_config, simTime()};
-  lr_lb_config.mutable_choice_count()->set_value(5);
-  LeastRequestLoadBalancer lb_5{priority_set_, nullptr, stats_, runtime_,
+  lr_lb_config.mutable_choice_count()->set_value(3);
+  LeastRequestLoadBalancer lb_3{priority_set_, nullptr, stats_, runtime_,
+                                random_, common_config_, lr_lb_config, simTime()};
+  lr_lb_config.mutable_choice_count()->set_value(4);
+  LeastRequestLoadBalancer lb_4{priority_set_, nullptr, stats_, runtime_,
+                                random_, common_config_, lr_lb_config, simTime()};
+  lr_lb_config.mutable_choice_count()->set_value(6);
+  LeastRequestLoadBalancer lb_6{priority_set_, nullptr, stats_, runtime_,
                                 random_, common_config_, lr_lb_config, simTime()};
-
   // Verify correct number of choices.

   // 0 choices configured should default to P2C.
@@ -2864,20 +2871,78 @@ TEST_P(LeastRequestLoadBalancerTest, PNC) {
   EXPECT_CALL(random_, random()).Times(3).WillRepeatedly(Return(0));
   EXPECT_EQ(hostSet().healthy_hosts_[0], lb_2.chooseHost(nullptr));

-  // 5 choices configured results in P5C.
-  EXPECT_CALL(random_, random()).Times(6).WillRepeatedly(Return(0));
-  EXPECT_EQ(hostSet().healthy_hosts_[0], lb_5.chooseHost(nullptr));
-
-  // Verify correct host chosen in P5C scenario.
+  // Verify correct host chosen in P3C scenario.
   EXPECT_CALL(random_, random())
-      .Times(6)
+      .Times(4)
       .WillOnce(Return(0))
       .WillOnce(Return(3))
+      .WillOnce(Return(1))
+      .WillOnce(Return(2));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_3.chooseHost(nullptr));
+
+  // Verify correct host chosen in P4C scenario.
+  EXPECT_CALL(random_, random())
+      .Times(5)
       .WillOnce(Return(0))
       .WillOnce(Return(3))
-      .WillOnce(Return(2))
-      .WillOnce(Return(1));
-  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_5.chooseHost(nullptr));
+      .WillOnce(Return(1))
+      .WillOnce(Return(1))
+      .WillOnce(Return(2));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_4.chooseHost(nullptr));
+
+  // When the number of hosts is smaller than or equal to the number of choices we don't call
+  // random() to select a host, since we do a full table scan.
+  EXPECT_CALL(random_, random()).WillOnce(Return(9999));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_6.chooseHost(nullptr));
+}
+
+TEST_P(LeastRequestLoadBalancerTest, FullScan) {
+  hostSet().healthy_hosts_ = {makeTestHost(info_, "tcp://127.0.0.1:80", simTime()),
+                              makeTestHost(info_, "tcp://127.0.0.1:81", simTime()),
+                              makeTestHost(info_, "tcp://127.0.0.1:82", simTime()),
+                              makeTestHost(info_, "tcp://127.0.0.1:83", simTime()),
+                              makeTestHost(info_, "tcp://127.0.0.1:84", simTime())};
+  hostSet().hosts_ = hostSet().healthy_hosts_;
+  hostSet().runCallbacks({}, {}); // Trigger callbacks. The added/removed lists are not relevant.
+
+  hostSet().healthy_hosts_[0]->stats().rq_active_.set(4);
+  hostSet().healthy_hosts_[1]->stats().rq_active_.set(3);
+  hostSet().healthy_hosts_[2]->stats().rq_active_.set(2);
+  hostSet().healthy_hosts_[3]->stats().rq_active_.set(1);
+  hostSet().healthy_hosts_[4]->stats().rq_active_.set(5);
+
+  // Creating various load balancer objects with different choice configs.
+  envoy::extensions::load_balancing_policies::least_request::v3::LeastRequest lr_lb_config;
+  lr_lb_config.mutable_choice_count()->set_value(2);
+  // Enable full table scan on hosts.
+  lr_lb_config.mutable_enable_full_scan()->set_value(true);
+  common_config_.mutable_healthy_panic_threshold()->set_value(0);
+
+  LeastRequestLoadBalancer lb_2{priority_set_, nullptr, stats_, runtime_,
+                                random_, 1, lr_lb_config, simTime()};
+  lr_lb_config.mutable_choice_count()->set_value(3);
+  LeastRequestLoadBalancer lb_3{priority_set_, nullptr, stats_, runtime_,
+                                random_, 1, lr_lb_config, simTime()};
+  lr_lb_config.mutable_choice_count()->set_value(4);
+  LeastRequestLoadBalancer lb_4{priority_set_, nullptr, stats_, runtime_,
+                                random_, 1, lr_lb_config, simTime()};
+  lr_lb_config.mutable_choice_count()->set_value(6);
+  LeastRequestLoadBalancer lb_6{priority_set_, nullptr, stats_, runtime_,
+                                random_, 1, lr_lb_config, simTime()};
+
+  // random() is called only once per selection and is not used to select the host.
+  EXPECT_CALL(random_, random()).WillOnce(Return(9999));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_2.chooseHost(nullptr));
+
+  EXPECT_CALL(random_, random()).WillOnce(Return(9999));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_3.chooseHost(nullptr));
+
+  EXPECT_CALL(random_, random()).WillOnce(Return(9999));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_4.chooseHost(nullptr));
+
+  EXPECT_CALL(random_, random()).WillOnce(Return(9999));
+  EXPECT_EQ(hostSet().healthy_hosts_[3], lb_6.chooseHost(nullptr));
 }

 TEST_P(LeastRequestLoadBalancerTest, WeightImbalance) {

test/integration/http_subset_lb_integration_test.cc

Lines changed: 4 additions & 1 deletion
@@ -176,7 +176,10 @@ class HttpSubsetLbIntegrationTest
       }
     }

-    if (is_hash_lb_) {
+    // The default number of choices for the LEAST_REQUEST policy is 2 hosts; if the number of hosts
+    // is equal to the number of choices, a full scan happens instead, which means that the same host
+    // will be chosen.
+    if (is_hash_lb_ || (GetParam() == envoy::config::cluster::v3::Cluster::LEAST_REQUEST)) {
       EXPECT_EQ(hosts.size(), 1) << "Expected a single unique host to be selected for "
                                  << envoy::config::cluster::v3::Cluster::LbPolicy_Name(GetParam());
     } else {
