Fix least request lb not fair #29873
Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Hi @barroca, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already.
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to
0c4f25f to 7711181
/assign
wbpcode left a comment
Thanks for your contribution. I have added some comments.
At a minimum, a release note is needed to describe what this PR changes or fixes. You can add a new changelog entry in this file: https://github.com/envoyproxy/envoy/blob/main/changelogs/current.yaml
- EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+ EXPECT_CALL(random_, random()).WillOnce(Return(9999));
  EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
  }

  // Host weight is 100.
  {
- EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+ EXPECT_CALL(random_, random()).WillOnce(Return(9999));
  EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
  }

  HostVector empty;
  {
  hostSet().runCallbacks(empty, empty);
- EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
+ EXPECT_CALL(random_, random()).WillOnce(Return(9999));
I don't understand why these changes are necessary 🤔

Changing the value to 9999 is not meaningful in itself, but the number of calls to random() has changed because we now do a full scan instead.
api/envoy/extensions/load_balancing_policies/least_request/v3/least_request.proto
// The number of random healthy hosts from which the host with the fewest active requests will
// be chosen. Defaults to 2 so that we perform two-choice selection if the field is not set.
This PR changes the behavior of the LB. The comment needs to be extended to say that if choice_count is greater than or equal to the number of hosts, a full scan will be used.
This behavior change should be harmless to end users and arguably brings this LB's behavior closer to its name, least request.
So I think a changelog entry should be enough, rather than a runtime guard.
But this still needs a check from @envoyproxy/runtime-guard-changes.
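The rule discussed in this thread (fall back to a full scan once choice_count reaches the number of hosts) can be sketched roughly as follows. This is a minimal, standalone illustration and not the Envoy implementation: `Host`, `chooseHost`, and the injected `random` callable are hypothetical names, and host weights and the active request bias are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a host; only the active request count matters here.
struct Host {
  uint32_t active_requests;
};

// Returns the index of the chosen host. `random` is any callable that returns
// a uint64_t; it is injected (an assumption of this sketch) so the behavior
// can be tested deterministically.
size_t chooseHost(const std::vector<Host>& hosts, uint32_t choice_count,
                  const std::function<uint64_t()>& random) {
  assert(!hosts.empty());

  // Full scan: sampling cannot do better than inspecting every host once.
  if (choice_count >= hosts.size()) {
    size_t best = 0;
    for (size_t i = 1; i < hosts.size(); ++i) {
      if (hosts[i].active_requests < hosts[best].active_requests) {
        best = i;
      }
    }
    return best;
  }

  // Otherwise sample hosts at random (with replacement) and keep the one
  // with the fewest active requests.
  size_t best = random() % hosts.size();
  for (uint32_t i = 1; i < choice_count; ++i) {
    const size_t candidate = random() % hosts.size();
    if (hosts[candidate].active_requests < hosts[best].active_requests) {
      best = candidate;
    }
  }
  return best;
}
```

In this sketch, three hosts with choice_count set to 3 never consult `random`, while choice_count set to 2 draws two samples. Ties in the full scan resolve to the lowest index.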
/wait
…number of choices is larger than the size. Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
/lgtm api defer to @wbpcode
- : absl::nullopt) {
+ : absl::nullopt),
+ enable_full_scan_(
+ PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config.ref(), enable_full_scan, false)) {
I'm not 100% sure how to do this in the code. When I removed the field from here, I had problems compiling the load_balancer_impl.* files. I would appreciate help with this.

Just keep enable_full_scan_ always false when the legacy API is used. I think that will resolve the compilation problem.
/wait
wbpcode left a comment
This LGTM overall. Please check the CI and add a release note, thanks.
  double active_request_bias_{};

  const absl::optional<Runtime::Double> active_request_bias_runtime_;
+ const bool enable_full_scan_;
nit: const bool enable_full_scan_{};
/wait
Signed-off-by: Leonardo da Mata <barroca@gmail.com>
Not sure if I need someone else to approve, since "envoyproxy/api-shepherds must approve for any API change" is still pending.
Perhaps @lizan needs to take a look again?
…citly forleast request lb (envoyproxy#30794)"
This reverts commit e93e556.
Revert "Fix least request lb not fair (envoyproxy#29873)"
This reverts commit 3ea2bc4.
restore api
Signed-off-by: Kuat Yessenov <kuat@google.com>
fix merge
Signed-off-by: Kuat Yessenov <kuat@google.com>
We are reverting this change because we noticed a significant impact on clusters with a small number of hosts. The impact is caused by the deterministic loop, which makes every client pick the same hosts. I think we need a better understanding of the impact, backed by a load simulation, before this change can be re-applied. Please consider adding some independent sampling to the algorithm, and we can do a more thorough review again.
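One way to picture the problem described in this comment is a toy model in which every client scans the hosts from index 0 and breaks ties toward the lowest index. This is only a sketch under stated assumptions, not a reproduction of the reverted code: `fullScanFromZero` and `firstRequestPerHost` are hypothetical names, and each client is assumed to track active requests locally, so an idle client sees all hosts tied at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Deterministic full scan: ties always resolve to the lowest index.
size_t fullScanFromZero(const std::vector<uint32_t>& active_requests) {
  assert(!active_requests.empty());
  size_t best = 0;
  for (size_t i = 1; i < active_requests.size(); ++i) {
    if (active_requests[i] < active_requests[best]) {
      best = i;
    }
  }
  return best;
}

// Each client keeps its own local view of active requests (assumption:
// clients do not share counters). Every client sends one request, and we
// count how many of those requests land on each host.
std::vector<size_t> firstRequestPerHost(size_t num_hosts, size_t num_clients) {
  std::vector<size_t> received(num_hosts, 0);
  for (size_t c = 0; c < num_clients; ++c) {
    const std::vector<uint32_t> local_view(num_hosts, 0); // idle client
    ++received[fullScanFromZero(local_view)];
  }
  return received;
}
```

Under these assumptions, 100 idle clients and 3 hosts send all 100 first requests to host 0, which is the kind of skew that independent sampling is meant to avoid.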
I understand that changing the default behaviour would have impacted people, but it is not clear to me that adding an option to enable full scan would have a bad impact, since it explicitly selects the host with the fewest requests. I'm wondering if we can add it back? Thanks @kyessenov and @wbpcode for fixing. I wasn't aware that I had introduced breaking issues. :)
The API will be kept, and a new implementation is still welcome. @tonya11en is working on a simulated system to ensure we can get a more reasonable implementation in the future. Thanks for your contribution, and we are sorry that we need to revert it. We know it took you a lot of time. :( Hope we can bring it back soon.
If the patch is limited to only the “full scan” flag, it should do what you want without affecting the existing selection behavior.
I noticed the "revert PR" hasn't been merged yet. Instead of reverting, would it be sufficient to add a patch that starts the full scan at a random index (as suggested by @ggreenway and @tomwans here)? Or are there additional concerns about full scan mode that need to be addressed?
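The random-start idea mentioned here could look roughly like the sketch below. It is an illustration only and is not taken from any Envoy patch: `fullScanFromRandomIndex` is a hypothetical name, and the caller is assumed to supply a random value from its own generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Full scan that begins at a random offset and wraps around, so that ties
// resolve to whichever tied host is encountered first from that offset
// instead of always resolving to index 0.
size_t fullScanFromRandomIndex(const std::vector<uint32_t>& active_requests,
                               uint64_t random_value) {
  assert(!active_requests.empty());
  const size_t n = active_requests.size();
  const size_t start = random_value % n;
  size_t best = start;
  for (size_t offset = 1; offset < n; ++offset) {
    const size_t i = (start + offset) % n;
    if (active_requests[i] < active_requests[best]) {
      best = i;
    }
  }
  return best;
}
```

When all hosts are tied, the result is simply the starting index, so clients with independent random values spread their picks. When one host is strictly least loaded, the starting index does not matter.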
Hello, I have this PR open that changes the behaviour to use a random index: #31146
* Add new idea for selecting hosts among those not selected yet. Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Change how we choose full table scan Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Remove cout Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Fix Tests for load_balancer_impl_test Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Fix format and make sure full scan happens only when selected or the number of choices is larger than the size. Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Enable new option on extesions api only Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Fix Integration tests. Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Add release notes for full scan in least request LB. Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Fix ref for release note. Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Fix release notes Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
* Update release note Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
---------
Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Signed-off-by: Leonardo da Mata <barroca@gmail.com>
Co-authored-by: Leonardo da Mata <ldamata@spotify.com>
Commit Message: Fix the Least Request LB's random pick so that already chosen hosts are removed from the random selection, which eliminates the chance of selecting the same host again when there are only a few hosts. Also, when the number of choices is greater than or equal to the number of hosts, use a full scan for the least loaded host instead.
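Removing already chosen hosts from the random pick amounts to sampling without replacement. One common way to do that is a partial Fisher-Yates shuffle, sketched below. This is an assumption about how the idea could be realized, not the code in this PR: `chooseWithoutReplacement` and the injected `random` callable are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Samples up to `choice_count` distinct hosts and returns the index of the
// one with the fewest active requests.
size_t chooseWithoutReplacement(const std::vector<uint32_t>& active_requests,
                                size_t choice_count,
                                const std::function<uint64_t()>& random) {
  assert(!active_requests.empty());
  assert(choice_count >= 1);
  const size_t n = active_requests.size();
  const size_t k = std::min(choice_count, n);

  // indices[0..i) holds the hosts already sampled; indices[i..n) holds the rest.
  std::vector<size_t> indices(n);
  std::iota(indices.begin(), indices.end(), size_t{0});

  size_t best = n; // sentinel meaning "no candidate yet"
  for (size_t i = 0; i < k; ++i) {
    // Draw only from hosts that have not been sampled yet.
    const size_t j = i + random() % (n - i);
    std::swap(indices[i], indices[j]);
    const size_t candidate = indices[i];
    if (best == n || active_requests[candidate] < active_requests[best]) {
      best = candidate;
    }
  }
  return best;
}
```

With two hosts and two choices, this sketch always compares both hosts, whereas sampling with replacement can draw the same host twice. The per-call index vector costs O(n) time and memory, which a production implementation would likely want to avoid.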
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
This release changes the default behaviour of the Least Request load balancer so that it does a full scan when the number of choices is greater than or equal to the number of hosts. It also adds a new option on the envoy::extensions::load_balancing_policies::least_request::v3::LeastRequest configuration to always do a full scan. Doing a full scan instead of making random choices reduces the chance of selecting a host that does not have the fewest requests when the number of hosts is small.
Platform Specific Features:
[Optional Runtime guard:]
Fixes #11004
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]