Skip to content

Fix least request lb not fair#29873

Merged
wbpcode merged 14 commits intoenvoyproxy:mainfrom
barroca:ldamata/fix-least-request-LB-not-fair
Oct 26, 2023
Merged

Fix least request lb not fair#29873
wbpcode merged 14 commits intoenvoyproxy:mainfrom
barroca:ldamata/fix-least-request-LB-not-fair

Conversation

@barroca
Copy link
Copy Markdown
Contributor

@barroca barroca commented Sep 29, 2023

Commit Message: Fix Least requests LB when doing a random pick so it removes already chosen hosts from the random function to remove the chance of selecting the same host again when dealing with a small amount of hosts. Also, when the number of choices is smaller or equal the number of hosts, use full scan of least used instead.

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
This Release changes the default behaviour of Least Request Load Balancer doing a full scan when the number of choices is more than equal the size of hosts and also adds a new option on the envoy::extensions::load_balancing_policies::least_request::v3::LeastRequest configuration to always do a full scan. Allowing a full scan instead of random making choices reduces the chance of selecting a host that doesn't have least requests when the number of hosts is smaller.

Platform Specific Features:
[Optional Runtime guard:]
Fixes #11004
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

barroca and others added 3 commits September 29, 2023 14:11
Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
@repokitteh-read-only
Copy link
Copy Markdown

Hi @barroca, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #29873 was opened by barroca.

see: more, trace.

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #29873 was opened by barroca.

see: more, trace.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
@barroca barroca force-pushed the ldamata/fix-least-request-LB-not-fair branch from 0c4f25f to 7711181 Compare September 29, 2023 20:44
@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Sep 30, 2023

/assign

Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. And some comments are added.

And at least a release note is necessary to tell what this PR changed or fixed. You can add a new change log entry in this file. https://github.com/envoyproxy/envoy/blob/main/changelogs/current.yaml

Comment on lines +2790 to +2803
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

// Host weight is 100.
{
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

HostVector empty;
{
hostSet().runCallbacks(empty, empty);
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get why these change is necessary 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't make sense to change to 9999, but the number of calls for random have changed since we will call full scan instead.

Comment on lines 27 to 28
// The number of random healthy hosts from which the host with the fewest active requests will
// be chosen. Defaults to 2 so that we perform two-choice selection if the field is not set.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR change the behavior of the LB. It's necessary to add some more comment to tell that if the choice_count is larger than or equal to the hosts size, a full scan will be used.

Copy link
Copy Markdown
Member

@wbpcode wbpcode Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior change should be harmless to end users and maybe make this LB's behavior closer to it's name least request.
So, I think a change log should enough rather than a runtime guard.

But still need a check from @envoyproxy/runtime-guard-changes

@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Oct 12, 2023

/wait

…number of choices is larger than the size.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
@lizan
Copy link
Copy Markdown
Member

lizan commented Oct 17, 2023

/lgtm api

defer to @wbpcode

@lizan lizan removed their assignment Oct 18, 2023
Comment on lines +697 to +699
: absl::nullopt) {
: absl::nullopt),
enable_full_scan_(
PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config.ref(), enable_full_scan, false)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure how to do this on the code, when removing the field from here I was having problems compiling the load_balancer_impl.* files. I would appreciate help on this.

Just keep this enable_full_scan_ always be false if legacy API is used. Then I think this will resolve the compiling problem.

/wait

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM overall. Please check the CI and add a release note, thanks.

double active_request_bias_{};

const absl::optional<Runtime::Double> active_request_bias_runtime_;
const bool enable_full_scan_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const bool enable_full_scan_{};

@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Oct 20, 2023

/wait

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
wbpcode
wbpcode previously approved these changes Oct 24, 2023
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

Signed-off-by: Leonardo da Mata <barroca@gmail.com>
@barroca
Copy link
Copy Markdown
Contributor Author

barroca commented Oct 25, 2023

Not sure If I need someone else to approve since "envoyproxy/api-shepherds must approve for any API change" is pending.

Signed-off-by: Leonardo da Mata <barroca@gmail.com>
@barroca
Copy link
Copy Markdown
Contributor Author

barroca commented Oct 26, 2023

perhaps @lizan needs to take a look again?

@wbpcode wbpcode enabled auto-merge (squash) October 26, 2023 13:05
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@wbpcode wbpcode merged commit 3ea2bc4 into envoyproxy:main Oct 26, 2023
@barroca barroca deleted the ldamata/fix-least-request-LB-not-fair branch October 26, 2023 14:01
kyessenov added a commit to kyessenov/envoy that referenced this pull request Nov 9, 2023
…citly forleast request lb (envoyproxy#30794)"

This reverts commit e93e556.

Revert "Fix least request lb not fair (envoyproxy#29873)"

This reverts commit 3ea2bc4.

restore api

Signed-off-by: Kuat Yessenov <kuat@google.com>

fix merge

Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov
Copy link
Copy Markdown
Contributor

kyessenov commented Nov 9, 2023

We are reverting this change since we noticed that there is a significant impact on clusters with a small number of hosts caused by the deterministic loop which makes every client pick the same hosts. I think we need a better understanding of the impact with a load simulation for this change to be re-applied. Please consider adding some independent sampling to the algorithm and we can do a more thorough review again.

@barroca
Copy link
Copy Markdown
Contributor Author

barroca commented Nov 10, 2023

I understand that changing the behaviour would have impacted people, but It is not clear to me that adding an option to enable full scan would have a bad impact since it is explicitly selecting the host with least requests. Wondering if we can add it back?

Thanks @kyessenov and @wbpcode for fixing. I wasn't aware that I had breaking issues . :)

@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Nov 10, 2023

The API will be kept and new implementation is still welcome. And @tonya11en is working a simulated system to ensure we can get a more reasonable implementation in the future.

Thanks for your contribution and so sorry for that we need to revert it. We know it taken you lots of time. :(

Hope we can bring it back soon.

@tonya11en
Copy link
Copy Markdown
Member

tonya11en commented Nov 10, 2023 via email

@jkirschner-hashicorp
Copy link
Copy Markdown
Contributor

I noticed the "revert PR" hasn't been merged yet.

Instead of reverting, is it sufficient to instead add a patch that starts the full scan at a random index (as suggested by @ggreenway and @tomwans here)? Or are there additional concerns that need to be addressed regarding full scan mode?

@barroca
Copy link
Copy Markdown
Contributor Author

barroca commented Dec 2, 2023

Hello, I have this PR open that changes the behaviour for a random index: #31146

kyessenov added a commit that referenced this pull request Dec 9, 2023
Commit Message: Reverts #29873 and #30794

Multiple concerns about the effect of a full scan on LEAST_REQUEST have been raised.
See previous discussions in #11004 and #11006.

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
jkirschner-hashicorp pushed a commit to jkirschner-hashicorp/envoy that referenced this pull request Dec 21, 2023
* Add new idea for selecting hosts among those not selected yet.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Change how we choose full table scan

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Remove cout

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Fix Tests for load_balancer_impl_test

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Fix format and make sure full scan happens only when selected or the number of choices is larger than the size.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Enable new option on extesions api only

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Fix Integration tests.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Add release notes for full scan in least request LB.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Fix ref for release note.

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Fix release notes

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

* Update release note

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>

---------

Signed-off-by: Leonardo da Mata <ldamata@spotify.com>
Signed-off-by: Leonardo da Mata <barroca@gmail.com>
Co-authored-by: Leonardo da Mata <ldamata@spotify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Least requests LB is not fair at edge cases

6 participants