Add option to random load balance to >1 server nodes #631
Conversation
prashantv left a comment
Thanks for the contribution @kamran-m
Some changes I'd like to see:
- Benchmarking to show the performance before and after this change, since the current version of the change introduces some perf regressions.
- Tests that cover all the different cases (peerConnectionCount = 0, peerConnectionCount = 1, peerConnectionCount < available peers, peerConnectionCount > available peers, etc.); see the table-driven sketch below.
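A table-driven sketch of those cases; the field names mirror the test excerpts later in this thread, and the per-case assertions are elided since they depend on the final API:

```go
// Each case exercises one of the boundaries called out above.
testCases := []struct {
	numPeers            int
	peerConnectionCount uint32
}{
	{numPeers: 5, peerConnectionCount: 0}, // should be rejected (zero count)
	{numPeers: 5, peerConnectionCount: 1}, // existing single-peer selection
	{numPeers: 5, peerConnectionCount: 3}, // count < available peers
	{numPeers: 5, peerConnectionCount: 8}, // count > available peers
}
```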
peer.go (Outdated)

```go
func (l *PeerList) choosePeer(prevSelected map[string]struct{}, avoidHost bool) *Peer {
	var psPopList []*peerScore
	var ps *peerScore
	var chosenPSList []*peerScore
```
presize chosenPSList to avoid a bunch of allocations on each selection, especially when it's > 1
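A minimal sketch of that presizing, assuming the configured count is available on the list (the field name `peerConnectionCount` is an assumption):

```go
// Allocate the backing array once with the expected capacity so that
// appending candidates does not reallocate on every selection.
chosenPSList := make([]*peerScore, 0, l.peerConnectionCount)
```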
peer.go (Outdated)

```go
}

func randomSampling(psList []*peerScore) *peerScore {
	peerRand := trand.NewSeeded()
```
please do not create a new rand for every single peer selection request.
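A sketch of hoisting the seeded rand out of the hot path so it is created once with the list; the field and constructor shape are assumptions, and since *rand.Rand is not safe for concurrent use it relies on the list's existing lock:

```go
import (
	"math/rand"

	"github.com/uber/tchannel-go/trand"
)

// PeerList holds the rand so randomSampling can reuse it instead of
// reseeding on every call (fields other than peerRand are elided).
type PeerList struct {
	peerRand *rand.Rand
}

func newPeerList() *PeerList {
	return &PeerList{peerRand: trand.NewSeeded()}
}

func (l *PeerList) randomSampling(psList []*peerScore) *peerScore {
	// Reuses the per-list rand; the caller is assumed to hold the list lock.
	return psList[l.peerRand.Intn(len(psList))]
}
```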
peer.go (Outdated)

```go
}

ps := randomSampling(chosenPSList)
if ps == nil {
```
why would we need this check?
Thanks for the review @prashantv, I addressed your comments. I also added some benchmark tests with these results: […] and also used the same test in the old code to compare with: […] Please let me know if the […]
prashantv left a comment
Sorry for the delay in reviewing @kamran-m, I was OOO and was catching up after I got back.
I've been thinking a little bit about whether this is the right way to do it -- I don't know that we should put every strategy into the TChannel core library. For example, see another strategy that we'd like: #640
If instead of this change, we just ensured that TChannel had 2 connected peers, then:
- No changes are needed to TChannel
- You get the standard peer selection (least pending, with round-robin fallback if the scores are the same).
That would likely have even better load distribution.
peer.go (Outdated)

```go
ErrNoNewPeers = errors.New("no new peer available")

// ErrZeroPeerConnectionCount indicates that the peer connection count is set to zero.
ErrZeroPeerConnectionCount = errors.New("peer connection count must be greater than 0")
```
No reason to export this error -- once you export an error, it's part of the public API and we can never change it. There shouldn't be a good reason for a user to compare a returned error against this, so it shouldn't be exported.
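For comparison, the unexported form keeps the variable private to the package while still returning the same descriptive error:

```go
// errZeroPeerConnectionCount is deliberately unexported: callers see
// the message but cannot depend on the variable, so it can change later.
var errZeroPeerConnectionCount = errors.New("peer connection count must be greater than 0")
```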
```go
for i := 0; i < b.N; i++ {
	peer, _ := ch.Peers().Get(nil)
	if peer == nil {
```
should this ever happen? If not, maybe we should do a t.Fatal instead of a println

It shouldn't; I just added it to guard against compiler optimizations artificially lowering the benchmark's runtime. Changed it to Fatal!
peer_test.go (Outdated)

```go
func BenchmarkGetPeerWithPeerConnectionCount10(b *testing.B) {
	numPeers := 10
	peerConnectionCount := uint32(10)
```
Instead of copying + pasting the whole benchmark, abstract common logic out into a function, and then call it with different args for the peerConnectionCount and maybe even numPeers
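A sketch of that abstraction; `setupBenchChannel` is a placeholder for whatever testutils setup the benchmarks already do:

```go
// benchmarkGetPeer runs the selection loop for one (numPeers,
// peerConnectionCount) combination; the channel setup is elided.
func benchmarkGetPeer(b *testing.B, numPeers int, peerConnectionCount uint32) {
	ch := setupBenchChannel(b, numPeers, peerConnectionCount) // assumed helper
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		peer, _ := ch.Peers().Get(nil)
		if peer == nil {
			b.Fatal("Get returned no peer") // also guards against dead-code elimination
		}
	}
}

func BenchmarkGetPeerWithPeerConnectionCount1(b *testing.B)  { benchmarkGetPeer(b, 10, 1) }
func BenchmarkGetPeerWithPeerConnectionCount10(b *testing.B) { benchmarkGetPeer(b, 10, 10) }
```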
```go
}{
	// the higher `peerConnectionCount` is, the smoother the impact of uneven scores
	// becomes, as we are random sampling among `peerConnectionCount` peers
	{numPeers: 10, peerConnectionCount: 1, distMin: 1000, distMax: 1000},
```
can you add peerConnectionCount: 2? I imagine this will be a pretty small value normally.

Indeed, as the impact of it gets smaller with every extra connection. Added the case for 2!
```go
for _, tc := range testCases {
	// Selected is a map from rank -> [peer, count]
	// It tracks how often a peer gets selected at a specific rank.
	selected := make([]map[string]int, tc.numPeers)
```
why do you make a slice here? Why can't we just create a single map in the loop for numIterations, and do the testDistribution call right there too? Makes 3 loops 1 loop, removes a slice, and simplifies the test a little.

I am not sure if I follow. This is a similar test to TestPeerSelectionRanking, where we are checking the distribution of the ranking after numIterations, which means we have to do the checking outside of the loop. Or am I missing something here?
peer_test.go (Outdated)

```go
b.ResetTimer()

for i := 0; i < b.N; i++ {
	peer, _ := ch.Peers().Get(nil)
```
we should think about testing the impact of passing a non-nil blacklist with peerConnectionCount > 1

You mean if prevSelected actually has some selected peers? I am assuming that if a client uses peerConnectionCount > 1, it should take into account that the prevSelected list is going to be bigger. Or do you have another specific issue in mind?
Adding the possibility to set `peerConnectionCount` so that a single client node can do random load balancing among the top `peerConnectionCount` nodes, using the score calculator and peer heap.
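A condensed sketch of that selection flow, stitched together from the `choosePeer`/`randomSampling` excerpts in the review threads above; the heap helper names `popPeer`/`pushPeer` and the `peerRand` field are assumptions:

```go
// chooseFromTop pops up to n of the best-scored peers off the heap,
// samples one uniformly at random, and pushes all popped entries back
// so the heap is left intact for the next selection.
func (l *PeerList) chooseFromTop(n int) *peerScore {
	popped := make([]*peerScore, 0, n)
	for len(popped) < n && l.peerHeap.Len() > 0 {
		popped = append(popped, l.peerHeap.popPeer())
	}
	if len(popped) == 0 {
		return nil
	}
	chosen := popped[l.peerRand.Intn(len(popped))]
	for _, ps := range popped {
		l.peerHeap.pushPeer(ps)
	}
	return chosen
}
```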
Force-pushed from f1d327f to f8f2139.
I agree with the idea that the core should not be strategy-aware. As I commented on #630, I think all the logic regarding peer connection count, score strategy, peer heap, etc. should move to something like PeerSelectionStrategy. I am not sure why you think that with 2 connected peers we won't need this core change and would also get a better load distribution. This change is basically letting the client decide how many connected peers they want. So in cases like ours, where we have a very low number of server/client nodes, it might actually be helpful to use 5 connected peers, as you can verify with the stats I posted above. Let me know your thoughts.
This closes #630
I have also verified the change for a client => server setup with a low number of client/server nodes, where the imbalance was more severe (note that here I used `peerConnectionCount >= numPeers = 5`, which is the extreme case, but it's enough to verify the solution):