Add option to random load balance to >1 server nodes #631
Conversation
prashantv left a comment
Thanks for the contribution @kamran-m
Some changes I'd like to see:
- Benchmarking to show the performance before and after this change, since the current version of the change introduces some perf regressions.
- Tests that cover all the different cases (peerConnectionCount = 0, peerConnectionCount = 1, peerConnectionCount < available peers, peerConnectionCount > available peers, etc.); see the table-driven sketch below.
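A table-driven sketch of those cases; the field names mirror the test excerpts later in this thread, and the per-case assertions are elided since they depend on the final API:

```go
// Each case exercises one of the boundaries called out above.
testCases := []struct {
	numPeers            int
	peerConnectionCount uint32
}{
	{numPeers: 5, peerConnectionCount: 0}, // should be rejected (zero count)
	{numPeers: 5, peerConnectionCount: 1}, // existing single-peer selection
	{numPeers: 5, peerConnectionCount: 3}, // count < available peers
	{numPeers: 5, peerConnectionCount: 8}, // count > available peers
}
```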
peer.go (Outdated)

```go
func (l *PeerList) choosePeer(prevSelected map[string]struct{}, avoidHost bool) *Peer {
	var psPopList []*peerScore
	var ps *peerScore
	var chosenPSList []*peerScore
```
presize chosenPSList to avoid a bunch of allocations on each selection, especially when it's > 1
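A minimal sketch of that presizing, assuming the configured count is available on the list (the field name `peerConnectionCount` is an assumption):

```go
// Allocate the backing array once with the expected capacity so that
// appending candidates does not reallocate on every selection.
chosenPSList := make([]*peerScore, 0, l.peerConnectionCount)
```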
peer.go (Outdated)

```go
}

func randomSampling(psList []*peerScore) *peerScore {
	peerRand := trand.NewSeeded()
```
please do not create a new rand for every single peer selection request.
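A sketch of hoisting the seeded rand out of the hot path so it is created once with the list; the field and constructor shape are assumptions, and since *rand.Rand is not safe for concurrent use it relies on the list's existing lock:

```go
import (
	"math/rand"

	"github.com/uber/tchannel-go/trand"
)

// PeerList holds the rand so randomSampling can reuse it instead of
// reseeding on every call (fields other than peerRand are elided).
type PeerList struct {
	peerRand *rand.Rand
}

func newPeerList() *PeerList {
	return &PeerList{peerRand: trand.NewSeeded()}
}

func (l *PeerList) randomSampling(psList []*peerScore) *peerScore {
	// Reuses the per-list rand; the caller is assumed to hold the list lock.
	return psList[l.peerRand.Intn(len(psList))]
}
```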
peer.go (Outdated)

```go
}

ps := randomSampling(chosenPSList)
if ps == nil {
```
why would we need this check?
Thanks for the review @prashantv, I addressed your comments. I also added some benchmark tests with these results: […] and also used the same test in the old code to compare with: […] Please let me know if the […]
prashantv left a comment
Sorry for the delay in reviewing @kamran-m, I was OOO and was catching up after I got back.
I've been thinking a little bit about whether this is the right way to do it -- I don't know that we should put every strategy into the TChannel core library. For example, see another strategy that we'd like: #640
If instead of this change, we just ensured that TChannel had 2 connected peers, then:
- No changes are needed to TChannel
- You get the standard peer selection (least pending, with round-robin fallback if the scores are the same).
That would likely have even better load distribution.
peer.go (Outdated)

```go
ErrNoNewPeers = errors.New("no new peer available")

// ErrZeroPeerConnectionCount indicates that the peer connection count is set to zero.
ErrZeroPeerConnectionCount = errors.New("peer connection count must be greater than 0")
```
No reason to export this error -- once you export an error, it's part of the public API and we can never change it. There shouldn't be a good reason for a user to compare a returned error against this, so it shouldn't be exported.
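For comparison, the unexported form keeps the variable private to the package while still returning the same descriptive error:

```go
// errZeroPeerConnectionCount is deliberately unexported: callers see
// the message but cannot depend on the variable, so it can change later.
var errZeroPeerConnectionCount = errors.New("peer connection count must be greater than 0")
```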
```go
for i := 0; i < b.N; i++ {
	peer, _ := ch.Peers().Get(nil)
	if peer == nil {
```
should this ever happen? If not, maybe we should do a t.Fatal instead of a println

It shouldn't; I just added it to guard against compiler optimizations artificially lowering the benchmark's runtime. Changed it to Fatal!
peer_test.go (Outdated)

```go
func BenchmarkGetPeerWithPeerConnectionCount10(b *testing.B) {
	numPeers := 10
	peerConnectionCount := uint32(10)
```
Instead of copying + pasting the whole benchmark, abstract common logic out into a function, and then call it with different args for the peerConnectionCount and maybe even numPeers
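A sketch of that abstraction; `setupBenchChannel` is a placeholder for whatever testutils setup the benchmarks already do:

```go
// benchmarkGetPeer runs the selection loop for one (numPeers,
// peerConnectionCount) combination; the channel setup is elided.
func benchmarkGetPeer(b *testing.B, numPeers int, peerConnectionCount uint32) {
	ch := setupBenchChannel(b, numPeers, peerConnectionCount) // assumed helper
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		peer, _ := ch.Peers().Get(nil)
		if peer == nil {
			b.Fatal("Get returned no peer") // also guards against dead-code elimination
		}
	}
}

func BenchmarkGetPeerWithPeerConnectionCount1(b *testing.B)  { benchmarkGetPeer(b, 10, 1) }
func BenchmarkGetPeerWithPeerConnectionCount10(b *testing.B) { benchmarkGetPeer(b, 10, 10) }
```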
```go
}{
	// the higher `peerConnectionCount` is, the smoother the impact of uneven scores
	// becomes, as we are random sampling among `peerConnectionCount` peers
	{numPeers: 10, peerConnectionCount: 1, distMin: 1000, distMax: 1000},
```
can you add peerConnectionCount: 2? I imagine this will be a pretty small value normally.

Indeed, as the impact of it gets smaller with every extra connection. Added the case for 2!
```go
for _, tc := range testCases {
	// Selected is a map from rank -> [peer, count]
	// It tracks how often a peer gets selected at a specific rank.
	selected := make([]map[string]int, tc.numPeers)
```
why do you make a slice here? Why can't we just create a single map in the loop for numIterations, and do the testDistribution call right there too? Makes 3 loops 1 loop, removes a slice, and simplifies the test a little.

I am not sure if I follow. This is a similar test to TestPeerSelectionRanking, where we are checking the distribution of the ranking after numIterations, which means we have to do the checking outside of the loop. Or am I missing something here?
peer_test.go (Outdated)

```go
b.ResetTimer()

for i := 0; i < b.N; i++ {
	peer, _ := ch.Peers().Get(nil)
```
we should think about testing the impact of passing a non-nil blacklist with peerConnectionCount > 1

You mean if prevSelected actually has some selected peers? I am assuming that if a client uses peerConnectionCount > 1, it should take into account that the prevSelected list is going to be bigger. Or do you have another specific issue in mind?
Adding the possibility to set `peerConnectionCount` so that a single client node can do random load balancing among the top `peerConnectionCount` nodes, using the score calculator and peer heap.
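A condensed sketch of that selection flow, stitched together from the `choosePeer`/`randomSampling` excerpts in the review threads above; the heap helper names `popPeer`/`pushPeer` and the `peerRand` field are assumptions:

```go
// chooseFromTop pops up to n of the best-scored peers off the heap,
// samples one uniformly at random, and pushes all popped entries back
// so the heap is left intact for the next selection.
func (l *PeerList) chooseFromTop(n int) *peerScore {
	popped := make([]*peerScore, 0, n)
	for len(popped) < n && l.peerHeap.Len() > 0 {
		popped = append(popped, l.peerHeap.popPeer())
	}
	if len(popped) == 0 {
		return nil
	}
	chosen := popped[l.peerRand.Intn(len(popped))]
	for _, ps := range popped {
		l.peerHeap.pushPeer(ps)
	}
	return chosen
}
```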
Force-pushed from f1d327f to f8f2139.
I agree with the idea that the core should not be strategy-aware. As I commented on #630, I think all the logic regarding peer connection count, score strategy, peer heap, etc. should move to something like PeerSelectionStrategy. I am not sure why you think that with 2 connected peers we won't need this core change and would also get a better load distribution. This change is basically letting the client decide how many connected peers they want. So in cases like ours, where we have a very low number of server/client nodes, it might actually be helpful to use 5 connected peers, as you can verify with the stats I posted above. Let me know your thoughts.
This closes #630
I have also verified the change for a client => server setup with a low number of client/server nodes, where the imbalance was more severe (note that here I used `peerConnectionCount >= numPeers = 5`, which is the extreme case, but it's enough to verify the solution):