How to reach consensus fast and in a lazy way #370
cason started this conversation in Specifications
This discussion should serve as a basis for the design of the node synchronization protocol, tracked in #260.
Assume that we have a node that is not a validator (at the considered heights or at all), or that is slow or to some extent lagging behind in the protocol. What is the best approach for it to progress as fast as possible, committing valid blocks, without misbehaving but while being selfish and careless, namely, not performing any action that it is not forced to perform, while still guaranteeing progress?
Commits or Precommit certificates
The first priority of this node is to receive, or to look up in its message log, a `Commit`, namely, a set of `Precommit` messages, equivalent to `2f + 1` voting power, for the same height, round, and value. Once a `Commit` for a height is found and validated, the node does not need to consider any other vote message for that height, whether newly received or already in its message log.
The only information that the node needs for a height for which it has a `Commit` is the committed value. I assume here that votes and values are propagated independently. Moreover, I assume that the `Commit`, or more precisely one of the `Precommit`s it contains, carries enough information to enable the node to find the corresponding full value.
Notice that if the node is a validator and finds a `Commit` for height `H`, its participation in height `H` was not needed. So, following the rationale of minimum effort, a node could just wait for enough `Precommit` messages to cheaply decide a height.
Polkas or Prevote certificates
If our lazy node does not see, after a while, a `Commit` for a height where it is one of the validators, this is an important indication that not enough `Precommit` messages were issued, and that our lazy node potentially has to issue a `Precommit` to allow the system to progress. This is particularly true when `Precommit` messages for that height and some round are available that were produced by at least one correct node, i.e., when matching `Precommit` messages equivalent to at least `f + 1` voting power are available.
In order to issue a `Precommit` for a value in that height and round, the node has to see what is called a `Polka`, i.e., a set of `Prevote` messages, equivalent to `2f + 1` voting power, for the same height, round, and value. This is therefore the second priority of a lazy node: when there is no `Commit` in sight, it must look for a `Polka`, preferably the most recent (highest round) one.
Once it has a `Polka` for a height, round, and value, the node needs to retrieve the associated full value. The same considerations as for a `Commit` apply here. Once a `Polka` and the corresponding full value are retrieved, the validator issues its `Precommit`. From this point on, and assuming there are enough other correct nodes, the node should eventually see a `Commit` and decide the height.
Proposal
If our lazy node does not see, after a while, a `Polka` for a round and height where it is one of the validators, this is an important indication that not enough `Prevote` messages were issued, and that our lazy node potentially has to issue a `Prevote` to allow the system to progress. This is particularly true when `Prevote` messages for that height and some round are available that were produced by at least one correct node, i.e., when matching `Prevote` messages equivalent to at least `f + 1` voting power are available.
In order to issue a `Prevote` in a round of consensus, the node needs to retrieve and validate the value proposed in that round. In the pseudo-code, this message is a `Proposal`, while implementations probably distinguish between a `Proposal` as a consensus message and the actual proposed value `v`, propagated by the proposer. This is therefore the third priority of a lazy node: when there is no `Commit` or `Polka` in sight, it must look for a `Proposal`, preferably the most recent (highest round) one.
When a node is in this situation, it has to perform all the work expected from a validator in a round of consensus. This means that it is probably not slow or lagging behind, but caught up and responsible for carrying on the current round of consensus. It is harder to be lazy in this case without acting maliciously. So the validator needs to receive and validate the `Proposal`, issue a `Prevote`, then issue a `Precommit`, and finally decide the height, to then start over at the next height.
Summary
So, what is all this for?
A node that is lagging behind should focus on pre-processing the messages (from a future height or round) that are most important for achieving consensus with minimal effort. This substantially reduces the backlog of pending messages, which is one of the reasons why lagging nodes, in some cases, never really catch up with the majority of validators.
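As a minimal sketch of this pre-processing, the snippet below scans a backlog of votes for a certificate (a `Commit` when scanning `Precommit`s, a `Polka` when scanning `Prevote`s). The `Vote` type, the `find_certificate` and `has_quorum` names, and the representation of voting power as a plain dictionary are all hypothetical, introduced only for illustration. It also assumes that "`2f + 1` voting power" means strictly more than two thirds of the total voting power.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Vote:
    # Hypothetical vote record; field names are illustrative only.
    kind: str        # "prevote" or "precommit"
    height: int
    round: int
    value_id: str    # identifier of the voted value
    validator: str

def has_quorum(power, total_power):
    # Assumption: 2f + 1 out of 3f + 1, i.e. strictly more than 2/3 of the total.
    return 3 * power > 2 * total_power

def find_certificate(votes, powers, height, kind):
    """Return (round, value_id) of the highest-round certificate, or None."""
    total = sum(powers.values())
    tally = defaultdict(int)
    seen = set()
    for v in votes:
        if v.kind != kind or v.height != height or v.validator not in powers:
            continue
        if (v.round, v.validator) in seen:
            continue  # count at most one vote per validator per round
        seen.add((v.round, v.validator))
        tally[(v.round, v.value_id)] += powers[v.validator]
    certified = [key for key, power in tally.items() if has_quorum(power, total)]
    # Tuples compare by round first, so max() picks the most recent round.
    return max(certified) if certified else None

powers = {"a": 1, "b": 1, "c": 1, "d": 1}
backlog = [Vote("precommit", 5, 0, "X", name) for name in ("a", "b", "c")]
commit = find_certificate(backlog, powers, height=5, kind="precommit")
```

With a `Commit` found this way, every other vote for height 5 in the backlog can be dropped without further processing, which is where the backlog reduction comes from.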
Also, the lack of the information required to lazily progress in consensus should trigger the synchronization protocol to retrieve that information, following the priority and order sketched above. This is because, in the absence of failures, the only reason for a node not to receive this kind of information is that its backlog is so large that it loses some information that is received by the majority of the nodes. So identifying and modeling this lazy approach should help the design of the anti-entropy/synchronization protocols.
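The priority order can be sketched as a small decision function. This is only an illustration under simplifying assumptions: the `Step` enum and the `next_step` function are hypothetical names, the "after a while" waiting periods are left out, and `has_proposal` is taken to mean that the proposed value was both received and validated.

```python
from enum import Enum

class Step(Enum):
    # Hypothetical labels for what a lazy node does or asks for next.
    DECIDE = "decide the height"
    FETCH_VALUE = "retrieve the full value"
    PRECOMMIT = "issue a Precommit"
    PREVOTE = "issue a Prevote"
    REQUEST_COMMIT = "ask peers for a Commit"
    REQUEST_PROPOSAL = "ask peers for a Proposal"

def next_step(is_validator, has_commit, has_polka, has_proposal, has_value):
    # First priority: a Commit settles the height, only the value is needed.
    if has_commit:
        return Step.DECIDE if has_value else Step.FETCH_VALUE
    # A non-validator cannot help the height progress; it only waits for a Commit.
    if not is_validator:
        return Step.REQUEST_COMMIT
    # Second priority: a Polka allows a Precommit once the value is known.
    if has_polka:
        return Step.PRECOMMIT if has_value else Step.FETCH_VALUE
    # Third priority: a validated Proposal allows a Prevote.
    if has_proposal:
        return Step.PREVOTE
    # Nothing usable locally: the synchronization protocol has to fetch it.
    return Step.REQUEST_PROPOSAL

step = next_step(is_validator=True, has_commit=False, has_polka=True,
                 has_proposal=False, has_value=False)
print(step.value)
```

The branches that return a request or `FETCH_VALUE` are the points where the synchronization protocol would be triggered.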
This is of course a draft, an initial effort, and all feedback is welcome.