This document aims to collect the major differences and discuss the main divergences between the BFT consensus algorithm adopted by CometBFT and its actual implementation as part of CometBFT's codebase.
This is a work in progress.
The reference for the BFT consensus algorithm is the paper The latest gossip on BFT consensus, which describes the Tendermint consensus algorithm.
The reference for the implementation of the consensus protocol is the
CometBFT repository,
primarily the consensus package.
This includes the
specification
of the consensus implementation, although the content of this document needs to
be properly validated and updated.
This investigation should, as far as possible, be restricted to highlight differences between the BFT consensus algorithm and the corresponding implementation of the core consensus protocol. Are not considered as part of the core consensus protocol, and therefore are not expected to be covered by this document:
- The consensus gossip protocol: not specified in the algorithm, a complex protocol in the implementation
- The Application-Blockchain interface (ABCI) and the interactions with the replicated application in general
- The proposer election algorithm, which is assumed to be an external module
A non-comprehensive and evolving list of divergences between the algorithm and the implementation:
- Block propagation: the implemented
PROPOSALmessage does not carry the full proposed value. The proposed block is transmitted using a variable number ofBlockPartmessages. - Equivocating proposals: the algorithm considers
that multiple valid
PROPOSALmessages can be received in a round of consensus. The implementation only considers the first validProposalreceived. - Round skipping: the conditions for skipping to a higher round in the implementation are more restrictive than in the algorithm. The limited tracking of votes from future rounds is included in this discussion.
- Commit phase: the actions taken when a block is decided are more complex in the implementation, while the algorithm essentially sets the decision value and moves to the next height.
- Last commit: the consensus protocol collects as much
PRECOMMITvotes as possible for a decided block. This operation includes thetimeout_commitlogic and of theNewHeightround step. - Validators set: the algorithm assumes a fixed number of processes with uniform voting power. In the implementation, the set of voting processes (validators) and their voting power are dynamic.
Potential divergences and specificities to be considered:
- Locked value as proposed value (i.e., deciding without a
PROPOSALmessage) - Empty blocks handling (except if it can be encapsulated on the
getValue()logic)
The algorithm considers three message types:
- A
PROPOSALmessage carries the proposed valuev, a full value PREVOTEandPRECOMMITmessages carry a referenceid(v)to the voted (full) valuev
In the implementation, a proposed value is a block to be appended to the blockchain, more specifically:
- The proposed value
vis always aBlock - The reference for a proposed value
id(v)is aBlockID
The implementation considers a different set of message types:
Proposalmessage: similar to the algorithm'sPROPOSALmessage, with the difference that it carries aBlockID(equivalent toid(v)), not the full proposed value (equivalent tov)BlockPartmessages: carry the full proposed value (equivalent tov), split into multipleParts of aBlock. Together with theProposalmessage, plays the role of the algorithm'sPROPOSALmessage.Votemessages: corresponds to the algorithm'sPREVOTEandPRECOMMITmessages, carries aBlockID
In other words, the full proposed Block is not carried by the Proposal
message.
The Proposal message carries a BlockID while a variable number of
BlockPart messages are responsible for transporting the full proposed
Block, uniquely identified by the BlockID carried by the Proposal
message.
The BlockID type has two roles in the consensus implementation:
- It uniquely identifies a
Block, by storing its hash (effectivelyid(v)) - It carries the root of a Merkle tree used to propagate the
Blockusing multipleBlockPartmessages
The consensus implementation thus implements the propose step as follows:
- A
Proposalmessage is received and a Merkle tree is initialized from the proposedBlockID - A number of
BlockPartmessages are received, and the corresponding blockParts are added to the initialized Merkle Tree; the full proposed block is being progressively computed. - Once all
Parts of aBlockare received, and validated against the Merkle tree root, theProposalBlockfield is computed so that to store the proposedBlock.
As a result, the combination of the Proposal and ProposalBlock block
implementation fields corresponds to receiving of a PROPOSAL message in the
algorithm, proposing a value v == ProposalBlock.
References:
- tendermint/tendermint#7946 Treat proposal and block parts explicitly in the spec
- tendermint/tendermint#9504 consensus: Proposal should not include block ID
- tendermint/tendermint#7922 Proposal to Reform BlockID and Block Propagation
A correct node that is the proposer of a round is assumed to produce a value
v and broadcast a PROPOSAL carrying the proposed value v.
Byzantine proposers, however, may produce multiple proposed values and
broadcast distinct PROPOSAL messages in the same round.
This constitutes an equivocation.
A correct node should act upon the first valid PROPOSAL message received in
a round.
The action taken is the broadcast of a PREVOTE message, either for the
proposed value id(v) or for nil.
From this point, if further PROPOSAL messages are received, a correct node
should not take any action, as might entail broadcasting a different PREVOTE
message in the same round, which would configure an equivocation.
However, a correct node that acted upon a PROPOSAL message proposing v can
still broadcast a PRECOMMIT message for, or decide a different value v',
provided it receives a PROPOSAL message proposing v' and the required
number of PREVOTE (line 36) or PRECOMMIT (line 49) messages for id(v').
In other words, while only the first valid PROPOSAL message received enables
the upon clauses of lines 22 and 28, any PROPOSAL message should be
considered when evaluating the conditions of lines 36 and 49.
In practical terms, this means that a process should store and consider any
valid PROPOSAL message received in a round, even though only the first one
enables the transition from the propose to the prevote steps.
In the consensus implementation, the PROPOSAL message considered by the
algorithm is a combination of a Proposal message with a number of
BlockPart messages, as discussed in block propagation.
In the implementation, a node only considers and stores a single Proposal
message per round, the first valid one received, rejecting any additional
Proposal message it receives.
The Proposal message has the role of defining which BlockPart messages the
node should accept, therefore which Block it is expected to build.
This means that once accepting a Proposal message, the node will also reject
BlockPart messages that do not match that Proposal, therefore it will not
build possible other Blocks proposed in the same round.
There are exceptions for this role: under some conditions, to be discussed in the following, a node starts accepting
BlockPartmessages for aBlockthat does not matching the acceptedProposal.
In short,
TODO:
TODO:
TODO: