A Brief Introduction to Blockchain Consensus Algorithms

In blockchain, how to reach consensus among the untrustworthy nodes is a transformation of the Byzantine Generals (BG) Problem, which was raised in [1]. In BG problem, a group of generals who command a portion of the Byzantine army circle the city. Some generals prefer to attack while other generals prefer to retreat. However, the attack would fail if only part of the generals attacks the city. Thus, they have to reach an agreement to attack or retreat. How to reach a consensus in a distributed environment is a challenge. It is also a challenge for blockchain as the blockchain network is distributed. In blockchain, there is no central node that ensures ledgers on distributed nodes are all the same. Some protocols are needed to ensure ledgers in different nodes are consistent. We next present several common approaches to reach a consensus in blockchain followed by a brief comparison of those blockchain consensus algorithms.

A. Approaches to consensus

PoW (Proof of work) is a consensus strategy used in the Bitcoin network. In a decentralized network, someone has to be selected to record the transactions. The easiest way is random selection. However, random selection is vulnerable to attacks. So if a node wants to publish a block of transactions, a lot of work has to be done to prove that the node is not likely to attack the network. Generally, work means computer calculations. In PoW, each node of the network is calculating a hash value of the block header. The block header contains a nonce and miners would change the nonce frequently to get different hash values. The consensus requires that the calculated value must be equal to or smaller than a certain given value. When one node reaches the target value, it would broadcast the block to other nodes and all other nodes must mutually confirm the correctness of the hash value. If the block is validated, other miners would append this new block to their own blockchains. Nodes that calculate the hash values are called miners and the PoW procedure is called mining in Bitcoin.

In the decentralized network, valid blocks might be generated simultaneously when multiple nodes find suitable nonce nearly at the same time. As a result, other branches may be generated. However, it is unlikely that two competing forks will generate the next block simultaneously. In PoW protocol, a chain that becomes longer thereafter is judged as the authentic ones. Consider two forks created by simultaneously validated blocks U4 and B4. Miners keep mining their blocks until a longer branch is found. B4, B5 forms a longer chain, so the miners on U4 would switch to the longer branch. It should be noted that miners have to do a lot of computer calculations in PoW, which wastes too much energy and resources.

PoS (Proof of stake) is an energy-saving alternative to PoW. Miners in PoS have to prove the ownership of the amount of currency. It is believed that people with more currencies would be less likely to attack the network. The selection based on the account balance is quite unfair because the single richest person is bound to be dominant in the network. As a result, many solutions are proposed with the combination of the stake size to decide which one to forge the next block. In particular, Blackcoin [2] uses randomization to predict the next generator. It uses a formula that looks for the lowest hash value in combination with the size of the stake. Peercoin [3] favors coin age-based selection. In Peercoin, older and larger sets of coins have a greater probability of mining the next block. Compared to PoW, PoS saves more energy and is more effective. Unfortunately, as the mining cost is nearly zero, attacks might come as a consequence. Many blockchains adopt PoW at the beginning and transform to PoS gradually. For instance, Ethereum is planning to move from Ethash (a kind of PoW) [4] to Casper (a kind of PoS) [5].

PBFT (Practical byzantine fault tolerance) is a replication algorithm to tolerate byzantine faults [6]. Hyperledger Fabric [7] utilizes the PBFT as its consensus algorithm since PBFT could handle up to 1/3 malicious byzantine replicas. A new block is determined in a round. In each round, a primary would be selected according to some rules. And it is responsible for ordering the transaction. The whole process could be divided into three-phase: pre-prepared, prepared, and commit. In each phase, a node would enter the next phase if it has received votes from over 2/3 of all nodes. So PBFT requires that every node is known to the network. Like PBFT, Stellar Consensus Protocol (SCP) [8] is also a Byzantine agreement protocol. In PBFT, each node has to query other nodes while SCP gives participants the right to choose which set of other participants to believe. Based on PBFT, Antshares [9] has implemented its dBFT (delegated byzantine fault tolerance). In dBFT, some professional nodes are voted to record the transactions.

DPOS (Delegated proof of stake). The major difference between PoS and DPOS is that PoS is direct democratic while DPOS is representative democratic. Stakeholders elect their delegates to generate and validate blocks. With significantly fewer nodes to validate the block, the block could be confirmed quickly, leading to the quick confirmation of transactions. Meanwhile, the parameters of the network such as block size and block intervals could be tuned by delegates. Additionally, users need not worry about the dishonest delegates as they could be voted out easily. DPOS is the backbone of Bitshares [10].

Ripple [11] is a consensus algorithm that utilizes collectively-trusted subnetworks within the larger network. In the network, nodes are divided into two types: server for participating consensus process and client for only transferring funds. Each server has a Unique Node List (UNL). UNL is important to the server. When determining whether to put a transaction into the ledger, the server would query the nodes in UNL and if the received agreements have reached 80%, the transaction would be packed into the ledger. For a node, the ledger will remain correct as long as the percentage of faulty nodes in UNL is less than 20%.

Tendermint [12] is a byzantine consensus algorithm. A new block is determined in a round. A proposer would be selected to broadcast an unconfirmed block in this round. It could be divided into three steps: 1) the Pre-vote step. Validators choose whether to broadcast a prevote for the proposed block. 2) the Pre-commit step. If the node has received more than 2/3 of pre-votes on the proposed block, it broadcasts a pre-commit for that block. If the node has received over 2/3 of pre-commits, it enters the commit step. 3) Commit step. The node validates the block and broadcasts a commit for that block. if the node has received 2/3 of the commits, it accepts the block. In contrast to PBFT, nodes have to lock their coins to become validators. Once a validator is found to be dishonest, it would be punished.

B. Consensus algorithms comparison

Different consensus algorithms have different advantages and disadvantages. Table I gives a comparison between different consensus algorithms and we use the properties given by [13].

Node identity management. PBFT needs to know the identity of each miner in order to select a primary in every round while Tendermint needs to know the validators in order to select a proposer in each round. For PoW, PoS, DPOS, and Ripple, nodes could join the network freely.

Energy-saving. In PoW, miners hash the block header continuously to reach the target value. As a result, the amount of electricity required to process has reached an immense scale. As for PoS and DPOS, miners still have to hash the block header to search the target value but the work has been largely reduced as the search space is designed to be limited. As for PBFT, Ripple, and Tendermint, there is no mining in the consensus process. So it saves energy greatly.

Tolerated power of the adversary. Generally, 51% of hash power is regarded as the threshold for one to gain control of the network. But selfish mining strategy [14] in PoW systems could help miners to gain more revenue by only 25% of the hashing power. PBFT and Tendermint are designed to handle up to 1/3 faulty nodes. Ripple is proved to maintain correctness if the faulty nodes in a UNL are less than 20%.

Example. Bitcoin is based on PoW while Peercoin is a new peer-to-peer PoS cryptocurrency. Further, Hyper-ledger Fabric utilizes PBFT to reach consensus. Bitshares, a smart contract platform, adopts DPOS as their consensus algorithm. Ripple implements the Ripple protocol while Tendermint devises the Tendermint protocol.

PBFT and Tendermint are permissioned protocols. Node identities are expected to be known to the whole network, so they might be used in commercial mode rather than public. PoW and PoS are suitable for a public blockchain. Consortium or private blockchain might have a preference for PBFT, Tendermint, DPOS, and Ripple.

Reference:
[1] L. Lamport, R. Shostak, and M. Pease, “The byzantine generals problem,” ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 4, no. 3, pp. 382–401, 1982.
[2] P. Vasin, “Blackcoins proof-of-stake protocol v2,” 2014. [Online]. Available: https://blackcoin.co/blackcoin-pos-protocol-v2-whitepaper.pdf
[3] S. King and S. Nadal, “Ppcoin: Peer-to-peer crypto-currency with proof-of-stake,” Self-Published Paper, August, vol. 19, 2012.
[4] G. Wood, “Ethereum: A secure decentralized generalized transaction ledger,” Ethereum Project Yellow Paper, 2014.
[5] V. Zamfir, “Introducing Casper the friendly ghost,” Ethereum Blog URL: https://blog. ethereum. org/2015/08/01/introducing-casper-friendly-ghost, 2015.
[6] C. Miguel and L. Barbara, “Practical byzantine fault tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, vol. 99, New Orleans, USA, 1999, pp. 173–186.
[7] “Hyperledger project,” 2015. [Online]. Available: https://www. hyperledger.org/
[8] D. Mazieres, “The stellar consensus protocol: A federated model for internet-level consensus,” Stellar Development Foundation, 2015.
[9] “Antshares digital assets for everyone,” 2016. [Online]. Available: https://www.antshares.org
[10] “Bitshares - your share in the decentralized exchange.” [Online]. Available: https://bitshares.org/
[11] D. Schwartz, N. Youngs, and A. Britto, “The ripple protocol consensus algorithm,” Ripple Labs Inc White Paper, vol. 5, 2014.
[12] J. Kwon, “Tendermint: Consensus without mining,” URL http://tendermint. com/docs/tendermint { } v04. pdf, 2014.
[13] M. Vukolic ́, “The quest for scalable blockchain fabric: Proof-of-work vs. bft replication,” in International Workshop on Open Problems in Network Security, Zurich, Switzerland, 2015, pp. 112–125.
[14] I. Eyal and E. G. Sirer, “Majority is not enough: Bitcoin mining is vulnerable,” in Proceedings of International Conference on Financial Cryptography and Data Security, Berlin, Heidelberg, 2014, pp. 436– 454.

4 Likes

I would say a better first way to categorize consensus protocols is first by the kind of network assumptions (synchronous, asynchronous, partially synchronous), and then by the kind of faults tolerated (byzantine vs crash vs omission, although all the ones summarized here are for byzantine faults). I like this series of blogposts explaining some of the differences

There are a few recent systemization of knowledge papers that offer good surveys and comparisons too
https://dl.acm.org/doi/pdf/10.1145/3318041.3355458

4 Likes

I would also add Avalanche (https://files.avalabs.org/papers/consensus.pdf) to the list using probabilistic BFT

6 Likes

I would like to add this decision tree which is helpful to choose appropriate consensus algorithms.

3 Likes

Neither Proof of Work nor Proof of Stake are consensus protocols. They’re Sybil protection. Outside of crypto, the most common form of Sybil protection is a Certificate Authority. When trying to have pseudonymous ad hoc participants, though, CA’s don’t work. That’s why there’s PoW and PoS.

The reason people get confused is because Nakamoto consensus (more accurately called Heaviest Chain consensus) requires it. The purpose of Proof of Work is akin to leader election for proposing a block, which of course, is never quite finalized but after a certain number of blocks is confirmed. This shows an investment of energy which prevents agents from flooding the network with blocks. It is, in essence, a way to prevent Sybil attacks. It has the side effect of building the blockchain itself.

Proof of Stake, again, is Sybil protection. It works in that an agent may not participate in the network without having invested in the network and staked that investment for a time period (ignore Algorand’s site, they’re switching to locked stake because ad hoc proof of value doesn’t work well… or so I’m told). It is more parallelizable and enables high-through put networks.

Nakamoto is a synchronous protocol. This means all nodes in the network must work in lockstep to progress the chain. Each new block that a node decides to build on is like a new heartbeat for the network to proceed. This is inherently capped in throughput, and is a major reason why people are trying to abandon this approach.

Asynchronous protocols do exist, but they’re not practical because nodes can wait from t=0 to t=infinity seconds for confirmation on a single transaction to be purely asynchronous. When people say “asynchronous protocol” they generally really mean “partially asynchronous” meaning that nodes wait from t=0 to t=n where n is some cap on wait time for a transaction to be confirmed. The value n can be a function, meaning that if you’re getting polled responses you can update your timeouts, which definitely keeps it from being fully synchronous.

Asynchronous protocols have the advantage of parallelization on their side. They can see new decisions coming at them at whatever speed the network’s liveness can handle.

Speaking of liveness, you left out the two most important concepts in consensus protocols: safety and liveness.

Most people know of safety thresholds, but don’t know the term. They also abuse it and flip values around so I’m going to give a succinct definition:

A protocol’s safety threshold is the maximum value of Byzantine agents a protocol can tolerate and still reliably achieve consensus on the network.

The safety threshold for Nakamoto is 50%. This means that if you want to conduct a coordinated Byzantine attack on the network, you must have over 50% of the voting power (hashpower in this case) to conduct the attack. This is most famously called a 51% attack, which is why people know what a safety threshold is without actually knowing the correct term. PBFT’s safety threshold is 33%, which means you need 34% control over voting power to conduct an attack.

My personal pet peeve is when people call safety by the term security. It’s a huge red flag that the person doesn’t know what they’re talking about.

Liveness is similar, but with the ability for progress to be halted on the network.

There’s many, many protocols out there, but there’s only three main classification of these protocols. In order of first discovery, they are:

  1. Classical
  2. Nakamoto
  3. Avalanche

I go over this in my article here, which I encourage you to read: https://medium.com/avalabs/avalanche-consensus-101-99c68a3e3159

The TL;DR is that Classical protocols always achieve consensus with P=1 (probability is 100%), Nakamoto uses heaviest chain to build a P=1-ϵ protocol that builds confidence as more blocks are added synchronously, and Avalanche is a asynchronous protocol (partially asynchronous in practice) which is P=1-ϵ which builds confidence of decisions on the node level before being locked in.

Now that we know about that, let’s talk about limitations. Classical is the most certain, but the most limited. The number of messages required on the network to achieve consensus on a single decision grows quadratically, or O(n^2) where n is the number of validators on the network. This can balloon really high and eventually slows down the entire network, which means there’s an artificial limit to the number of nodes directly voting in the network.

Nakamoto avoids this by doing a sort of leader election where someone discovers a hash via Proof of Work and then proposes a block. This block can be overturned later by a heavier block, which is why the protocol itself is probabilistic and requires confirmations. It has the advantage of enabling ad hoc participants and requires a lot of expensive hardware to do something errant, making it robust to liveness attacks. This does sacrifice safety for this liveness, though, which is not a great way to build reliability into applications built on the network.

Avalanche also enables ad hoc participants, but stresses safety over liveness. An attack on Avalanche is going to impact liveness well before safety. It does mean that liveness is susceptible to the same quality of attacks that, say, Google’s infrastructure is, however the attacker would be known and social consensus can right the network without any Byzantine behavior going through. Avalanche can get to 80%+ safety, but there’s a direct safety vs liveness tradeoff in this protocol such that it’s not very wise to go past 60% safety in a live adversarial network. On a local trusted network, though, it’s extremely awesome as this protocol is the fastest in history (if you’re willing to accept an error which is smaller than the odds of a hash collision in SHA256 LOL of course you are fine with that shushhhh). In fact when configured such that safety = liveness in Avalanche, they’re exactly 33% safety and liveness, meaning that the protocol is exactly a P=1 Classical asynchronous protocol. (it would also mean that the sample size k would be exactly equal to the number of validators n, so it would have n^2 message complexity…)

dPoS or “Delegated Proof of Stake” isn’t actually PoS in the strictest sense of the term. It’s a way of electing a group of individuals to become representatives for the token holders in the network. The reason dPoS exists is because Classical consensus protocols are limited by their message complexity.

There’s some other tidbits missing:

  • In Classical protocols, asynchronous protocols maximally achieve 33% safety threshold and synchronous can reach 50% safety threshold. These are known limits. Anyone claiming to exceed them need to be treated with extreme skepticism.
  • Probabilistic protocols can break safety thresholds but they need to accept some tolerable margin of error.
  • If you break your safety threshold, all assumptions are off in all protocols. There’s no “fixing” it unless there’s a complete view change and roll back. No amount of counter attestation will help. It’s broken. Done. Caput. This is why I say that if you’re not directly participating in voting, you’re not participating in consensus. This is also why delegation and committee selection appears to be inherently flawed.
  • The message complexity for Nakamoto is irrelevant, but it’s not safe.
  • The message complexity for Classical is in essence O(n^2) and is safe and live.
  • The message complexity for Avalanche is O(1) amortized … or O(log n) if you want to be super anal… and is safe but less live than safe.

I’m sure there’s other stuff. If it comes to me I’ll write it up.

5 Likes

You’re spot on that terms like “liveness” and “concurrency” are not used within the crypto space because not enough people are familiar with computer science or distributed computing taxonomy. This is one reason the forum is trying to converge these frameworks into a single location so the redundancy of crypto people trying to redefine things that already have definitions is reduced.

5 Likes