Scalable and Probabilistic Leaderless BFT Consensus through Metastability

TLDR:

  • This paper introduces a group of leaderless Byzantine Fault Tolerant protocols called Snowball, Slush, and Snowflake.
  • The Snow protocols attempt to leverage a metastability layer to reduce input latency and increase throughput speed.
  • These protocols were designed to increase energy efficiency without compromising safety or liveness
  • They succeed in being more energy-efficient than existing solutions and offer promise for high throughput, quiescent PoS protocols, but more research needs to be done before a provable solution is found.

Core Research Question

  • Can we find an energy-efficient alternative to the Nakamoto consensus in the Snow protocol family without compromising safety or liveness?

Citation

Background

  • This paper analyzes multiple types of leaderless Byzantine Fault Tolerant consensus algorithms to ascertain each respective algorithm’s scalability through various dimensions that go beyond the scope of typical consensus analysis.
  • Bitcoin currently consumes roughly 63.49 TWh/year, nearly twice the consumption of Denmark. This excessive energy consumption is targeted as a problem to be solved by the Snow protocol family.
  • As Nakamoto consensus requires energy consumption even in an idle state, the Snow protocol family does not. Nakamoto Consensus requires the participation of miners to maintain security even when no decisions are being made.

Summary

  • The mechanisms introduced in the paper are constructed around metastability achieved through network subsampling.
  • Instead of the traditional linear bit processing approach, these models process logarithmic bits reducing the need for nodes to have as much information to maintain concurrency.
  • The Snow protocol family operates using an intentionally metastable layer that constantly samples the network to steer nodes towards the correct outcome.
  • Nakamoto Consensus takes a proportionate amount of time to finish a round the more adversarial nodes are attempting to break the consensus.
  • Flooding or Spamming Attacks are a problem in any system, which Avalanche addresses using transaction fees as a deterrent.
  • Adversarial Nodes can effectively run attacks between infinitesimally small intervals, as they are not bound by the latency associated with the responses from correct nodes. On the other hand, it cannot coordinate or schedule communications between correct nodes. Thus the researchers opt not to make assumptions about adversarial nodes’ behavior as they are computationally bounded even though they are informationally unbounded and round-adaptive.
  • Sybil Attacks occur when a single adversary can control a significant number of correct nodes within a network due to identity masking or occlusion. This paper concludes that the protocol can adopt any Proof of Stake strategy to protect against Sybil attacks instead of bake-in the Sybil resistance into the hardware layer, inherently making the process more labor-intensive and consuming energy.
  • Slush is the metastable layer that is the foundation for the Snow family. It is not Byzantine Fault Tolerant but is Crash Fault Tolerant. Slush starts in an uncolored state, and upon receiving a transaction from a client, the uncolored node updates itself to the color associated with the query. The protocol then sends a query to the network to get a random but constant sized (k) sample, which receives the color of the node in the query, to then initiate another query repeating the sampling process to then repeat another query until the node collects k responses for a total of m rounds until the node finally decides the color after running the chosen number of samples and rounds.
  • Snowflake augments Slush by adopting a counter to indicate the confidence of a node’s conviction in its known color, in addition to showing the number of consecutive nodes to maintain the same color.
  • Snowball adds a confidence counter to record the number of queries that have surpassed the required threshold. This approach makes it unnecessary to keep track of the execution path through the DAG, as the confidence counter is the record of each point through the DAG.
  • Liveness is achieved in this protocol using a leaderless initialization mechanism which operates independently, assuming the network is synchronized.
  • Adding a DAG to Avalanche establishes an underlying structure that allows non-linear paths to be created between any points that have edges. This topology also makes it easier to assess conflict sets, as they are demonstrated in this image by the rounded gray regions.

Method

  • Using mathematical proofs to establish the basis for the protocols, the researchers used up to 2,000 virtual machine instances to prototype the proposed frameworks in a test net.
  • The researchers built prototype versions of the protocols in Avalanche. They then used the Bitcoin transaction schema as the data input to be analyzed by the Snow protocols to test for internal concurrency. The researchers simulated a constant flow of transactions from both users and adversaries to test the prototypes’ liveness.
  • As the researchers analyzed each protocol, the cryptographic bottleneck was analyzed to determine the difference in throughput in a system in which the cryptographic security was employed compared to the same protocols with the signature verifications disabled. Statistical analysis of the transactions per second at various node intervals was used to determine whether the number of nodes had a dramatic effect on throughput one way or the other.

Results

  • Running batches of 20-40 queries, Avalanche only degrades roughly 1.34% when scaling the nodes from 125 to 2000
  • Turning off the cryptographic signature check increased the TPS by almost 2.6x with a latency of roughly 0.4 seconds at most to confirm a transaction, revealing that cryptographic verification is the current bottleneck for this system.

Discussion & Key Takeaways

  • In comparing Avalanche to other projects like Algorand and Conflux, the Quorum-based Byzantine agreement found in Algorand and the DAG-enabled Nakamoto consensus in Conflux present fundamentally different approaches to achieving BFT in an attempt to improve on traditional Nakamoto consensus.

  • Although the researchers are confident the Snow protocols offer promise for quiescent PoS protocols, there is much more research that needs to be done in the area before a provable solution is found.

Implications & Follow-ups

  • This family of protocols offers an energy-efficient way to monitor states when no transactions are taking place. In PoW-based protocols, The need to have miners operating constantly to maintain concurrency becomes a long-term pitfall for maintaining data.
  • The Snow protocols offer an energy-efficient leaderless Byzantine Fault-tolerant framework conducive to higher throughput and gives the option to disable cryptographic signatures to increase throughput.
  • The tests show that the protocols scale with a limited reduction in transactions per second regardless of how many nodes are operating. The asynchronous nature of the protocols reduces the capacity for bottlenecks to occur as the network scales.

Applicability

  • This novel set of protocols will likely have a better fit in cases where data needs to be preserved on a distributed network, but does not need to be constantly updated even though the Avalanche protocol could handle many simultaneous queries.
1 Like

I have a question about this uncolored consensus area in the metastable layer. Is the slush means that continue receiving transaction and lockup for a period of time? If two of the nodes sluch at the same time, which one is the broadcast leader at this time?

1 Like

That is a great question! If I read the paper correctly, the sampling is random and thus “leaderless”. This is how it prevents collusion between nodes such that the transaction senders and receivers never know which nodes will be sampled for colors, thus it makes it near impossible to collude to trick nodes.

2 Likes

It totally makes sense! And it is fair enough to select random nodes as the leader. Thanks for your clear point.

2 Likes