Scalable and Probabilistic Leaderless BFT Consensus through Metastability

Larry_Bates · January 22, 2021, 9:27pm

TLDR:

This paper introduces a group of leaderless Byzantine Fault Tolerant protocols called Snowball, Slush, and Snowflake.

The Snow protocols attempt to leverage a metastability layer to reduce input latency and increase throughput speed.

These protocols were designed to increase energy efficiency without compromising safety or liveness

They succeed in being more energy-efficient than existing solutions and offer promise for high throughput, quiescent PoS protocols, but more research needs to be done before a provable solution is found.

Core Research Question

Can we find an energy-efficient alternative to the Nakamoto consensus in the Snow protocol family without compromising safety or liveness?

Citation

T. Rocket, M. Yin, K. Sekniqi, R. van Renesse, and E. G. Sirer, “Scalable and Probabilistic Leaderless BFT Consensus through Metastability,” 24-Aug-2020. Available: [1906.08936] Scalable and Probabilistic Leaderless BFT Consensus through Metastability

Background

This paper analyzes multiple types of leaderless Byzantine Fault Tolerant consensus algorithms to ascertain each respective algorithm’s scalability through various dimensions that go beyond the scope of typical consensus analysis.
Bitcoin currently consumes roughly 63.49 TWh/year, nearly twice the consumption of Denmark. This excessive energy consumption is targeted as a problem to be solved by the Snow protocol family.
As Nakamoto consensus requires energy consumption even in an idle state, the Snow protocol family does not. Nakamoto Consensus requires the participation of miners to maintain security even when no decisions are being made.

Summary

The mechanisms introduced in the paper are constructed around metastability achieved through network subsampling.
Instead of the traditional linear bit processing approach, these models process logarithmic bits reducing the need for nodes to have as much information to maintain concurrency.
The Snow protocol family operates using an intentionally metastable layer that constantly samples the network to steer nodes towards the correct outcome.
Nakamoto Consensus takes a proportionate amount of time to finish a round the more adversarial nodes are attempting to break the consensus.
Flooding or Spamming Attacks are a problem in any system, which Avalanche addresses using transaction fees as a deterrent.
Adversarial Nodes can effectively run attacks between infinitesimally small intervals, as they are not bound by the latency associated with the responses from correct nodes. On the other hand, it cannot coordinate or schedule communications between correct nodes. Thus the researchers opt not to make assumptions about adversarial nodes’ behavior as they are computationally bounded even though they are informationally unbounded and round-adaptive.
Sybil Attacks occur when a single adversary can control a significant number of correct nodes within a network due to identity masking or occlusion. This paper concludes that the protocol can adopt any Proof of Stake strategy to protect against Sybil attacks instead of bake-in the Sybil resistance into the hardware layer, inherently making the process more labor-intensive and consuming energy.
Slush is the metastable layer that is the foundation for the Snow family. It is not Byzantine Fault Tolerant but is Crash Fault Tolerant. Slush starts in an uncolored state, and upon receiving a transaction from a client, the uncolored node updates itself to the color associated with the query. The protocol then sends a query to the network to get a random but constant sized (k) sample, which receives the color of the node in the query, to then initiate another query repeating the sampling process to then repeat another query until the node collects k responses for a total of m rounds until the node finally decides the color after running the chosen number of samples and rounds.
460×521 28.1 KB
Snowflake augments Slush by adopting a counter to indicate the confidence of a node’s conviction in its known color, in addition to showing the number of consecutive nodes to maintain the same color.
Snowball adds a confidence counter to record the number of queries that have surpassed the required threshold. This approach makes it unnecessary to keep track of the execution path through the DAG, as the confidence counter is the record of each point through the DAG.
Liveness is achieved in this protocol using a leaderless initialization mechanism which operates independently, assuming the network is synchronized.
Adding a DAG to Avalanche establishes an underlying structure that allows non-linear paths to be created between any points that have edges. This topology also makes it easier to assess conflict sets, as they are demonstrated in this image by the rounded gray regions.

Method

Using mathematical proofs to establish the basis for the protocols, the researchers used up to 2,000 virtual machine instances to prototype the proposed frameworks in a test net.
The researchers built prototype versions of the protocols in Avalanche. They then used the Bitcoin transaction schema as the data input to be analyzed by the Snow protocols to test for internal concurrency. The researchers simulated a constant flow of transactions from both users and adversaries to test the prototypes’ liveness.
As the researchers analyzed each protocol, the cryptographic bottleneck was analyzed to determine the difference in throughput in a system in which the cryptographic security was employed compared to the same protocols with the signature verifications disabled. Statistical analysis of the transactions per second at various node intervals was used to determine whether the number of nodes had a dramatic effect on throughput one way or the other.

Results

Running batches of 20-40 queries, Avalanche only degrades roughly 1.34% when scaling the nodes from 125 to 2000
Turning off the cryptographic signature check increased the TPS by almost 2.6x with a latency of roughly 0.4 seconds at most to confirm a transaction, revealing that cryptographic verification is the current bottleneck for this system.

Discussion & Key Takeaways

In comparing Avalanche to other projects like Algorand and Conflux, the Quorum-based Byzantine agreement found in Algorand and the DAG-enabled Nakamoto consensus in Conflux present fundamentally different approaches to achieving BFT in an attempt to improve on traditional Nakamoto consensus.
Although the researchers are confident the Snow protocols offer promise for quiescent PoS protocols, there is much more research that needs to be done in the area before a provable solution is found.

Implications & Follow-ups

This family of protocols offers an energy-efficient way to monitor states when no transactions are taking place. In PoW-based protocols, The need to have miners operating constantly to maintain concurrency becomes a long-term pitfall for maintaining data.
The Snow protocols offer an energy-efficient leaderless Byzantine Fault-tolerant framework conducive to higher throughput and gives the option to disable cryptographic signatures to increase throughput.
The tests show that the protocols scale with a limited reduction in transactions per second regardless of how many nodes are operating. The asynchronous nature of the protocols reduces the capacity for bottlenecks to occur as the network scales.

Applicability

This novel set of protocols will likely have a better fit in cases where data needs to be preserved on a distributed network, but does not need to be constantly updated even though the Avalanche protocol could handle many simultaneous queries.

Sean1992076 · January 29, 2021, 6:52am

I have a question about this uncolored consensus area in the metastable layer. Is the slush means that continue receiving transaction and lockup for a period of time? If two of the nodes sluch at the same time, which one is the broadcast leader at this time?

Larry_Bates · January 29, 2021, 2:40pm

That is a great question! If I read the paper correctly, the sampling is random and thus “leaderless”. This is how it prevents collusion between nodes such that the transaction senders and receivers never know which nodes will be sampled for colors, thus it makes it near impossible to collude to trick nodes.

Sean1992076 · January 30, 2021, 4:24am

It totally makes sense! And it is fair enough to select random nodes as the leader. Thanks for your clear point.

collincusce · June 10, 2021, 12:03am

No leaders. Each node comes independently to their own decision about what the rest of the network is deciding on.

collincusce · June 10, 2021, 12:05am

Why? I don’t agree. Updating data is the same as writing data, and writing data is just a transaction. It shouldn’t matter.

Larry_Bates · June 10, 2021, 2:18am

In that updating a google spreadsheet would become ridiculously costly compared to a free distributed spreadsheet if it was run on a blockchain whether the transactions were cheap or not. It’s just not practical to expect businesses to implement blockchains as the basis for data tracking when distributed spreadsheets will suffice.

It does matter when the cost of a transaction using a google doc is $0 when using any blockchain-based data tracking in which transactions incur a cost will be $0<.

It’s just not practical to ignore the cost.

Cost of recording a single transaction will definitely matter to companies that have high volumes of transactions that could be recorded in a spreadsheet for $0 for recording the transaction. It wouldn’t be “immutable”, but they wouldn’t need it to be or else they would pay for immutability in a blockchain.

When standards like HIPAA and Sarbanes-Oxley demand records be destroyed after a certain amount of time, costly immutable records will not be preferable. This is why the space has to be cognizant of regulations that will make some technologies unusable in certain settings.

Just as HIPAA lays out a timeline for the destruction of medical records, they specify the types of records that must be destroyed. These include patient charts and medical records as well as any other patient health information (PHI) that includes personal and confidential data.

Here’s the list of information that must be properly secured and destroyed under HIPAA regulations:

Patient names
Dates (birth dates and other relevant dates)
Geographic identifiers
Social Security numbers
Phone numbers
Fax numbers
Email address
Medical record numbers
Health plan beneficiary numbers
Account numbers
Biometric identifiers, including retinal scans or fingerprints
Full face photos and comparable images
Certificate and license numbers
Device identifiers and serial numbers
Vehicle identifiers and serial numbers, including license plate numbers
Web URLs
IP addresses
Other unique identifying numbers, characteristics, or codes

So imagine trying to build a protocol that uses “immutability” while having to keep all of the above information “mutable”.

Businesses are not going to pay for technology that makes destroying records impossible in cases where they are legally mandated to destroy records.

This is where a protocol like those found in the Snow family would be good for pushing batch transactions for an individual organization, but trying to use it for individual transactions would still be too costly to be a practical solution for a “business” that could use a free and mutable alternative. Obviously for tracking a network of users transacting currency p2p, this is going to be a superior method than a non byzantine-fault tolerant approach, but that is where the use-case is limited to specifically decentralized financial transactions and wouldn’t be preferable in orthodox business settings.

collincusce · June 10, 2021, 2:38am

…

…most of what you said is filed under “no duh” and I didn’t expect it to be considered relevant.

Also, nothing is immutable. You can always edit the data. If you’re concerned about transaction history, it’s not required to be kept using Avalanche. However, it can be kept by any participating node, so if you need privacy, build a network with private membership.

Again, “no duh”.

I fail to see why anyone would consider using any BFT consensus protocol for the examples you provided. It’s as true for Avalanche as any other of its kind.

With the statement now clarified, I don’t think the part I quoted was very pertinent or useful, and this long response is even less so.

Larry_Bates · June 10, 2021, 2:39am

You said you disagreed with my assessment of proper use-cases.

I reinforced why they would not fit in many use cases.

For the record, that assessment is the conclusion drawn by the researcher on the original full work. If the work has changed since then and the use-cases have changed, that would be something to cover.

collincusce · June 10, 2021, 3:00am

No no no no no…

You said:

This novel set of protocols will likely have a better fit in cases where data needs to be preserved on a distributed network, but does not need to be constantly updated even though the Avalanche protocol could handle many simultaneous queries.

Which I disagreed with. You can constantly update the database being agreed on.

“Updating data is the same as writing data, and writing data is just a transaction. It shouldn’t matter.”

That’s what happened. I still don’t agree with the statement made. The intent is to write to the state of a database (no duh).

You then went on a tear about some random HIPPA nonsense that was flat-out not relevant.

I don’t think your assessment matches Gün, Kevin, Ted, etc’s conclusion in their paper or any intent of this (or any other) consensus protocol. But what do I know. I’ll ask Gun or Kevin or Stephen or Ted next time I talk to them. ;-)

Rich · June 10, 2021, 3:37am

Hi @collincusce.

Welcome back to the forum. Pleased to see a spirited discussion but phrases like ‘no, duh’, characterizing what appears to be a good-faith response as a ‘tear’ and ‘nonsense’, then closing with an appeal to authority doesn’t further the conversation. I’d like to ask you to keep the conversation civil please.

Harvesto · August 21, 2022, 9:54am

Exciting summary, Larry.

I am just wondering whether Bitcoin can successfully adopt the consensus mechanism. And if it can, what positive/negative effect would it have on scalability and settlement time?

I like that the Snow protocols are more energy-efficient than existing solutions, but that is all theoretical, right? It would be interesting to see the protocols solving consensus problems soon.

Larry_Bates · August 21, 2022, 2:15pm

Thanks for the question! Based on the Taproot deployment, I would assume adopting this type of protocol (if possible) would be something that be more an issue of consensus once there was a viable means of shifting the protocol. As far as I know, the Snow protocols were being tested in a virtual machine environment for the sake of showing throughput. Although the timing of the original post was during the testing phases, the Snow protocols have been deployed on Avalanche’s main net.

Harvesto · August 22, 2022, 5:55am

Alright!

That makes sense.

Concerning the Snow protocols’ current implementation on Avalanche, I had no idea.

Although, I guess I shouldn’t be too surprised. The names and their linkage to each other are obvious (snow, avalanche).

Also, I read about what makes Avalanche’s settlement fast which is it’s metastability and leaderless BFT consensus.

Pretty good work from the authors

Humphery · December 18, 2022, 5:44am

@Larry_Bates It’s extremely great how you describe a hard issue in a simple way.

It is possible to find an energy-efficient alternative to the Nakamoto consensus, which is used by many blockchain systems such as Bitcoin, without compromising safety or liveness. One example of such an alternative is the Snow protocol family, which includes protocols such as Snowflake and Snowstorm.

The Snow protocol family is based on proof-of-stake (PoS) consensus, which is a sort of consensus mechanism that allows network participants to confirm transactions and add new blocks to the blockchain by staking their own cash. This is in contrast to proof-of-work (PoW) consensus, which is utilized by the Nakamoto consensus and requires network participants to execute difficult computations in order to validate transactions and add new blocks to the blockchain.

Because the Snow protocol family relies on PoS consensus rather than PoW consensus, it consumes less energy than the Nakamoto consensus. This is because PoS does not require network participants to do sophisticated computations, which can be energy-intensive, in order to validate transactions and add new blocks to the blockchain.

However, because the Snow protocol family has not yet been widely adopted or tested in commercial situations, it is unclear how well it will work in practice. It is also likely that there are other trade-offs or potential difficulties with employing the Snow protocol family that are not yet completely recognized.

Topic		Replies	Views
Research Summary: Dissecting Tendermint Consensus summary , consensus , scalability , bft	6	1456	December 1, 2022
Research Pulse #91 11/15/2022 Research Pulse	0	534	November 15, 2022
Research Pulse #62 04/25/22 Research Pulse	0	940	April 25, 2022
Research Pulse Issue #19 06/28/21 Research Pulse	1	870	June 29, 2021
Research Pulse #96 12/19/2022 Research Pulse	0	517	December 20, 2022