Ripple Protocol Research Summary

TLDR:

  • Researchers assessed the Ripple source code to determine the safety of the consensus protocols

  • They determined that the Ripple protocol is neither Byzantine Fault Tolerant nor Crash Fault Tolerant

  • While the researchers have not observed any Byzantine nodes, they suggest that users of the platform operate with caution given the currently known vulnerabilities.

Citation

Background

  • Ripple (XRP) was launched in 2012. Its designers opted for a lower-latency algorithm to increase transaction speed at the expense of Byzantine Fault Tolerance.
  • Audit released November 30, 2020

Research Question

  • Can the researchers analyze the general security of the XRP Consensus mechanism?
  • Can the researchers assess safety concerns in real-world settings without testing Ripple’s initial statements about the protocol?

Scope

  • The audit analyzes XRP source code. Additionally, the audit analyzes the network to determine whether nodes can be compromised and what types of violations may occur within this framework.

Type

  • Code Audit

Summary

  • The audit begins by addressing the framework in which the Ripple protocol was deployed. In opting not to achieve Byzantine fault-tolerant consensus, Ripple protocol chose to employ a Unique Node List (UNL) structure to establish the trusted validators.


Fig 1.

  • Figure 1 gives an example of two UNLs where the white nodes (1, 2, 3) trust UNL1 and the black nodes (4, 5, 6) trust UNL2. Nodes 3 and 4 have more influence than the other nodes because they are trusted on both UNL1 and UNL2.
  • It was initially theorized that the UNL structure would need an 80% quorum between nodes before consensus was declared. However, there has not been enough research on Ripple to validate this theory.
  • One paper found that the theorized 20% threshold for overlapped UNL nodes would not be sufficient to establish consensus, and at least a 40% threshold was suggested for overlap nodes.
  • Another paper found that the 80% threshold for consensus would likely be too low for the UNL structure and that it may need to be above 90% to achieve liveness and safety.
  • Consensus in the protocol is reached when every node executes transactions in the same global sequence, treating that sequence as the proper order in time. This abstraction is known as atomic broadcast.
  • Atomic broadcast is highly synchronous and relies on a common notion of time across nodes.
  • Each round of transactions is unified into a ledger. A round has three phases: “open” at the start of the round, “establish” while a proposal is processed, and “accepted” at the close of the round.
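
As a rough illustration of the UNL structure and thresholds described above, the following Python sketch models the two lists from Figure 1. The exact membership (UNL1 = {1, 2, 3, 4}, UNL2 = {3, 4, 5, 6}) is an assumption inferred from the figure description, and measuring overlap against the larger list is also an assumption. The helper names are hypothetical and are not taken from the Ripple codebase.

```python
# Assumed membership, inferred from the Figure 1 description:
# nodes 3 and 4 appear on both lists.
UNL1 = frozenset({1, 2, 3, 4})
UNL2 = frozenset({3, 4, 5, 6})


def overlap_fraction(a, b):
    """Share of the larger list that is common to both lists (assumed metric)."""
    return len(a & b) / max(len(a), len(b))


def meets_quorum(agreeing, unl, quorum=0.80):
    """True if the agreeing members of `unl` reach the quorum share."""
    return len(agreeing & unl) / len(unl) >= quorum


def influence(unls):
    """Count how many UNLs list each node."""
    counts = {}
    for unl in unls:
        for node in unl:
            counts[node] = counts.get(node, 0) + 1
    return counts


if __name__ == "__main__":
    print("overlap:", overlap_fraction(UNL1, UNL2))
    print("3 of 4 agree:", meets_quorum(frozenset({1, 2, 3}), UNL1))
    print("influence:", influence([UNL1, UNL2]))
```

In this toy layout the overlap is 50%, which would clear both the 20% and 40% figures mentioned above, and three of four agreeing validators fall short of an 80% quorum. Nodes 3 and 4 are counted twice by `influence`, which matches the observation that they carry more weight.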

Methodology

  • The researchers established an unsafe scenario in which a Byzantine node could be validated as “correct” while two nodes execute different transactions. The following figure shows node 4 as the Byzantine node, which shares connections to Unique Node Lists 1 and 2. In this configuration, two conflicting transactions executed at roughly the same time would both be accepted, because the protocol accepts a transaction once enough nodes on a UNL pass the threshold. L and L1 are presented as simultaneous transactions that would both be accepted by the Byzantine node (4) even though they conflict, which violates consensus.
  • They expand the previous scenario to an arbitrary number of nodes that reach an equivalent number of validators on each respective UNL. Suppose a Byzantine node (f) is sent two transactions simultaneously by UNLtx or by UNLtx1. In that case, node f will behave as if both UNL transactions are valid, since each shows over 80% agreement on its respective UNL, which could violate consensus.
  • If a single UNL has 2n + 1 nodes, one of which is Byzantine, there are multiple ways the protocol can break down and fail to reach consensus. In this scenario, if the Byzantine node sends two conflicting transactions simultaneously, the gossip protocol forces nodes to raise a dispute, because each set of nodes receives the transaction intended for the other group. As disputes mount and transaction requests pile up, the transaction is eventually rejected, since no attempt at consensus reaches the necessary 80% threshold.
  • This could create an infinite loop where the ledger is continuously trying to update itself to find the correct values, and thus liveness cannot be guaranteed.
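
The safety scenario above can be sketched as a small Python toy model. The seven-node layout (UNL1 = {1, 2, 3, 4}, UNL2 = {4, 5, 6, 7}, node 4 Byzantine) is a hypothetical configuration chosen so that the numbers work out with an 80% quorum; it is not the exact network from the paper, and all names are illustrative rather than taken from the Ripple source.

```python
from fractions import Fraction

QUORUM = Fraction(4, 5)  # the 80% threshold from the summary

# Hypothetical layout: node 4 is Byzantine and sits on both lists.
UNL1 = frozenset({1, 2, 3, 4})
UNL2 = frozenset({4, 5, 6, 7})
BYZANTINE = 4
TRUSTS = {1: UNL1, 2: UNL1, 3: UNL1, 5: UNL2, 6: UNL2, 7: UNL2}
# Assumption: the honest nodes already back conflicting ledgers L and L1.
HONEST_VOTE = {1: "L", 2: "L", 3: "L", 5: "L1", 6: "L1", 7: "L1"}


def vote_seen_by(observer, sender, equivocate):
    """Ledger that `observer` believes `sender` validated."""
    if sender != BYZANTINE:
        return HONEST_VOTE[sender]
    if not equivocate:
        return "L"  # a consistent node 4 tells everyone the same thing
    # An equivocating node 4 tells each side what it wants to hear.
    return "L" if TRUSTS[observer] == UNL1 else "L1"


def accepted_ledger(observer, equivocate):
    """Ledger that reaches the quorum on the observer's UNL, or None."""
    unl = TRUSTS[observer]
    votes = [vote_seen_by(observer, sender, equivocate) for sender in unl]
    for ledger in ("L", "L1"):
        if Fraction(votes.count(ledger), len(unl)) >= QUORUM:
            return ledger
    return None


def accepted_by_all(equivocate):
    """Map each honest node to the ledger it accepts."""
    return {node: accepted_ledger(node, equivocate) for node in TRUSTS}


if __name__ == "__main__":
    print("equivocating:", accepted_by_all(True))
    print("consistent:  ", accepted_by_all(False))
```

When node 4 equivocates, nodes 1 to 3 accept L while nodes 5 to 7 accept L1, so two conflicting ledgers are both validated. When node 4 reports a single value, the UNL2 side stalls at 75% and never accepts the conflicting ledger.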

Results

  • The vulnerabilities discovered in this paper are high risk: they present scenarios in which both safety and liveness are potentially violated.
  • In using abstracted models of the protocol, the researchers created simple scenarios that would violate safety and liveness, resulting in devastating effects for the network’s health.
  • The researchers propose a need for close synchronization, tight interconnection, and fault-free operations to reduce the potential for the vulnerabilities discovered in this paper to occur.
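
The liveness scenario from the Methodology section (one UNL with 2n + 1 nodes, one of them Byzantine) can be illustrated with another toy model in Python. The update rule, in which an honest node adopts whichever transaction it currently sees more votes for, is a simplifying assumption and not the actual rippled logic. The function name and the round limit are hypothetical.

```python
from fractions import Fraction

QUORUM = Fraction(4, 5)  # the 80% threshold from the summary


def run_rounds(n, max_rounds=20):
    """Toy model: one UNL with 2n honest nodes and one Byzantine node.

    Returns the round in which some honest node saw a quorum, or None
    if no quorum was seen within `max_rounds`.
    """
    unl_size = 2 * n + 1
    # Honest nodes start evenly split between conflicting transactions A and B.
    positions = ["A"] * n + ["B"] * n
    for round_no in range(1, max_rounds + 1):
        count = {"A": positions.count("A"), "B": positions.count("B")}
        next_positions = []
        for own in positions:
            other = "B" if own == "A" else "A"
            # The Byzantine node tells each honest node that it backs the
            # opposite transaction, so the node always sees itself outvoted.
            seen = dict(count)
            seen[other] += 1
            if max(seen.values()) >= QUORUM * unl_size:
                return round_no
            # Assumed update rule: adopt the transaction with more votes.
            next_positions.append(max(seen, key=seen.get))
        positions = next_positions
    return None


if __name__ == "__main__":
    print("n=2:", run_rounds(2))
    print("n=10:", run_rounds(10, max_rounds=50))
```

In this model the two halves swap positions every round, the best visible support stays at (n + 1) / (2n + 1), and the 80% threshold is never reached, so the function returns `None` for any n ≥ 1. The bounded loop stands in for the unbounded retry described in the summary.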

Discussion & Key Takeaways

  • The Ripple blockchain is one of the oldest in existence and has been believed to be Byzantine Fault Tolerant for its entire lifetime.

  • The lack of complete transparency in all UNL validating nodes makes it impossible to determine if the system has been operating without Byzantine nodes corrupting information.

  • Previous works have also raised concerns about the safety and liveness of the protocol.

Implications & Follow-ups

  • The network needs tightly synchronized nodes in order to maintain concurrency.
  • This work identified relatively simple cases that could violate the safety and/or liveness of the network. The authors also suggest further research into scenarios in which a catastrophic failure could occur.

Applicability

  • This work should be used by researchers looking to understand the fitness of the XRP consensus protocol.
  • It could be used to understand and improve the XRP protocol.
  • It provides a useful foundation for assessing private blockchains, including creating pseudo-code based on the actual codebase to establish scenarios that could occur since the actual blockchain is not fully transparent at each validating node.

@Larry_Bates it looks like David Schwartz wrote a tweet thread [here] discussing some of the general findings of the audit. One of the more interesting claims he makes is:

[Screenshot: Screen Shot 2021-02-11 at 11.46.05 AM]

After reading through the audit, I’m curious what your thoughts are on this claim and its implications, not only for the XRP consensus protocol but for the idea in general, and how it could impact other protocol designs in the future?

In a way, we can summarize the issue as this:
In practice it is usually hard to know where to act to split the network with a fork. The attacker needs to know precisely where to send block1 and block2 so that the nodes get stuck in their own version of history. Here the trusted nodes are explicit, so it is easier to place yourself at the centre of those nodes and perform the attack. Not all nodes are created equal in this setting.

Note that there was a consensus failure in Stellar, which began as a fork of Ripple consensus, as explained in the Stellar blog post Safety, liveness and fault tolerance—the consensus choices.
This prompted Stellar to change their consensus:

  • David Mazières, The Stellar Consensus Protocol: A Federated Model for Internet-level Consensus (2015): link

Note that, looking through the citations of this paper on Ripple, I found that the authors do not cite the previous paper but instead cite the following:

  • G. Losa, E. Gafni, and D. Mazières, “Stellar consensus by instantiation”, pdf

@Larry_Bates just saw that the authors made their own summary and addressed the replies from Ripple:

It seems like the Ripple developers had underestimated the needed overlapping UNLs to preserve safety by over half. It is slightly concerning that the Ripple team has seemingly brushed off the vulnerability.