Research Summary: PERIMETER: A network-layer attack on the anonymity of cryptocurrencies

TLDR:

  • This paper proposes PERIMETER, a passive network-layer attack that de-anonymizes transactions on Bitcoin and Ethereum with 90% accuracy by intercepting 50% of connections.
  • Countermeasures against this type of attack are also discussed.

Citation

  • Apostolaki, Maria et al. “PERIMETER: A network-layer attack on the anonymity of cryptocurrencies.” (2020).

Link

Core Research Question

  • Can we map transactions to IP addresses without being detected?

Background

  • Nodes on the blockchain are responsible for broadcasting transactions to peers. Transactions can be de-anonymized by discovering their nodes’ IP addresses.
  • Transactions need to be propagated in a network, verified by all nodes, and added to the blockchain. Three types of messages are sent during the propagation of a transaction:
    • inv: When node A receives a new transaction, it sends “inv” along with the transaction hash to its peers (nodes B and C).
    • getdata: If a peer (node B or C) receives the “inv” message and has not yet seen the transaction, it sends “getdata” with the transaction hash back to node A to request the raw transaction.
    • tx: When node A receives the “getdata” message, it replies with a “tx” message, which carries the transaction to the peers that requested it.
  • A supernode is a seemingly regular node that connects to all active nodes and listens to the transaction traffic they relay. It is the traditional method for mapping transactions to IP addresses. However, it is highly noticeable, as a supernode establishes new connections to every reachable client.
  • Diffusion is a mechanism in Bitcoin Core against deanonymization in which a client broadcasts transactions to its peers with a delay. In contrast, Ethereum broadcasts transactions without delay.
  • An Autonomous System (AS) is a large network or group of networks that have a unified routing policy. Every device that connects to the Internet is connected to an AS. Every AS controls a specific set of IP addresses.
  • An Internet eXchange Point (IXP) is a physical location through which networks such as Internet Service Providers (ISPs), CDNs, web enterprises, communication service providers, cloud and SaaS providers connect to exchange Internet traffic.
  • Border Gateway Protocol (BGP) regulates how IP packets are forwarded on the Internet. BGP computes paths from AS to AS (each called an AS-path) by tracking which IP addresses belong to which AS. Both ASes and IXPs forward traffic along an AS-path, so they can eavesdrop on, drop, or delay it.
  • A victim’s anonymity set is a set of transactions that contains as many of the victim’s transactions as possible and as few transactions created by others as possible.
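
The inv / getdata / tx exchange described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the `Node` class, the shared `wire_log` list, and the hash `"h1"` are hypothetical names, messages are delivered instantly, and peers do not relay the transaction onward.

```python
class Node:
    """A toy peer that follows the inv / getdata / tx exchange."""

    def __init__(self, name, wire_log):
        self.name = name
        self.peers = []
        self.mempool = {}         # transaction hash -> raw transaction
        self.wire_log = wire_log  # shared list of (sender, receiver, message, tx_hash)

    def send(self, peer, message, tx_hash):
        # Record what an on-path observer would see for this message.
        self.wire_log.append((self.name, peer.name, message, tx_hash))

    def announce(self, tx_hash):
        # Advertise the transaction hash to every peer with "inv".
        for peer in self.peers:
            self.send(peer, "inv", tx_hash)
            peer.on_inv(self, tx_hash)

    def on_inv(self, sender, tx_hash):
        # Request the raw transaction only if it is not already known.
        if tx_hash not in self.mempool:
            self.send(sender, "getdata", tx_hash)
            sender.on_getdata(self, tx_hash)

    def on_getdata(self, requester, tx_hash):
        # Reply with the raw transaction in a "tx" message.
        self.send(requester, "tx", tx_hash)
        requester.mempool[tx_hash] = self.mempool[tx_hash]


wire_log = []
a, b, c = (Node(name, wire_log) for name in "ABC")
a.peers = [b, c]
a.mempool["h1"] = "raw-transaction-bytes"
c.mempool["h1"] = "raw-transaction-bytes"  # C already learned h1 elsewhere
a.announce("h1")
for entry in wire_log:
    print(entry)
```

In this run only node B answers with “getdata”, because node C already holds the transaction. That asymmetry is the signal the attack later exploits.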

Summary

  • The authors propose PERIMETER, a network-level attack with the following properties:
    • Hard to detect
      • It is completely passive and, unlike supernode-based attacks, does not establish new connections.
    • Hard to mitigate
      • The attacker’s power is dependent on the Internet routing protocol (BGP), not the application protocol.
  • PERIMETER is composed of two phases:
    • Eavesdrop on the victim’s connections and read the packets’ payloads to collect information about the victim’s transactions
      • PERIMETER only needs to control a portion of the victim’s connections.
    • Distinguish the victim’s transactions with anomaly detection
      • The victim’s transactions have an abnormal propagation pattern. For example, the victim will send a transaction it generated to an unusually high portion of its peers compared to other transactions.
      • PERIMETER uses the victim’s interactions with its peers to infer whether the victim knew a transaction before its peers.
  • PERIMETER can be generalized to other cryptocurrencies such as Ethereum because:
    • For the majority of Ethereum clients, there are four network adversaries that can intercept 30% of their connections.
    • A network adversary can infer the victim’s peers by eavesdropping on the IP packets since their header is unencrypted.
  • PERIMETER’s mechanism in detail:
    • Attacker’s goal: Map the IP of a victim node to the transactions it created
    • Attacker’s profile: An AS or IXP that intercepts the victim’s connections and knows the victim’s IP
  • Attack scenario on Bitcoin:
    • AS2 is the attacker, who aims to map node A to the transactions it generates.
    • AS2 knows the IP address of node A and is connected to the surrounding nodes.
    • AS2 eavesdrops on node A’s connections to create the initial anonymity set.
    • AS2 reduces the size of the anonymity set by removing transactions that are most likely not generated by node A.
      • For example, a node only requests and receives transactions it does not know already, so AS2 can remove the transactions sent from other nodes and received by node A.
    • For transactions propagated through the nodes that AS2 does not intercept, AS2 maps the transaction to node A if more peers requested the transaction from node A than from other nodes.
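
The pruning step in the Bitcoin scenario can be illustrated with plain set operations. This is a sketch of the idea only, not the authors' code: the function name `prune_anonymity_set` and the `(sender, receiver, message, tx_hash)` event tuples are assumptions made for the example.

```python
def prune_anonymity_set(events, victim):
    """Return the transaction hashes that the victim may have created.

    events: iterable of (sender, receiver, message, tx_hash) tuples observed
    on the connections the adversary intercepts.
    """
    candidates = set()       # transactions the victim advertised or sent
    learned_elsewhere = set()  # transactions the victim requested or received

    for sender, receiver, message, tx_hash in events:
        if sender == victim and message in ("inv", "tx"):
            candidates.add(tx_hash)
        elif sender == victim and message == "getdata":
            # A node only requests transactions it does not already know.
            learned_elsewhere.add(tx_hash)
        elif receiver == victim and message == "tx":
            # A node only receives transactions it asked for.
            learned_elsewhere.add(tx_hash)

    return candidates - learned_elsewhere


observed = [
    ("A", "B", "inv", "t1"), ("B", "A", "getdata", "t1"), ("A", "B", "tx", "t1"),
    ("B", "A", "inv", "t2"), ("A", "B", "getdata", "t2"), ("B", "A", "tx", "t2"),
    ("A", "C", "inv", "t2"),
]
remaining = prune_anonymity_set(observed, victim="A")
print(remaining)
```

Here node A relays both transactions, but it requested `t2` from node B first, so only `t1` stays in the anonymity set.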

  • Attack scenario on Ethereum:

    • AS2 can infer the IP addresses of the victim’s peers by reading the unencrypted headers of the packets that node A sends and receives.
    • AS2 can connect to some of the victim’s peers and surround the victim to intercept the propagation of transactions, as in the above Bitcoin scenario.
    • Unlike in Bitcoin, where no new connections are needed, the attack on Ethereum requires new connections.
  • How PERIMETER recognizes Bitcoin traffic:

    • The adversary surrounds the victim’s node and distinguishes traffic with TCP port 8333.
    • If the node uses another TCP port, the adversary can search for known Bitcoin messages “inv” or “getdata”.
    • After the adversary finds a Bitcoin message in a packet, he or she can find the IP address by matching the IP format.
  • How PERIMETER creates the initial anonymity set:

    • Challenge: Bitcoin messages can be split among multiple packets and those packets can be re-ordered, lost, and retransmitted.
    • Solution: Reconstruct the message stream using GoPacket.
    • The adversary then calculates the number of “inv”,“getdata” and “tx” messages that are sent and received per transaction.
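
As a rough illustration of the counting step, the sketch below scans an already reassembled byte stream for Bitcoin messages and counts “inv” and “getdata” announcements per transaction hash. It assumes the standard Bitcoin message header (4-byte network magic, 12-byte command, 4-byte length, 4-byte checksum) and the mainnet magic bytes. Stream reassembly itself, which the summary attributes to GoPacket, is not shown, and “tx” messages are skipped because counting them would require computing the transaction ID from the raw transaction. All function names are hypothetical.

```python
import hashlib
import struct
from collections import Counter

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")
HEADER = struct.Struct("<4s12sI4s")  # magic, command, payload length, checksum
MSG_TX = 1
MSG_WITNESS_FLAG = 1 << 30


def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_message(command, payload):
    """Serialize one message (used here only to produce example input)."""
    return HEADER.pack(MAINNET_MAGIC, command.encode("ascii"),
                       len(payload), double_sha256(payload)[:4]) + payload


def iter_messages(stream):
    """Yield (command, payload) for every well-formed message in the stream."""
    offset = 0
    while True:
        offset = stream.find(MAINNET_MAGIC, offset)
        if offset < 0 or offset + HEADER.size > len(stream):
            return
        _, command, length, check = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        payload = stream[start:start + length]
        if len(payload) == length and double_sha256(payload)[:4] == check:
            yield command.rstrip(b"\x00").decode("ascii", "replace"), payload
            offset = start + length
        else:
            offset += 1  # false positive on the magic bytes; keep scanning


def read_varint(buf, pos):
    """Decode Bitcoin's variable-length integer; return (value, next position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(buf[pos + 1:pos + 1 + size], "little"), pos + 1 + size


def inventory_tx_hashes(payload):
    """Extract transaction hashes from an "inv" or "getdata" payload."""
    count, pos = read_varint(payload, 0)
    hashes = []
    for _ in range(count):
        (inv_type,) = struct.unpack_from("<I", payload, pos)
        if (inv_type & ~MSG_WITNESS_FLAG) == MSG_TX:
            hashes.append(payload[pos + 4:pos + 36][::-1].hex())
        pos += 36  # each entry is a 4-byte type plus a 32-byte hash
    return hashes


def count_per_transaction(stream):
    """Count "inv" and "getdata" announcements per transaction hash."""
    counts = Counter()
    for command, payload in iter_messages(stream):
        if command in ("inv", "getdata"):
            for tx_hash in inventory_tx_hashes(payload):
                counts[(command, tx_hash)] += 1
    return counts
```

A real adversary would run this per connection and per direction, so that the counts can be split into messages sent and messages received by the victim.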
  • How PERIMETER analyzes which transactions are created by the victim:

    • Challenge: (1) The number of transactions the victim propagates is much higher than those that it created. (2) No training data.
    • Solution: Use unsupervised anomaly detection, where the anomaly is the transaction that the victim created. For instance, the victim will propagate the transaction it generates to more peers compared to other transactions.
    • Machine learning model: Isolation Forest (IF)
      • Advantage:
        • More computationally efficient than distance-based methods, including nearest-neighbor and clustering-based approaches
        • Scales to large datasets
        • Suitable for real-time online applications
        • Not too sensitive to parameter tuning
      • Procedure:
        • Identify anomalies by isolating outliers
        • Build an ensemble of decision trees to partition the data points
        • Since anomalies are easier to isolate, fewer splits are needed
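
To make the isolation idea concrete, here is a minimal pure-Python sketch of an isolation forest. It is a simplified teaching version, not the implementation the authors used: the function names, the default parameters, and the toy feature vectors are all assumptions, and it needs at least two input points.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition the points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:
        return ("leaf", len(points))
    split = rng.uniform(low, high)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Number of splits needed to reach the leaf that holds the point."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(child, point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one score per point; scores near 1 indicate likely anomalies."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in points]


# Toy data: 60 ordinary transactions and one that stands out in every feature.
normal = [(1 + i % 3, 1 + i % 2, 0.10 + 0.01 * (i % 5)) for i in range(60)]
outlier = (12, 0, 0.90)
scores = anomaly_scores(normal + [outlier])
```

Because the outlier is separated from the rest after very few random splits, its average path length is short and its score is the highest in the list.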
  • The features that PERIMETER selects to perform anomaly detection

    • The number of “getdata” messages received per transaction
      • Reason
        • “getdata” will be sent if a client has not received a transaction
        • A node will receive more “getdata” for a transaction it created
    • The number of “tx” messages received per transaction
      • Reason
        • If a node received a transaction from its peers, the node could not have created it
    • The portion of clients requesting a transaction from the victim compared to others
      • Reason
        • Due to diffusion, the victim might delay sending its transaction to other peers so much that they learn it from other peers.
        • Although the propagator of a transaction may not be the creator, the victim will receive more transaction requests than other peers.
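
The three features listed above could be derived from observed messages roughly as follows. This is a sketch under assumptions: the event tuple layout and the name `extract_features` are made up for the example, and the third feature is interpreted here as the share of all observed “getdata” requests for a transaction that were addressed to the victim, which is one plausible reading of the summary.

```python
from collections import defaultdict


def extract_features(events, victim):
    """Map each transaction hash to (getdata_to_victim, tx_to_victim, request_share).

    events: iterable of (sender, receiver, message, tx_hash) tuples.
    """
    getdata_to_victim = defaultdict(int)
    getdata_total = defaultdict(int)
    tx_to_victim = defaultdict(int)
    seen = set()

    for sender, receiver, message, tx_hash in events:
        seen.add(tx_hash)
        if message == "getdata":
            getdata_total[tx_hash] += 1
            if receiver == victim:
                getdata_to_victim[tx_hash] += 1
        elif message == "tx" and receiver == victim:
            tx_to_victim[tx_hash] += 1

    features = {}
    for tx_hash in seen:
        total = getdata_total[tx_hash]
        share = getdata_to_victim[tx_hash] / total if total else 0.0
        features[tx_hash] = (getdata_to_victim[tx_hash], tx_to_victim[tx_hash], share)
    return features


observed = [
    ("B", "A", "getdata", "t1"), ("C", "A", "getdata", "t1"),
    ("D", "A", "getdata", "t1"), ("E", "B", "getdata", "t1"),
    ("B", "A", "tx", "t2"), ("C", "B", "getdata", "t2"), ("D", "A", "getdata", "t2"),
]
features = extract_features(observed, victim="A")
```

In this toy input, `t1` is requested mostly from node A and never sent to it, while `t2` was delivered to node A by a peer, so `t1` looks far more like a transaction that node A created.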
  • The authors discuss the countermeasures for the design of cryptocurrencies against this attack:

    • Encrypt traffic
      • Bitcoin’s traffic is unencrypted, so this attack can be easily performed without establishing new connections. If the traffic is encrypted (as it is in Ethereum) the attack will need new connections and be more detectable.
    • Connect to fake peers
      • Currently, an attacker can infer a client’s peers by eavesdropping on its connections. If a client establishes connections to peers it does not really interact with, the attacker will be misled into connecting to irrelevant nodes.
    • Also request transactions that the node itself created
      • The method used in this attack to reduce the anonymity set is to exclude transactions that a node requests, so a node that also requests its own transactions defeats this filter.
    • Avoid requesting transactions from a node under the same AS or IXP
      • In this way, a single AS or IXP will not intercept too many of the victim’s connections.
    • Increase the transaction propagation delay for nodes with an AS-path that contains similar AS or IXP
    • Use Tor or VPN
      • Attacker’s countermeasures against Tor:
        • Connections using the Tor network can be prevented by denial-of-service attacks or by simply dropping traffic, since all Tor relays are publicly known.
        • The attacker can still use timing analysis as presented here, which leverages the correlation between packet timing and sizes to infer the network identities if the traffic from the server to the relay and from the relay to the client is known.
      • If using VPN, only the IP address of the VPN service can be revealed.
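
One of the countermeasures above, avoiding peers under the same AS or IXP, can be sketched as a simple peer-selection rule. This is a hypothetical illustration rather than anything a real client implements: `select_diverse_peers` and the `peer_to_as` lookup table are invented names, the addresses come from documentation ranges, the AS numbers are private-use values, and only the hosting AS is considered instead of the full AS-path.

```python
def select_diverse_peers(candidates, peer_to_as, n_peers, max_per_as=1):
    """Pick up to n_peers candidates while capping how many share one AS."""
    chosen = []
    per_as = {}
    for peer in candidates:
        asn = peer_to_as[peer]
        if per_as.get(asn, 0) < max_per_as:
            chosen.append(peer)
            per_as[asn] = per_as.get(asn, 0) + 1
        if len(chosen) == n_peers:
            break
    return chosen


peer_to_as = {
    "192.0.2.10": 64512, "192.0.2.11": 64512, "192.0.2.12": 64512,
    "198.51.100.7": 64513, "203.0.113.5": 64514,
}
peers = select_diverse_peers(list(peer_to_as), peer_to_as, n_peers=3)
print(peers)
```

With `max_per_as=1`, no single hosting AS sees more than one of the three chosen connections, at the cost of skipping otherwise valid peers.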

Method

  • They evaluate on a simulated Internet
    • They simulate the BGP routing protocol and calculate all possible AS-paths for each AS pair.
    • They fetched IP addresses of Bitcoin and Ethereum clients from:
    • They infer the AS for each IP address by searching all possible BGP routes.
    • They construct the AS-level topology with around 67K ASes, more than 700 IXPs, and around 4M links using data from CAIDA.
    • To simulate the Bitcoin network, they use Bitcoin Core version 0.19.1 with Poisson delay for diffusion.
    • To simulate the Internet delay (the delay between the ASes of the Bitcoin nodes), they use the RIPE Atlas platform, which is composed of a global network of devices that actively perform Internet measurements.
    • They simulate 10000 transactions, where 100 of them are created by the victim node. 70% of the transactions are for training and 30% for testing. For feature selection, they use 5-fold cross-validation on the training set.
  • They evaluate on the actual Bitcoin network
    • They host a Bitcoin node with Bitcoin Core version 0.19.1 and attack their node.
    • They configure the node so that it does not listen for incoming connections and only connects to a predefined set of peers randomly selected from https://bitnodes.io/.
    • They capture around 30K transactions, among which are 10 transactions created by the victim node.
    • For the testing set, they use 30% of the transactions from other clients and all of the victim’s transactions.
    • For the training set, they use the remaining 70% of the transactions.
    • They use the same feature set selected in the simulation.
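
The split used in the real-world experiment can be sketched as below. The function name `split_real_world`, the fixed seed, and the synthetic transaction IDs are assumptions for illustration; only the proportions (30% of other clients' transactions plus all of the victim's transactions for testing, the rest for training) come from the summary.

```python
import random


def split_real_world(tx_ids, victim_tx_ids, test_fraction=0.3, seed=0):
    """Return (train, test) lists following the split described above."""
    rng = random.Random(seed)
    victim = set(victim_tx_ids)
    others = [t for t in tx_ids if t not in victim]
    rng.shuffle(others)
    cut = round(len(others) * test_fraction)
    # Test set: a sample of other clients' transactions plus every victim transaction.
    test = others[:cut] + [t for t in tx_ids if t in victim]
    # Training set: the remaining transactions, none created by the victim.
    train = others[cut:]
    return train, test


all_ids = [f"other-{i}" for i in range(100)] + [f"victim-{i}" for i in range(10)]
victim_ids = [f"victim-{i}" for i in range(10)]
train, test = split_real_world(all_ids, victim_ids)
```

Keeping every victim transaction out of the training set means the anomaly detector is fitted only on relayed traffic, and the victim's transactions appear for the first time at test time.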

Results

  • In simulation, an adversary de-anonymizes the victim with 100% accuracy, because anomaly detection is straightforward when every client runs the same code.
  • In the real-world experiment, an adversary intercepting only 25% of the victim’s connections can deanonymize it with 70% accuracy, while intercepting 50% of the connections can deanonymize with 90% accuracy.

  • All Bitcoin clients are vulnerable to a PERIMETER attack by their own provider, which observes more than 90% of their connections.
  • For most Ethereum clients, there are 4 distinct network adversaries that can intercept 30% of their connections. The same holds for 50% of the Bitcoin clients.
  • 10 ASes together intercept 90% of the Ethereum clients and 85% of the Bitcoin clients.
  • If those 10 ASes and IXPs collude, they can deanonymize 85% of all transactions in Bitcoin and at least 90% of the Ethereum peer-to-peer graph.
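
The interception figures above boil down to counting, for each AS, the fraction of a client's connections whose AS-path crosses it. A minimal sketch follows, assuming the AS-paths are already known; the function names and the private-use AS numbers are hypothetical.

```python
from collections import Counter


def interception_share(as_paths):
    """Map each AS to the fraction of the client's connections it sits on.

    as_paths: one AS-path (list of AS numbers) per connection of the client.
    """
    counts = Counter()
    for path in as_paths:
        counts.update(set(path))  # count each AS at most once per connection
    return {asn: n / len(as_paths) for asn, n in counts.items()}


def adversaries_at_least(as_paths, threshold):
    """List the ASes that intercept at least the given share of connections."""
    shares = interception_share(as_paths)
    return sorted(asn for asn, share in shares.items() if share >= threshold)


# Four connections of one client; 64512 plays the role of the client's own provider.
paths = [
    [64512, 64513, 64520],
    [64512, 64514, 64521],
    [64512, 64513, 64522],
    [64512, 64515, 64523],
]
shares = interception_share(paths)
strong = adversaries_at_least(paths, 0.3)
```

In this toy topology the client's own provider sees every connection, which mirrors the observation that a provider is the strongest potential adversary.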

Discussion & Key Takeaways

  • PERIMETER is shown to be able to map transactions to IP addresses on Bitcoin.
  • PERIMETER is a passive, network-layer attack that leverages the connections observed by malicious ASes and IXPs. It only listens to existing connections and does not establish new ones, which makes it hard to detect and mitigate.
  • PERIMETER uses anomaly detection to infer which transactions were created by a node by looking at the transaction propagation pattern among its peers.
  • PERIMETER can extend to Ethereum by establishing new connections to a node’s peers. The peers’ IP addresses can be read from the unencrypted header of the data packets of Ethereum transactions.
  • Previous attacks using supernodes can only achieve 75% accuracy, while PERIMETER can achieve 90% accuracy by intercepting 50% of the victim’s connections.
  • 10 malicious ASes and IXPs together can deanonymize 85% of all transactions in Bitcoin and at least 90% of the Ethereum peer-to-peer graph.
  • Mixing services cannot prevent PERIMETER from inferring a transaction’s IP address.
  • No existing countermeasures can prevent PERIMETER. The authors suggest countermeasures by modifying the design of cryptocurrency clients.

Implications & Follow-ups

  • Using a VPN to send transactions is the only effective defense against this deanonymization attack. Even Tor is vulnerable.
  • The centralization of the Internet enables this type of attack. Most connections can be intercepted by a few ASes or IXPs.
  • Although the node for sending a transaction is not compromised, it can still be de-anonymized by its neighboring peers.
  • To prevent this attack, modifications to existing cryptocurrency clients are needed. The transaction propagation efficiency may be sacrificed.

Applicability

  • If the code is open-sourced, it may be interesting to apply the trained model to find the IP addresses of previous hacks.
  • Even if the IP address found is from a VPN service, one can still ask the VPN provider to reveal the hacker’s data for legal purposes.
  • Sophisticated attackers may send transactions from random locations. This attack may only be able to deanonymize benign transactions.
  • Developers of cryptocurrency clients can reference this work to provide upgrades with increased anonymity.

Some supplementary discussion:
The authors did not address how Ethereum 2.0 would change the behavior of devp2p clients, and how the transport-layer encryption between peers/clients works.
This is the current behavior in ETH 1.0 clients.

In Ethereum 2.0, the client encryption and interaction are being replaced with a custom protocol (no longer devp2p), which is under active development at the moment.
You can see the specs for phase 0 here:

It is still unknown how this will affect the linkability of transactions and client IPs in ETH 2.0, the issue the PERIMETER paper addresses, as the way peers communicate with each other in ETH 2.0 is vastly different from ETH 1.0.

Speculation and discussion are welcome, as I have limited expertise in infrastructure-level network security.


Good point @Jerry_Ho – moving away from Kademlia DHT will entail some privacy improvements. The current design of discv5 limits data stored from other nodes and, unlike Kademlia, it does not store arbitrary metadata and key maps.

It’s important to note, though, that attackers could still build custom loggers that circumvent some of these protections, as was the case when Grin was attacked.

In order for meaningful privacy to be achieved, there need to be stronger protections for message routing. Solutions like Dandelion (BIP 0156) are promising since they first forward each transaction along a random path of peers before broadcasting it, which relays messages more privately between nodes. Like Tor, there are edge cases where privacy can be broken, but these solutions still provide a considerable improvement relative to what is currently in place.
