Research Summary: Understanding Ethereum via Graph Analysis

TLDR:

  • The first systematic study of Ethereum using graph analysis. It characterizes three major activities (money transfer, smart contract creation, and smart contract invocation) and uses the results to address several security issues.

Citation

  • Chen, Ting, et al. “Understanding ethereum via graph analysis.” IEEE INFOCOM 2018-IEEE conference on computer communications. IEEE, 2018.

Link

Core Research Question

  • Can we use graph analysis to understand the characteristics of users, Ethereum smart contracts, and the relationships between them?

Background

  • Ethereum is the largest blockchain that can run smart contracts.
  • Smart contracts are autonomous computer programs that, once started, execute automatically and mandatorily according to the program logic defined beforehand.
  • Transactions are signed data packages, sent from an EOA, that carry a message to another account (for example, an Ether transfer or a contract call).
  • An Externally Owned Account (EOA) is controlled by whoever holds its private key, not by code.
  • A smart contract account is controlled by its code. If the receiver field of a transaction is empty (null), the transaction creates a new smart contract and returns the new contract's address; otherwise the receiver is an existing account, which can be either an EOA or a contract.
  • An external transaction is sent from an EOA and may trigger many internal transactions.
  • An internal transaction lists a contract address as its sender and usually results from executing a smart contract… Collecting all transactions is non-trivial: external transactions are publicly available on the blockchain, but internal transactions are not recorded there.
  • The Ethereum Virtual Machine (EVM) is the runtime environment for smart contracts, responsible for executing contract bytecode. Smart contracts are usually written in higher-level languages and then compiled to EVM bytecode.
  • An Ethereum client downloads all blocks from its peers and reconstructs the blockchain on the local machine by replaying all historical transactions in the EVM.
  • Money Flow Graph (MFG): a weighted directed graph with accounts as nodes, an edge from sender to receiver for each money transfer relationship, and the value transferred as the edge weight.
  • Contract Creation Graph (CCG): a forest of trees whose edges point from creator to created contract. The root of each tree is an EOA, and the other nodes are smart contracts created directly or indirectly by that root. Every edge has weight 1, since a contract can only be created once.
  • Contract Invocation Graph (CIG): a weighted directed graph with accounts as nodes, an edge from the caller to each contract it invokes, and the number of invocations as the edge weight.
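The three graph definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Tx` record, its fields, and the helper `build_graphs` are hypothetical, every graph is stored as a dictionary mapping a directed `(source, destination)` edge to its weight, and a contract call that also sends Ether is assumed to add an edge to both the MFG and the CIG.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class Tx:
    """Hypothetical, simplified transaction record (not Geth's data format)."""
    sender: str
    receiver: Optional[str]        # None when the transaction creates a contract
    value: int                     # Ether transferred, in wei
    created: Optional[str] = None  # address of the newly created contract, if any


def build_graphs(txs, contracts: Set[str]):
    """Return (mfg, ccg, cig), each a dict mapping a directed edge to its weight."""
    mfg: Dict[Edge, int] = defaultdict(int)  # weight = total value transferred
    ccg: Dict[Edge, int] = {}                # weight = 1 (a contract is created once)
    cig: Dict[Edge, int] = defaultdict(int)  # weight = number of invocations
    for tx in txs:
        if tx.created is not None:
            # Creation edge from the creator (EOA or contract) to the new contract.
            ccg[(tx.sender, tx.created)] = 1
            target = tx.created
        else:
            target = tx.receiver
            if target in contracts:
                # Calling an existing contract counts as one invocation.
                cig[(tx.sender, target)] += 1
        if tx.value > 0:
            # Any transaction that moves Ether adds to the money flow.
            mfg[(tx.sender, target)] += tx.value
    return dict(mfg), ccg, dict(cig)


# Toy data: 0xA creates 0xC1, which in turn creates 0xC2 (an internal transaction).
txs = [
    Tx("0xA", None, 0, created="0xC1"),
    Tx("0xC1", None, 0, created="0xC2"),
    Tx("0xA", "0xB", 5),   # plain Ether transfers between EOAs
    Tx("0xA", "0xB", 3),
    Tx("0xB", "0xC1", 0),  # contract call without value
    Tx("0xB", "0xC1", 2),  # contract call that also sends Ether
]
mfg, ccg, cig = build_graphs(txs, contracts={"0xC1", "0xC2"})
print(mfg)  # {('0xA', '0xB'): 8, ('0xB', '0xC1'): 2}
print(ccg)  # {('0xA', '0xC1'): 1, ('0xC1', '0xC2'): 1}
print(cig)  # {('0xB', '0xC1'): 2}
```

Plain dictionaries keep the sketch dependency-free; a dedicated graph library would be the more natural choice for computing the metrics listed in the Method section.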

Summary

  • The authors motivate the paper by noting that more than 8 million smart contracts have been deployed on the Ethereum blockchain, yet few studies have examined their role in the ecosystem.

  • They collected public transaction data and performed graph analysis to uncover underlying insights into the Ethereum ecosystem (e.g., the relationships between contracts and users).

  • They study three security issues using the graph: (1) accounts controlled by an attacker, (2) abnormal accounts that create lots of unused contracts*, and (3) deanonymization of accounts.

    • * Creating contracts and destroying them later can be an arbitrage strategy. In Ethereum, users receive a gas refund when a smart contract is destroyed. This is especially useful when gas prices are high, and profitable if the contract was created at a lower gas price. Thus, accounts that create unused smart contracts are not necessarily abnormal; the paper does not take this into account.
  • Finally, they discuss why Bitcoin graph-analysis techniques cannot be applied directly to Ethereum:

    • Bitcoin uses the UTXO (unspent transaction output) model, in which a transaction can have multiple inputs and multiple outputs, while Ethereum uses an account-based model (one sender and one receiver per transaction).
    • In the UTXO model, the inputs a transaction consumes rarely add up to exactly the amount being paid. When they exceed it, the difference (minus the fee) is returned to the sender as change.
    • Bitcoin users usually have many addresses, because wallet applications create new addresses to receive change.
    • Therefore the construction of the graph is fundamentally different.
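To make the difference concrete, here is a toy comparison of the two models. The function names are hypothetical, fees are reduced to a single number, and Ethereum gas costs are ignored. In the UTXO version a single payment can produce two outputs (payment and change), which is one reason an address-level Bitcoin graph does not carry over directly to Ethereum's account graph.

```python
def utxo_payment(inputs, amount, fee):
    """UTXO model: whole unspent outputs are consumed; the remainder becomes change.

    Returns the list of new outputs as (label, value) pairs.
    """
    total = sum(inputs)
    if total < amount + fee:
        raise ValueError("inputs do not cover amount + fee")
    change = total - amount - fee
    outputs = [("payee", amount)]
    if change > 0:
        # In practice wallets usually send change to a fresh address they control.
        outputs.append(("change", change))
    return outputs


def account_transfer(balances, sender, receiver, amount):
    """Account model: one sender, one receiver, balances updated in place."""
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[receiver] = balances.get(receiver, 0) + amount
    return balances


# Paying 60,000 units from two inputs worth 50,000 and 30,000 with a 1,000 fee:
print(utxo_payment([50_000, 30_000], 60_000, 1_000))
# [('payee', 60000), ('change', 19000)]
print(account_transfer({"A": 80_000}, "A", "B", 60_000))
# {'A': 20000, 'B': 60000}
```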

Method

  • The authors ran a local Ethereum client (Geth) to collect all Ethereum transactions from July 30, 2015 to November 1, 2018.
  • They excluded three types of transactions: (1) failed transactions, (2) zero-value (0 Ether) transactions, and (3) self-destructing smart contracts.*
  • They then built an instrumented Ethereum client that replays all external transactions and records every internal transaction they trigger. Only six EVM operations produce internal transactions, so the authors could manually add recording code for each one.
  • Total data retrieved: 40M EOAs, 8M+ smart contracts, 330M external transactions and 330M internal transactions
  • Based on the data (internal and external transactions), they proposed and constructed three types of graphs: money flow graphs (MFG), contract creation graphs (CCG), and contract invocation graphs (CIG).
  • The authors measured each graph's degree distribution, clustering, degree correlation, node importance, assortativity, and strongly/weakly connected components (SCCs/WCCs), and tracked how these metrics evolved over time using monthly snapshots.
  • They examined three security issues using the graph: (1) finding accounts controlled by an attacker, (2) abnormal accounts that created an unusual number of unused contracts, and (3) deanonymization of account identities.
  • They implemented an anomaly detector and an application for deanonymizing accounts.
  • For attack forensics (finding accounts controlled by an attacker), they used weakly connected components to determine how many contracts an attacker created and how many EOAs the attacker used in total.
  • For anomaly detection, they looked for accounts that created large numbers of rarely used smart contracts (excluding exchanges, which also create many contracts but actively use them). The anomaly threshold was set manually.
  • For deanonymization, they gathered all accounts in each weakly connected component and applied natural language processing (NLP) to extract and summarize identifying information (this tool). For example, comments in a smart contract's source code may contain the development team's names or email addresses.
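The anomaly-detection step can be sketched as below. The summary only says the threshold was set manually, so `find_abnormal_creators`, its inputs, and both threshold values are hypothetical. The idea shown: for each creator in the CCG, count how many of its contracts were ever used, and flag creators with many contracts and a very low usage ratio. The ratio test is what keeps exchanges, whose contracts are actively used, out of the result.

```python
from collections import Counter


def find_abnormal_creators(ccg_edges, used_contracts, min_created=1000, max_used_ratio=0.01):
    """Flag creators that deploy many contracts that are almost never used.

    ccg_edges: iterable of (creator, contract) edges from the CCG.
    used_contracts: contracts invoked or involved in a money transfer at least once.
    min_created, max_used_ratio: hypothetical thresholds chosen by hand.
    """
    created = Counter()
    used = Counter()
    for creator, contract in ccg_edges:
        created[creator] += 1
        if contract in used_contracts:
            used[creator] += 1
    abnormal = []
    for creator, n in created.items():
        ratio = used[creator] / n
        # Exchanges also create many contracts, but theirs are actively used,
        # so a high usage ratio keeps them out of the result.
        if n >= min_created and ratio <= max_used_ratio:
            abnormal.append(creator)
    return sorted(abnormal)


edges = (
    [("0xSpam", f"0xS{i}") for i in range(5)]    # 5 contracts, none used
    + [("0xExch", f"0xE{i}") for i in range(5)]  # 5 contracts, all used
    + [("0xUser", "0xU0")]                       # 1 unused contract
)
used = {f"0xE{i}" for i in range(5)}
print(find_abnormal_creators(edges, used, min_created=5, max_used_ratio=0.1))
# ['0xSpam']
```

Learning these two thresholds automatically, rather than fixing them by hand, is the machine-learning follow-up the authors mention.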

Results

  • The anomaly detector discovered 48 abnormal accounts.
  • Key observations:
    • Most users prefer transferring money to calling smart contracts.

    • Smart contracts for financial applications dominate the Ethereum ecosystem, but most smart contracts are not in use.

    • More users transfer Ether than deposit it (possibly because of speculators' high-frequency trading).

    • Intensive interactions between smart contracts are rare (possibly because smart contract applications are usually not complex).

    • 57 percent of smart contracts have never transferred money or been invoked, so storing and syncing them wastes resources.

Discussion & Key Takeaways

  • The paper uses graph analysis to give a deeper understanding of interaction behavior on Ethereum, specifically the interactions between smart contracts and how users interact with them.
  • They also highlight the differences between applying graph analysis to Bitcoin vs. Ethereum, and show that the graph needs to be fundamentally redefined.
  • Finally, they address three security problems using the graph analysis results: (1) finding accounts controlled by an attacker, (2) discovering abnormal accounts that create large numbers of unused contracts*, and (3) deanonymizing accounts. The core idea is to trace weakly connected components (groups of accounts linked by interactions, ignoring edge direction) in the constructed graphs.
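Weakly connected components can be computed with a union-find (disjoint-set) structure by treating every directed edge as undirected. The sketch below is an illustration, not the authors' code: the edges are assumed to be `(source, destination)` pairs drawn from any of the three graphs, the helper `accounts_linked_to` is hypothetical, and the account labels are made up. Attack forensics then amounts to returning the component that contains a known attacker address.

```python
class UnionFind:
    """Disjoint-set structure used to merge accounts into components."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to keep trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def weakly_connected_components(edges):
    """Group accounts into WCCs, treating each directed edge as undirected."""
    uf = UnionFind()
    for src, dst in edges:
        uf.union(src, dst)
    components = {}
    for node in list(uf.parent):
        components.setdefault(uf.find(node), set()).add(node)
    return list(components.values())


def accounts_linked_to(attacker, edges):
    """Attack forensics: every account in the same WCC as a known attacker."""
    for component in weakly_connected_components(edges):
        if attacker in component:
            return component
    return {attacker}


edges = [
    ("0xAtk1", "0xC1"),    # attacker EOA invokes contract 0xC1
    ("0xAtk2", "0xC1"),    # a second EOA calls the same contract
    ("0xAtk2", "0xC2"),    # and also creates another contract
    ("0xAlice", "0xBob"),  # unrelated transfer
]
print(sorted(accounts_linked_to("0xAtk1", edges)))
# ['0xAtk1', '0xAtk2', '0xC1', '0xC2']
```

In a real transaction graph, hub accounts such as exchanges connect many unrelated users, so a raw WCC can be far larger than one actor's set of accounts; this sketch does not attempt to handle that.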

Implications & Follow-ups

  • They plan to include more graph metrics, build more applications on the results, and use machine learning to determine the anomaly-detection threshold for large-scale contract creation automatically.

Applicability

  • A graph-based approach can be used to deanonymize accounts, identify malicious activity on Ethereum, and find accounts linked to hackers. These capabilities are important for building better KYC (Know Your Customer) and anti-money laundering (AML) systems that meet the latest FATF (Financial Action Task Force) recommendations.
  • This paper also highlights the structural differences between the Bitcoin and Ethereum networks, which is useful when applying existing Bitcoin analysis techniques to Ethereum or other account-based models.
  • Graph analysis also helps visualize the Ethereum network, serving both educational and academic research purposes.
  • Application developers can treat this as a market overview of how Ethereum users behave. These insights point to problems worth solving and to emerging issues, such as underused smart contracts wasting storage.

Wow, this is pretty intriguing material! The implications in the metrics are not good for decentralization if that was ever the goal.

Thanks for your feedback! Do you mean that deanonymization and finding relationships between addresses are not good for decentralization? They are necessary for anti-money laundering, though.