Research Summary: A large-scale empirical study of low-level function use in Ethereum smart contracts and automated replacement

RuiXi · January 6, 2023, 10:15am

TLDR

The Solidity programming language provides features for exercising fine-grained control over smart contracts. Later-released Solidity documentation discourages their use, but later versions still support them for backward compatibility.

In this paper, we define the term “low-level functions” and study their use in a dataset of 2 million real-world smart contracts. We find that low-level functions are widely used and that most of these uses are gratuitous, i.e., not needed for the contract’s functionality.

We propose GoHigh, a fully automated source-to-source transformation tool that detects low-level functions by matching their Abstract Syntax Tree (AST) patterns and replaces them with high-level alternatives.

GoHigh’s replacement preserves the behavior of the contracts. By replaying Ethereum transactions and comparing the external state changes, we verify that all state changes match after replacement for the contracts that can be verified (~80%). The remaining contracts are not verifiable because of their external dependencies on other contracts.

Citation

Xi, R, Pattabiraman, K. A large-scale empirical study of low-level function use in Ethereum smart contracts and automated replacement. Softw Pract Exper. 2022; 1-34. doi:10.1002/spe.3163

Core Research Question

How are low-level functions used in real-world Solidity smart contracts, and can GoHigh replace them with their high-level alternatives automatically?

Background

Low-level function: Low-level functions are a subset of Solidity built-in constructs that have specific issues. We select constructs that have known hacks but whose issues have not been addressed. We exclude constructs that require business-logic-specific knowledge to determine whether they are potentially insecure. For example, the block timestamp can be used either as a bad source of randomness or as a good (but not-so-precise) timer; deciding which is application-logic specific and therefore outside the scope of our work.
Abstract Syntax Tree (AST): An AST is a tree representation of the abstract syntactic structure of the source code. Each node of the tree denotes an element of the source code. In Solidity, the AST of a contract can be generated by the compiler, solc.
The state change of a contract: A smart contract is a finite-state machine that changes from one state to another in response to its input (a.k.a. a transaction, in the Ethereum blockchain context). The state of a contract is defined by the contract variables and its balance. We consider two contracts identical if and only if (1) they have the same state definition and (2) they make the same state changes in response to the same inputs.

Summary

Guidelines from the Solidity official documentation provide a list of do’s and don’ts in the form of “warning boxes”. There are a total of 28 guidelines listed in the Language Description section of Solidity version 0.8.6.
Our analysis of 149k real-world smart contracts published before the guidelines were released (the base dataset) reveals that more than 13% of the contracts contain at least one low-level function. However, 82% of these low-level function uses are gratuitous, and hence can be replaced by high-level alternatives.
Further analysis of 2 million real-world smart contracts published after the guidelines were released (the latest dataset) shows an increasing trend in the use of low-level functions. In the latest dataset, 40% of the contracts use low-level functions, and 95% of those uses are gratuitous. Thus, the use of low-level functions is actually increasing despite the publication of the guidelines.
Even though the replacement of low-level functions might be easy for experienced developers, it is non-trivial for many developers. The main challenge in automated replacement is that one replacement does not work on all patterns of use of low-level functions. Developers tend to use various home-grown check patterns to prevent the vulnerabilities of low-level functions, which complicates their replacement.

Method

We first iteratively distill source code patterns of low-level functions from our dataset using regular expressions, and then condense these patterns into 11 AST representations. GoHigh uses the AST patterns to automatically identify the low-level functions at the AST level.
In GoHigh, each AST pattern has a custom replacement. For example, though the if-clause pattern and the if-not pattern both protect the statements located in the if block, they require different replacement patterns as their behaviors differ.
After the replacement, GoHigh decompiles the AST representation of the contract back to its source code representation.
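As a rough illustration of detecting and replacing a pattern at the AST level, the Python sketch below walks a toy, dict-based AST and rewrites one hypothetical pattern: a guarded send of the form `if (!addr.send(x)) throw;` becomes `addr.transfer(x);`, on the assumption that `transfer` reverts on failure just as the guard does. The field names (`nodeType`, `memberName`, `trueBody`, and so on) are loosely modeled on solc's JSON AST, but the tree is heavily simplified, and the helper names (`is_guarded_send`, `to_transfer`, `rewrite`) are invented for this example. It is not GoHigh's implementation or one of its 11 actual patterns.

```python
from typing import Any, Dict

Node = Dict[str, Any]  # a toy AST node; real solc nodes carry many more fields


def is_send_call(node: Node) -> bool:
    """True for a call shaped like `<expr>.send(<args>)`."""
    callee = node.get("expression", {})
    return (
        node.get("nodeType") == "FunctionCall"
        and callee.get("nodeType") == "MemberAccess"
        and callee.get("memberName") == "send"
    )


def is_guarded_send(node: Node) -> bool:
    """True for `if (!<expr>.send(<args>)) throw;` with no else branch."""
    if node.get("nodeType") != "IfStatement":
        return False
    cond = node.get("condition", {})
    body = node.get("trueBody", {})
    return (
        cond.get("nodeType") == "UnaryOperation"
        and cond.get("operator") == "!"
        and is_send_call(cond.get("subExpression", {}))
        and body.get("nodeType") == "Throw"
        and node.get("falseBody") is None
    )


def to_transfer(node: Node) -> Node:
    """Build `<expr>.transfer(<args>);` from a matched guarded send."""
    send_call = node["condition"]["subExpression"]
    callee = dict(send_call["expression"], memberName="transfer")
    return {
        "nodeType": "ExpressionStatement",
        "expression": dict(send_call, expression=callee),
    }


def rewrite(node: Any) -> Any:
    """Return a copy of the tree with every guarded send replaced."""
    if isinstance(node, list):
        return [rewrite(child) for child in node]
    if not isinstance(node, dict):
        return node
    if is_guarded_send(node):
        return to_transfer(node)
    return {key: rewrite(value) for key, value in node.items()}


# Toy input: `if (!owner.send(amount)) throw;` inside a block.
guarded = {
    "nodeType": "IfStatement",
    "condition": {
        "nodeType": "UnaryOperation",
        "operator": "!",
        "subExpression": {
            "nodeType": "FunctionCall",
            "expression": {
                "nodeType": "MemberAccess",
                "memberName": "send",
                "expression": {"nodeType": "Identifier", "name": "owner"},
            },
            "arguments": [{"nodeType": "Identifier", "name": "amount"}],
        },
    },
    "trueBody": {"nodeType": "Throw"},
    "falseBody": None,
}
block = {"nodeType": "Block", "statements": [guarded]}
rewritten = rewrite(block)
```

Each pattern gets its own matcher and its own builder in this sketch, which mirrors the point above that one replacement does not fit every pattern: an if-clause variant would need a different matcher and a different rewritten statement.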

Results

To evaluate the effectiveness and efficiency of GoHigh, we measure the coverage, state change differences, and gas cost overhead of its replacements.
The coverage of GoHigh is given by the percentage of contracts captured by the regular expressions generated from the first step. GoHigh has an overall coverage of 100% in identifying the patterns of low-level functions in both the base and the latest datasets.
To compare state changes, we first extract the public variables of each contract along with its balance to determine the state of the contract. Then, we deploy both the original and replaced contracts on a private Ethereum blockchain node, after removing the original contracts that fail to deploy in our node. Finally, we replay the transactions in the transaction logs, and compare the external states of the contracts with each other. We say a replacement has “succeeded” if the states match each other after the replay, as this suggests that no unintended side-effect was introduced by GoHigh. The success rate of GoHigh is 100% for the verifiable contracts (these constitute 80% of the dataset). The remaining 20% are not verifiable due to the difficulty of replaying transactions to contracts with external dependencies.
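The replay check can be sketched in a few lines of Python. Here contracts are modeled as plain in-memory objects rather than deployed to a private Ethereum node, and every name (`ToyContract`, `Replaced`, `BrokenReplacement`, `replay`, `replacement_succeeded`) and every transaction is hypothetical. The sketch only shows the shape of the comparison: replay the same transaction log against both versions and compare the externally visible state (public variables plus balance) afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tx = Tuple[str, str, int]  # toy transaction: (function name, sender, value)


@dataclass
class ToyContract:
    """Stand-in for the original contract: a simple deposit/withdraw ledger."""
    balance: int = 0
    deposits: Dict[str, int] = field(default_factory=dict)  # "public variable"

    def apply(self, tx: Tx) -> None:
        name, sender, value = tx
        if name == "deposit":
            self.balance += value
            self.deposits[sender] = self.deposits.get(sender, 0) + value
        elif name == "withdraw":
            owed = self.deposits.get(sender, 0)
            if value > owed:
                return  # models a reverted transaction: no state change
            self.deposits[sender] = owed - value
            self.balance -= value

    def external_state(self) -> dict:
        return {"balance": self.balance, "deposits": dict(self.deposits)}


class Replaced(ToyContract):
    """Stand-in for a behavior-preserving replacement."""


class BrokenReplacement(ToyContract):
    """Stand-in for a faulty replacement that drops the withdrawal check."""

    def apply(self, tx: Tx) -> None:
        name, _sender, value = tx
        if name == "withdraw":
            self.balance -= value  # bug: ledger not checked or updated
        else:
            super().apply(tx)


def replay(factory: Callable[[], ToyContract], txs: List[Tx]) -> dict:
    """Deploy a fresh contract, replay the log, return the final external state."""
    contract = factory()
    for tx in txs:
        contract.apply(tx)
    return contract.external_state()


def replacement_succeeded(original, replaced, txs: List[Tx]) -> bool:
    """A replacement 'succeeds' if both versions end in the same external state."""
    return replay(original, txs) == replay(replaced, txs)


txs = [("deposit", "alice", 10), ("withdraw", "alice", 4), ("withdraw", "bob", 1)]
ok = replacement_succeeded(ToyContract, Replaced, txs)
bad = replacement_succeeded(ToyContract, BrokenReplacement, txs)
```

With this log, `ok` is true and `bad` is false. The toy model also hints at why external dependencies make contracts unverifiable: `apply` here needs nothing beyond the contract's own state, whereas a real contract may call into other contracts whose state would also have to be reconstructed.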
We use a runtime method to estimate the gas used by the contract, which is based on the gas used in its historical transactions. We find that GoHigh is able to reduce gas consumption by 5.32% across all the datasets.
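For concreteness, a relative gas change over historical transactions could be computed as below. This is a minimal sketch: the summary does not say how per-transaction figures are aggregated, so summing gas over all replayed transactions is an assumption, and the function name `gas_change_percent` and the sample numbers are made up for illustration (they are not results from the paper).

```python
from typing import Sequence


def gas_change_percent(original_gas: Sequence[int], replaced_gas: Sequence[int]) -> float:
    """Relative change in total gas, in percent; negative means gas was saved."""
    if len(original_gas) != len(replaced_gas):
        raise ValueError("both runs must cover the same transactions")
    total_original = sum(original_gas)
    if total_original == 0:
        raise ValueError("original run used no gas")
    return (sum(replaced_gas) - total_original) / total_original * 100


# Illustrative gas figures for three replayed transactions (not real data).
original_run = [21000, 50000, 29000]
replaced_run = [21000, 46000, 28000]
change = gas_change_percent(original_run, replaced_run)  # about -5 percent
```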

Discussion and Key Takeaways

Counter-intuitive
- Despite the publication of Solidity guidelines, the use of low-level functions, which are discouraged by official Solidity documentation, is actually increasing as a percentage of total contracts. Meanwhile, the number of contracts themselves is exponentially increasing as well.
- Low-level functions do not necessarily save gas - they do so only when developers are very careful with memory allocation and reuse in storage-heavy tasks, which is not the case most of the time.
- Only a handful of basic smart contracts are reused frequently. For example, in the empirical study, we find that Forwarder, ForwarderERC20, and Proxy are the most common contracts in our datasets. The two forwarder contracts are temporary keepers of Ether and any ERC20 tokens, which can be forwarded to their actual owners later on. The proxy contract is a gateway to its implementation - developers usually use this proxy-implementation pattern to upgrade the implementation contract without changing the address of the proxy.
Lessons learned
- It is difficult to replay an existing transaction for contracts with external dependencies in a private Ethereum node. You either need to maintain snapshots of the Ethereum state at the time of each transaction, or trace back and resolve all the contract dependencies and reconstruct their states.
- There are many corner cases in replacing low-level functions in smart contracts that need to be taken into account, as programmers do not follow the same standards.
- The guidelines are sometimes misleading. Even though the official Solidity documentation suggests using transfer() to perform the native token transfer, the community usually recommends using call() as it increases interoperability across smart contracts.

Implications and Follow-Ups

Implications
- This work unveils a counter-intuitive observation: despite the publication of Solidity guidelines, the use of low-level functions, which are discouraged by official Solidity documentation, is actually increasing in real-world contracts. This not only urges Solidity developers to double-check the guidelines before coding, but also suggests that the guidelines may gradually become outdated over time.
- Our gas evaluation results suggest that using low-level functions does not save gas in practice, and often ends up using more gas.
- The difficulty we faced in replaying transactions exposes the lack of a scalable and efficient runtime testing framework for Solidity smart contracts. Current testing tools, mostly fuzzing tools, are not capable of emulating the complex external environment of the contracts.
Follow-ups
- We should consider the potential effects of Inline Assembly (IA) more carefully. IA plays an important role in Solidity smart contracts, especially in frequently used library contracts.
- Similar to replacing low-level functions, migrating an existing smart contract to an upgradeable contract without breaking its functionality requires non-trivial effort. For example, the upgradeable pattern needs to be chosen with care, the storage layout of the proxy contract must be maintained during migration, and the implementation contract must be initialized during its construction.
- It would be interesting to study how well access controls are configured in existing smart contracts, and how many existing vulnerabilities are protected by access controls.

Applicability

GoHigh, the automated source-to-source transformation tool for low-level functions, can be readily used by Solidity developers who maintain smart contracts with issues related to low-level functions. Note that GoHigh currently works only for contracts whose source code is available. GoHigh currently supports Solidity versions >= 0.3.6 and < 0.9.0.

Tadashi · January 10, 2023, 3:39am

Thanks for summarizing this interesting paper. Here is the link for the code associated with the paper: GitHub - DependableSystemsLab/GoHigh: GoHigh for SANER’22 paper: https://blogs.ubc.ca/dependablesystemslab/2021/12/18/when-they-go-low-automated-replacement-of-low-level-functions-in-ethereum-smart-contracts/
