Research Summary - A Source-Code-Based Taxonomy for Ethereum Smart Contracts

tomideadeoye · March 22, 2022, 12:30am

TLDR:

Standard categorization of smart contract features could make development more accessible and limit vulnerabilities, particularly through the development of general-purpose libraries and interfaces.

The authors examined the source code of 150 smart contracts in 101 DApps. They found 64 characteristics (across 28 dimensions) in six meta-categories for classifying smart contracts.

These characteristics led them to further identify 7 prominent clusters of smart contracts for future research and standardization of libraries.

Core Research Question

Which common code patterns are used to develop smart contracts, and what archetypes can be distinguished from these patterns?

Citation

Hofmann, Adrian & Kolb, Julian & Becker, Luc & Winkelmann, Axel. (2021). A Source-Code-Based Taxonomy for Ethereum Smart Contracts. https://www.researchgate.net/publication/354901921_A_Source-Code-Based_Taxonomy_for_Ethereum_Smart_Contract

Background

Anti-Early Whale protocol: Mechanisms to prevent users from acquiring excess tokens during the early stages of a smart contract.
Archetype: A typical example or model of a group of smart contracts that follow similar patterns and share common features.
Asset Handling: Mechanisms that help manage, transfer or prove ownership of various tangible or intangible objects such as digital objects in games, real estate, or share certificates.
Core Logic: Functions crucial to a smart contract’s primary use, such as game rules in gambling smart contracts.
Default Function: A fallback function executed when a function identifier does not match the available functions in a smart contract or if Ether is supplied without data to a smart contract.
Check Address Function: Checks the validity of wallet addresses and contracts before executing transactions to prevent token loss.
Helpers: Libraries with additional features for augmenting the core functionality of applications. Functions include length checking, concatenation, comparison, and slicing data types or external data access.
OffChain: This refers to handling transactions or data outside of a blockchain.
Oracles: Third parties used in accessing data from sources outside of a blockchain.
Safemath Function: Functionalities used for monitoring overflows or underflows in calculations.
Taxonomy: The science of naming, describing and classifying things; in this case, smart contracts.
Tokens: Units of local values that help create economic incentives on blockchains. Categories of tokens include cryptocurrencies, network tokens, and investment tokens. Categories of token usage include blockchain-native tokens, nonnative tokens, and DApp tokens.
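
To make the Safemath entry above concrete, here is a minimal sketch of checked arithmetic. It is written in Python rather than Solidity and simulates a 256-bit unsigned integer. The names `safe_add`, `safe_sub`, and `wrapping_add` are illustrative assumptions and are not taken from the paper or from any specific library.

```python
# Illustrative sketch only: simulates uint256 arithmetic in Python.
# Function names are hypothetical, not from the paper or a real library.

UINT256_MAX = 2**256 - 1  # largest value a 256-bit unsigned integer can hold


def wrapping_add(a: int, b: int) -> int:
    """Unchecked addition: silently wraps around on overflow."""
    return (a + b) % (UINT256_MAX + 1)


def safe_add(a: int, b: int) -> int:
    """Checked addition: raises instead of wrapping on overflow."""
    result = a + b
    if result > UINT256_MAX:
        raise OverflowError("uint256 addition overflow")
    return result


def safe_sub(a: int, b: int) -> int:
    """Checked subtraction: raises instead of wrapping on underflow."""
    if b > a:
        raise OverflowError("uint256 subtraction underflow")
    return a - b
```

With unchecked arithmetic, `wrapping_add(UINT256_MAX, 1)` returns 0, which is the kind of silent error a Safemath-style helper is meant to catch; `safe_add(UINT256_MAX, 1)` raises instead.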

Summary

Smart contract developers need standard interfaces and libraries. These libraries could prevent vulnerabilities and help develop best practices.
The paper studied the source code of 150 smart contracts from 101 decentralized apps through six categories: DApp Design, Core Functionality, Helpers, Contract Management, Safety Functions, and Tokens.
The authors found 64 common characteristics from 28 dimensions among the 150 smart contracts. These characteristics then led them to group the smart contracts into 7 archetypes.
The final taxonomy, comprising 64 characteristics from 28 dimensions and 7 clusters, sheds light on the combined characteristics of distinct smart contract types.
The paper is a step towards standardizing functionalities and libraries in smart contracts.

Method

The authors used both qualitative and quantitative methodologies. They selected 101 DApps from databases like www.dapp.com and www.stateofthedapps.com based on their level of activity, ease of access to their codebases, representativeness, and feasibility for analysis. Meta-characteristics for identification included smart contract classes, functions, and code patterns.
Using an inductive taxonomy development approach, they analyzed 120,000+ lines of DApp source code, extracting and analyzing patterns using algorithmic methods. Only active Ethereum-based DApps with publicly available source code were analyzed. The authors chose Ethereum because of its popularity and its many use cases.
The authors also used previous classifications made in their taxonomy of gambling smart contracts to identify new categories.
The authors settled on six categories and used those categories to generate 64 characteristics across 28 dimensions. Finally, the authors used a hierarchical clustering algorithm to group the smart contracts into seven clusters from the defined characteristics.
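
The clustering step can be sketched as follows. This is a minimal illustration of agglomerative (bottom-up hierarchical) clustering, not the authors' actual pipeline. This summary does not state which distance measure or linkage the paper used, so Jaccard distance and average linkage are assumptions here, and the contract names and characteristics are made up.

```python
# Illustrative sketch only: Jaccard distance and average linkage are
# assumptions; the contract names and characteristics are hypothetical.
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 minus (shared characteristics / all characteristics of either)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1: list, c2: list, features: dict) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(features: dict, k: int) -> list:
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[name] for name in sorted(features)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], features),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical contracts described by the characteristics they exhibit.
contracts = {
    "token_a": frozenset({"erc20", "mintable", "burnable", "safemath"}),
    "token_b": frozenset({"erc20", "mintable", "safemath"}),
    "bet_a": frozenset({"oracle", "string_helper", "fixed_fee"}),
    "bet_b": frozenset({"oracle", "string_helper"}),
}

result = agglomerative_clusters(contracts, k=2)
# result == [["bet_a", "bet_b"], ["token_a", "token_b"]]
```

In this toy input, the two token contracts and the two oracle-based contracts end up in separate clusters because they share no characteristics across groups.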

Results

The authors then generated 64 characteristics from the six categories itemized below.

The DApp Design category established two characteristics: DApps combining all functionalities in a single smart contract and DApps splitting functionalities among multiple contracts.
The Core Functionality category characterized contracts based on usage fees. Fees in the contracts were either changeable by the contract owner, fixed or nonexistent. Other characteristics generated were based on the transferability of assets and the inclusion of the contract’s core logic in its codebase.
The Helpers category characterized contracts based on the use of Interfaces, Oracles and Helpers such as Math, String, and Byte helpers.
The Contract Management category characterized contracts based on the malleability of contract roles, ownership handling, rebranding, updating, and killing smart contracts.
The Safety Functions category characterized contracts based on the provision of Default Functions, Safemath Functions, Pause Contract Functions, Refund Users functions, Anti Early Whale protocols, Withdraw Pending Transactions Functions and Reentrancy Guards.
The Tokens category characterized smart contracts based on usage of tokens and the type of token used. Other characteristics generated were based on the availability of functions for creating new Tokens (Mintable Tokens), destroying existing Tokens (Burnable Tokens) and trading tokens (Trade and Accounting functionality).
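
One way to picture how a contract is described under such a taxonomy is as a binary vector with one slot per characteristic. The sketch below uses only two dimensions mentioned in this summary (DApp design and usage fees). The identifier names are hypothetical, and treating the characteristics within a dimension as mutually exclusive is an assumption made for the illustration.

```python
# Illustrative sketch only: identifiers are hypothetical and only two of the
# 28 dimensions are shown. Mutual exclusivity within a dimension is assumed.

TAXONOMY = {
    "dapp_design": ["single_contract", "multiple_contracts"],
    "usage_fee": ["changeable_by_owner", "fixed", "none"],
}


def encode(contract: dict) -> list:
    """Return a 0/1 vector with one entry per characteristic in TAXONOMY."""
    vector = []
    for dimension, characteristics in TAXONOMY.items():
        value = contract[dimension]
        if value not in characteristics:
            raise ValueError(f"unknown characteristic {value!r} for {dimension}")
        vector.extend(1 if value == c else 0 for c in characteristics)
    return vector


example = encode({"dapp_design": "multiple_contracts", "usage_fee": "fixed"})
# example == [0, 1, 0, 1, 0]
```

Vectors of this kind are what a clustering algorithm can then compare across contracts.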

From the clustering algorithm used, the authors identified the following seven clusters, often containing between 10 and 15 smart contracts:

Archetype 0: High-Value Asset Management comprises contracts with more ownership and role management features. Contracts in this archetype often include “withdraw pending transactions” features since they are primarily used for collectables.
Archetype 1: Tokenized Asset Contracts without User Management comprises contracts with tokens that users cannot buy, sell, deposit, or withdraw. Most of them have check address, mint or burn features. They are generally used to track game assets or register (domain) names.
Archetype 2: Technically Secure Implementations of Financial Applications consists of contracts enabling Initial Coin Offering (ICO), token sales and token transfer. Contracts here are often Mintable, Burnable ERC20 Tokens.
Archetype 3: Asset Centered Contracts with User Management is similar to Archetype 1 but provides extensive functionalities for handling user roles and ownership. The authors suggest that standardizing the handling of user roles and ownership will result in the merging of Archetype 1 and Archetype 3.
Archetype 4: “Bets” on OffChain Events comprises contracts interacting with oracles, mostly in gambling or exchange related use-cases. Most utilize libraries for interaction with Oracles.
Archetype 5: Simple Contracts and Miscellaneous comprises uncategorized contracts with limited helper functions, role management, and standardized functions.
Archetype 6: Simple Contracts and Miscellaneous with Asset Based Governance is similar to Archetype 5, but it provides additional asset management features and defines roles for token owners. It is often used in Blockchain solutions that boost transaction rates or enable interoperability.

Discussion and Key Takeaways

Clustering the 64 smart contract characteristics helps highlight combined attributes among distinct types of smart contracts. The archetypes developed by the research detail different groups of functional characteristics that exist in most Ethereum smart contracts.
Nevertheless, the authors maintain that their taxonomy is incomplete since taxonomies are never complete and should be expandable as new objects emerge.
The authors argue that this source-code based taxonomy will improve the understanding of smart contract technical and functional characteristics by developers and researchers.
Providing a comprehensive smart contract taxonomy will also assist other researchers in creating higher level taxonomies for developing libraries and standard interfaces or for developing more comprehensive methods of classification.

Implications and Follow-ups

Documenting the technical characteristics of smart contracts within usage categories will help researchers better classify and analyze them. New blockchain applications can now be more easily conceptualized on a platform level using higher-level taxonomies.
The paper also shows that applications with distinct functions could have strong technical similarities. Developers could use the results to consider architectural designs and standardize functionalities.
Based on Archetype 4, the authors suggest that developers of oracle standards should ensure that they are compatible with string libraries and native data types since the combination of oracles with string helpers was observed in some archetypes.
Based on the contracts observed in Archetype 0, the authors argue that the functionality of withdrawing pending transactions should be implemented more often with non-fungible tokens such as an ERC721 since they often represent ownership of valuable assets.
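
As an illustration of the withdrawal idea mentioned above, the sketch below simulates a ledger where funds owed to a user are recorded and later pulled by that user, instead of being pushed automatically. It assumes that "withdraw pending transactions" corresponds to the common pull-payment pattern. This is a Python simulation, not Solidity, and the class and method names are hypothetical.

```python
# Illustrative sketch only: a Python simulation of a pull-payment ledger.
# Class and method names are hypothetical, not from the paper.


class PendingWithdrawals:
    def __init__(self):
        self.pending = {}  # account -> amount owed

    def credit(self, account: str, amount: int) -> None:
        """Record that `amount` is owed to `account` (e.g. an outbid bidder)."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.pending[account] = self.pending.get(account, 0) + amount

    def withdraw(self, account: str, send) -> int:
        """Pay out the pending balance via the caller-supplied `send` callback."""
        amount = self.pending.get(account, 0)
        if amount == 0:
            raise ValueError("no pending funds")
        # Zero the balance BEFORE the external call, so a re-entrant
        # withdraw during `send` finds nothing left to take.
        self.pending[account] = 0
        try:
            send(account, amount)
        except Exception:
            self.pending[account] = amount  # mimic a reverted transaction
            raise
        return amount


ledger = PendingWithdrawals()
payments = []
ledger.credit("alice", 5)
ledger.credit("alice", 7)
paid = ledger.withdraw("alice", lambda acct, amt: payments.append((acct, amt)))
# paid == 12 and payments == [("alice", 12)]
```

Zeroing the balance before calling out mirrors the usual ordering of state changes before external interactions, which is also what a reentrancy guard is meant to protect.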

Applicability

Identifying common smart contract characteristics and archetypes highlights areas suitable for developing standards and libraries, as well as areas that may not require immediate standardisation.
It assists developers in specific domains to identify features critical to their smart contracts, and facilitates the description of their technical specifications.

tomideadeoye · March 22, 2022, 12:36am

I have provided two visualizations below.

The first is from the authors; it helps visualize the level of adoption of the features and characteristics identified. It shows the 64 characteristics, the 28 dimensions from which the characteristics were derived, and the 7 clusters formed from the characteristics.

tomideadeoye · March 22, 2022, 12:57am

I have made the second visualization here to provide a general overview of the research: the objectives, findings, and other details.
