Research Summary: ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning

TLDR

  • The Ethereum Smart COntRacTs Vulnerability Detection (ESCORT) tool detects multiple vulnerability types and can be quickly updated to defend against new vulnerabilities.
  • Its core innovation is a divide-and-conquer approach that breaks vulnerability detection into two tasks: learning general features of contracts and identifying specific vulnerabilities.
  • It achieves an average F1 score of 95% and can detect 8 vulnerability types in parallel within 0.02 seconds.

Core Research Question

How can deep learning models detect multiple vulnerabilities while being updated to detect new threats as well?

Citation

Oliver Lutz, Huili Chen, Hossein Fereidooni, Christoph Sendner, Alexandra Dmitrienko, Ahmad Reza Sadeghi, and Farinaz Koushanfar. 2021. ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning. arXiv preprint arXiv:2103.12607 (2021).

Background

  • Smart Contracts: Computer programs that execute agreements. They are written in high-level languages such as Solidity, compiled to bytecode, and executed inside the Ethereum Virtual Machine (EVM).
  • Bytecode Representation: The bytecode of a smart contract executed in the EVM. Each operation has a one-to-one mapping to its bytecode representation, which makes it possible to analyze the flow of a contract at the bytecode level.
  • Callstack Depth: A class of vulnerability for ESCORT to detect. The attacker uses EVM’s depth limit of 1024 to cause an error when a function is called.
  • Reentrancy: A class of vulnerability for ESCORT to detect. An attacker calls the contract’s function recursively, draining the Ether in the contract.
  • Multiple Sends: A class of vulnerability for ESCORT to detect. Denial of service (DoS) occurs when a transaction reaches its spendable gas limitation.
  • DoS (Unbounded Operation): A class of vulnerability for ESCORT to detect. A DoS attack that triggers execution cost limits of a smart contract by an external caller.
  • Accessible selfdestruct: A class of vulnerability for ESCORT to detect. A programming error that leads to the termination of a contract, sending the remaining funds to a predefined address.
  • Tainted selfdestruct: A class of vulnerability for ESCORT to detect. An extension of accessible selfdestruct, but the attacker can set the address to send the remaining balances to.
  • Money concurrency: A class of vulnerability for ESCORT to detect. Also known as Transaction Ordering Dependence (TOD), this vulnerability is caused by the miner’s ability to decide what transactions to execute, thus changing the order of transactions, opening the door to potential attacks.
  • Assert violation: A class of vulnerability for ESCORT to detect. A programming error that leads to a constant error state in the smart contract.
  • Multi-Output-Layer Deep Neural Network (MOL DNN): A deep neural network that produces multiple outputs for a single input, here one prediction per vulnerability type.
  • Feature Extraction: The process of extracting a feature representation from data to serve purposes like abnormality detection. This can reduce the need for a larger or more evenly distributed dataset.
  • Transfer Learning: Training a model on one task and then reusing what it has learned for another use case, without starting from scratch.
  • Precision and Recall: Precision is defined by $\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$, while recall is defined by $\frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$.
  • F1 Score: Defined by the harmonic mean of precision and recall, $\frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$.
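
As a quick worked example of the three metrics defined above, the following Python sketch computes them from raw counts. The function name and the counts (90 true positives, 10 false positives, 30 false negatives) are made up for illustration and do not come from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single vulnerability class.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.900 recall=0.750 f1=0.818
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values (0.75 here) than a plain average would.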

Summary

  • Consider a malicious party that obtains its knowledge from the public data structure of a blockchain and can freely upload contracts to the Ethereum system. It may exploit one or more of the eight vulnerability classes described in the Background section.

  • ESCORT helps the “defender”, that is, an Ethereum designer or end user, ensure that their program is not exploitable by malicious adversaries, either during development or before sending transactions.

  • In addition, ESCORT aims to provide a single model that can be extended to identify novel vulnerabilities. Building such a tool involves several challenges.

  • Collecting a large enough dataset is difficult because not enough smart contracts are open-sourced. Although bytecodes are publicly available, they are too long to process under a reasonable memory size.

  • Acquiring desired sample sizes of each vulnerability is challenging because only a small portion of smart contracts fall into that class, and unfortunately deep neural network training tends to bias towards the majority class.

  • Extracting feature representations is challenging because traditional software testing tools are not domain-specific enough, yet finding them manually is unrealistic.

  • Identifying multiple vulnerabilities with a single model is difficult because different vulnerabilities exploit distinct loopholes in a contract.

  • Enabling the model to detect previously unknown vulnerabilities while preserving its knowledge of existing ones was not fully considered in previous works. It matters because it avoids the high cost of training a new model from scratch or fine-tuning a pre-trained model on a large dataset.

  • To address these problems, the authors built a toolchain called ContractScraper for sourcing and processing data.

  • An innovative divide-and-conquer approach was proposed. Instead of directly identifying vulnerabilities, they split the task into learning the semantic and syntactic information of smart contracts, and predicting the existence of different types of vulnerabilities.

  • For the experiment, ESCORT was first trained to detect six vulnerabilities, and then assigned to learn the remaining two.
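
The divide-and-conquer structure described above can be sketched in a few lines of plain Python. This is only a structural illustration under stated assumptions, not the authors' implementation: the class names, layer shapes, and random untrained weights are hypothetical, and the split into six initial and two later classes is arbitrary because the summary does not say which classes came first.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


class FeatureExtractor:
    """Shared stage: maps an input vector to general-purpose features."""

    def __init__(self, input_dim, feature_dim, rng):
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
            for _ in range(feature_dim)
        ]

    def __call__(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.weights]


class VulnerabilityBranch:
    """Per-class stage: turns the shared features into one probability."""

    def __init__(self, feature_dim, rng):
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(feature_dim)]
        self.bias = 0.0

    def __call__(self, features):
        score = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return sigmoid(score)


class MultiBranchDetector:
    """Hypothetical container: one shared extractor plus one branch per class."""

    def __init__(self, input_dim, feature_dim, seed=0):
        self._rng = random.Random(seed)
        self._feature_dim = feature_dim
        self.extractor = FeatureExtractor(input_dim, feature_dim, self._rng)
        self.branches = {}

    def add_branch(self, name):
        """Attach a new branch without touching the extractor or older branches."""
        self.branches[name] = VulnerabilityBranch(self._feature_dim, self._rng)

    def predict(self, x):
        features = self.extractor(x)  # computed once, reused by every branch
        return {name: branch(features) for name, branch in self.branches.items()}


# Illustrative split only; not taken from the paper.
initial = ["callstack_depth", "reentrancy", "multiple_sends",
           "dos_unbounded", "accessible_selfdestruct", "tainted_selfdestruct"]
later = ["money_concurrency", "assert_violation"]

detector = MultiBranchDetector(input_dim=4, feature_dim=3)
for name in initial:
    detector.add_branch(name)

x = [0.38, 0.50, 0.38, 0.25]  # toy input vector
before = detector.predict(x)

for name in later:
    detector.add_branch(name)
after = detector.predict(x)

# The six original outputs are identical before and after the extension.
print(all(after[name] == before[name] for name in initial))
```

No training happens in this sketch. Its only point is that adding a branch leaves the outputs of the existing branches untouched, which is the property the transfer-learning step relies on.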

Method

  • The authors chose to work with bytecode-level data, and dealt with memory constraints by reducing the length of the input data.

  • They built ContractScraper, a toolchain to obtain bytecode files of smart contracts from the Ethereum platform, label them, and store the results in a database.

  • Bytecode acquisition starts with downloading around 1.2 million smart contracts from the first 5 million Ethereum blockchain blocks.

  • Raw data consists of hexadecimal digits that represent particular operation sequences and parameters. The data is then cleaned and processed to reduce input size and overcome memory constraints.

  • To label the data, they use inbuilt vulnerability detection tools in ContractScraper. Each of the vulnerability detection tools used by ContractScraper is specialized for detecting a specific set of vulnerability types.

  • 15,000 samples were selected for each vulnerability class, plus 15,000 for a class of contracts with no vulnerabilities. Note that the total number of distinct contracts is less than 15,000 x (8+1) because one smart contract can contain more than one type of vulnerability.

  • A DNN was trained to learn the bytecode features of general contracts. The defender also specifies system parameters, including the vulnerability types to detect.

  • Inside the DNN, a feature extractor was created to learn the semantic and syntactic information from the contract’s bytecode. It is composed of a stack of layers.

  • These layers are meant to address the accuracy problems that arise from working with long hexadecimal bytecodes. The input data is first passed through a linear mapping that converts it into fractional numbers before learning, which improves efficiency.

  • The feature extractor is then extended to multiple vulnerability branches. Each branch is a stack of layers designed to learn a specific type of vulnerability.

  • Each branch outputs a probability for the specific type of vulnerability it aims to detect.

  • When a new vulnerability is identified, the defender constructs a new dataset, and adds a new vulnerability branch to the existing model.

  • Existing branches are left intact, ensuring that old knowledge is preserved.
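
The input preparation described in this section (hexadecimal bytecode, reduced input length, linear mapping to fractional numbers) might look roughly like the sketch below. The summary does not give the exact mapping or input length, so dividing each byte by 255, zero-padding short inputs, and the `max_len` value are all assumptions, and `bytecode_to_vector` is a hypothetical helper name.

```python
def bytecode_to_vector(hex_bytecode, max_len=16):
    """Convert hex-encoded bytecode into a fixed-length list of floats in [0, 1]."""
    cleaned = hex_bytecode.strip()
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    raw = bytes.fromhex(cleaned)      # raises ValueError on malformed hex
    trimmed = raw[:max_len]           # cap the length to bound memory use
    padded = trimmed + bytes(max_len - len(trimmed))  # zero-pad short inputs
    # Assumed linear mapping: scale each byte (0..255) into a fraction.
    return [b / 255 for b in padded]


# First five bytes of a typical compiled Solidity contract, used as a toy input.
vec = bytecode_to_vector("0x6080604052", max_len=8)
print(len(vec), vec[0], vec[-1])
```

Fixing the vector length up front is what lets contracts of very different sizes go through the same network; the trade-off is that anything beyond `max_len` bytes is discarded in this sketch.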

Results

  • Classification of the first six vulnerabilities. (p.12, Table 3)


    During the first part of training, ESCORT on average achieved a higher than 95% F1 score on both the training and validation set.

  • Classification results after two new vulnerabilities were added. (p.12, Table 4)


    After transfer learning occurred, the new classes achieved 92% and 93% F1 score. Their training time was reduced by around 43%.

Discussion and Key Takeaways

  • Classification: ESCORT achieved exceptional results across different classes. The innovative divide and conquer approach could be the reason why.
  • Transfer Learning: There was no significant drop in performance after two new classes were added. This suggests that ESCORT can learn new vulnerabilities without unlearning previous knowledge.
  • Training Time: The decrease in training time is consistent with new branches being add-ons to an already trained model rather than requiring retraining from scratch.

Implications and Follow-ups

  • This approach differs from previous works. Instead of packaging learning and detecting vulnerabilities into a single step, ESCORT shows that dividing the task into two can make for a more effective tool.
  • The problem with learning new vulnerabilities while preserving old ones is resolved by having independent branches to address separate classes.
  • As a result, model drift, the bias of a model towards particular classes, is easily solvable by updating the parameters of a vulnerability branch.
  • Previous works predominantly didn’t distinguish between vulnerability types and just reported a security score. This made it harder to justify the F1 score, and practical implementation was harder since different vulnerabilities needed different solutions.
  • Other works didn’t do a great job of representing smart contracts in a meaningful way. Previous works that used RGB color images of codes or nodes and edges of a graph representation didn’t achieve comparable results to ESCORT.
  • ESCORT also relies on input data that is more accessible. Previous works used source code, which, as discussed, is hard to acquire.
  • The ability to process long contracts is also a breakthrough, further enhancing generalizability and applicability.

Applicability

  • For developers, ESCORT has the potential to improve security by detecting risks within their code, allowing vulnerabilities to be caught before transactions are sent.
  • For smart contract platforms such as Ethereum, ESCORT can provide a safer environment by running a check on applications before they are broadcast and executed in adversarial environments.
  • ContractScraper is a useful tool for people to scrape contracts with and train existing models on new vulnerabilities.
  • Although the vulnerabilities presented for ESCORT to train on may not be the best representation of the most important vulnerabilities in the wild, this framework would be useful for future works in smart contract vulnerability detection with machine learning.

Thanks for summarizing this innovative approach to vulnerability detection. Several months ago, @lnrdpss authored a mini-post on the Art of Auditing discussing the balance between automated tools and manual review. I’m interested in where ESCORT might fit into the auditing landscape.

I’m also interested in the predictive part of this innovation. The summary states:

An innovative divide-and-conquer approach was proposed. Instead of directly identifying vulnerabilities, they split the task into learning the semantic and syntactic information of smart contracts, and predicting the existence of different types of vulnerabilities.

How is the prediction being accomplished in this approach?

Interesting read. I would like some clarification on the decisions the authors made when conducting this study. As for the divide-and-conquer approach to learning the semantic and syntactic information of smart contracts, why is there a need to learn general “syntactic” features when there should already be existing tools/frameworks that can be used to label them? Static analyzers such as Slither, Securify, and Solhint may be a good start for labeling these general syntactic features, instead of taking on the arduous task of having the DNN model learn them.

What prompted the authors to learn smart contract language structures at the bytecode level? I would imagine that since the raw data is hexadecimal bytecode, the underlying data of 15,000 smart contracts for each vulnerability label would be a lot! Why did the authors specifically choose DNN models to parameterize the problem of detecting vulnerabilities? Correct me if I’m wrong, but does the transfer learning aspect of ESCORT happen when the single existing DNN model adapts its parameters to the new vulnerabilities as new labels and then predicts on those labels?

I carry the opinion that the authors simply wanted to throw a DNN at everything and apply standard “AI” techniques to a problem that hasn’t been applied with “AI” yet.

Hi @zube.paul thanks for showing interest in the research. Prediction is not so different from how existing machine learning-based vulnerability detection methods work, because the key innovation comes from the divide-and-conquer approach, which is exactly what you have quoted :) Hope that clears things up.

Hmm. I am always very skeptical of fully automated tools for smart contract vulnerability detection. Oftentimes they display a high rate of false positives, and filtering those out is extremely time-consuming from an auditing perspective.

With that skepticism in mind, I have a few questions about this paper:

How can they know there are no false positives in their labeling process, specifically as done by ContractScraper? They need a means to measure it; otherwise, the experimental results shown in Table 3 cannot be stated with any real confidence.

Did they open-source this tool as a means to foster reproducibility, as well as for just trying it out?

Hi @Tony_Siu, the semantic and syntactic features of smart contracts are important for transfer learning to take place, which enables the model to deal with new vulnerabilities quickly. Training with bytecode-level data is realistic because bytecode is publicly available, so a dataset is much easier to construct.

Hi @lnrdpss. Thanks for leaving a question, and providing context to the challenges in auditing.

To your question, the authors believed that performance data is available in previous publications and had been inspected by experts to ensure correctness. Reviewing those references would help answer how confident we can be in the performance.

What we do know, for now, is which tools the authors used to label the data: Contract Library, Mythril, and Oyente.

I think my proposition is that there should already be “static” and “dynamic” analyzers that leverage program analysis techniques to identify vulnerabilities from the semantic and syntactic information of the smart contract source code. There shouldn’t be a further need to

learning the semantic and syntactic information of smart contracts

My other proposition is that there should be more than enough data, considering it is 15,000 smart contracts’ worth of bytecode. With each smart contract’s bytecode being an N-dimensional feature vector in itself, why not employ an autoencoder to encode the bytecode into a feature vector more relevant to training ESCORT?
