Research Summary: DO NOT RUG ON ME: ZERO-DIMENSIONAL SCAM DETECTION

TLDR

  • The researchers expand the dataset of Uniswap v2 scam tokens.
  • They provide a theoretical classification of three different types of rug pulls, along with tools to identify them.
  • The authors introduce two highly accurate and precise machine-learning models that discriminate between malicious and non-malicious tokens in different scenarios before the malicious maneuver takes place.

Core Research Question

(Uniswap, like other DEXs, has gained much attention this last year because it is a non-custodial and
publicly verifiable exchange that allows users to trade digital assets without trusted third parties.
However, its simplicity and lack of regulation also make it easy to execute initial coin offering scams
by listing non-valuable tokens. This method of performing scams is known as rug pull, a phenomenon
that already existed in traditional finance but has become more relevant in DeFi.)

Do rug pulls in Constant Function Market Makers (CFMMs) share similar features? Can we predict whether a project is a rug pull before the malicious maneuver?

Citation

Mazorra, Bruno, Victor Adan, and Vanesa Daza. “Do not rug on me: Zero-dimensional Scam Detection.” arXiv preprint arXiv:2201.07220 (2022).

Mazorra, B., Adan, V., & Daza, V. (2022). Do Not Rug on Me: Leveraging Machine Learning Techniques for Automated Scam Detection. Mathematics, 10(6), 949.

Background

  • Decentralized Exchanges (DEXs): A category of Decentralized Finance (DeFi)
    protocols that allow the non-custodial exchange of digital assets. All trades are executed
    on-chain and are, thus, publicly verifiable. The policy that matches buyers and sellers (or
    traders and liquidity providers) is hard-coded in a smart contract.
  • Rug pull: A malicious operation or set of operations in the cryptocurrency industry where the developers abandon the
    project and take the investors’ funds as profits.
  • Transaction graph: Weighted graph induced by token transactions.
  • Herfindahl-Hirschman Index: A measure of market concentration, used to gauge market competitiveness.
  • Cluster coefficient: A measure of network segregation that captures the
    connections of individual nodes and their neighbors.
  • Precision: Defined as \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}.
  • Recall: Defined as \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}.
  • Machine learning classifier: An algorithm that automatically categorizes data into one or more classes.
  • Cross validation: A resampling method that uses different portions of the data to train and test a model on different iterations. It indicates how well a machine learning algorithm or model generalizes.
  • Data augmentation: A technique for enlarging the training dataset to improve accuracy and generalisation and to control overfitting.
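
As an illustration of the Herfindahl-Hirschman Index defined above, here is a minimal Python sketch that computes it over token holder balances. The function name, the input format, and the use of fractional shares (so the index lies between 1/n and 1) are assumptions made for this example; the summary does not state how the authors normalise the index or which quantity they apply it to.

```python
def herfindahl_hirschman_index(balances):
    """Return the HHI of a list of non-negative holder balances.

    Shares are expressed as fractions of the total, so the index ranges
    from 1/n (n equal holders) to 1.0 (a single holder owns everything).
    """
    total = sum(balances)
    if total <= 0:
        raise ValueError("balances must sum to a positive amount")
    return sum((b / total) ** 2 for b in balances)


# Hypothetical holder distributions, for illustration only.
concentrated = herfindahl_hirschman_index([1_000_000])       # one holder
spread_out = herfindahl_hirschman_index([25, 25, 25, 25])    # four equal holders
mixed = herfindahl_hirschman_index([50, 30, 20])
```

A value close to 1 means that supply sits in very few addresses, which is the kind of concentration signal a scam detector might use as one feature among many.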

Summary

  1. Introduction.
  2. Related Work.
  3. Preliminaries.
    • Background.
  4. Malicious Uniswap Maneuvers: a malicious operation or set
    of operations in the cryptocurrency industry where the developers abandon the project and
    take the investors’ funds as profits.
    • Classification of different types of rug pulls.
      • Rug pulls
      • Pump-and-dump schemes
      • Money laundering
      • Others
  5. Data Collection.
    • Overview of the method used to extract all the necessary data.
  6. Token Labelling.
    • Provide the methodology to label tokens as scams or non-scams.
    • Overview of the results obtained by the labelling methodology proposed.
  7. Scam detection.
    • Define two methods (Activity based Method and 24 Early Method) that use Machine Learning models to discriminate between malicious and non-malicious tokens in different scenarios.
  8. Conclusions.
  9. Future Work.

Method

  • Data collection: To obtain all the data needed for the labelling and the analysis, we used an Infura archive node and the Etherscan
    API. To obtain the state of the Uniswap exchange and the tokens, we used the events emitted by their respective
    smart contracts. To obtain the token creation transactions and the source code, we used the Etherscan API.
  • Labelling:
    • First, we defined the maximum drop and the recovery of the token price and liquidity time series. The maximum drop measures the largest fall in the price or liquidity of the Uniswap-listed pools. The recovery represents the largest pump from the bottom. In addition, we consider a token inactive if more than one month has passed since its last movement or transaction. This yielded a total of 27,588 tokens that could be tagged as malicious: inactive tokens that had, at some point, lost all their value in price or liquidity and never recovered it.
    • Non-malicious tokens cannot be identified from liquidity, price, and activity analysis alone. A token may be considered malicious if there has been at least one rug pull at some point in its activity. However, a token that has not yet had a rug pull cannot be considered non-malicious, since it could experience one later on. Therefore, we take advantage of audits carried out by external companies (Certik, Quantstamp, Hacken…). In this way, a list of 674 tokens labelled as non-malicious has been mined from different sources: CoinMarketCap, CoinGecko, and Etherscan.

  • Features: We compute a set of features to extract relevant information about the tokens listed in Uniswap V2.

  • Machine Learning: We defined two methods that use Machine Learning models to discriminate between malicious and non-malicious tokens: Activity based Method and 24 Early Method.
    • Activity based Method: For each token labelled as malicious, we have randomly selected several evaluation points before the maximum drop. Non-malicious tokens have been evaluated throughout their activity. Then, for each evaluation point, we calculated the token features up to that block and used them to train two ML classifiers (XGBoost and FT-Transformer) to find patterns related to malicious activity.

    • 24 Early Method: For each labelled token, we have computed its features at each of the 24 hours after its pool creation. In this case, we train the models separately for each hour, so we only have one evaluation point per token. This also implies that the dataset is smaller than in the other method.
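
The labelling quantities and the two sampling schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the normalisation of the recovery (rebound as a fraction of the fall), the uniform sampling of blocks, and the use of Unix timestamps are all hypothetical choices, not the authors' exact definitions.

```python
import random


def max_drop_and_recovery(series):
    """Return (max_drop, recovery) for a series of positive values.

    max_drop: largest relative fall from a running peak to a later value.
    recovery: rise from that trough to the highest later value, as a
              fraction of the fall (0 = no rebound, 1 = back to the peak).
    This normalisation is an assumption made for the sketch.
    """
    if not series:
        raise ValueError("series must not be empty")
    peak = series[0]
    max_drop, drop_peak, trough_idx = 0.0, series[0], 0
    for i, value in enumerate(series):
        if value > peak:
            peak = value
        drop = (peak - value) / peak
        if drop > max_drop:
            max_drop, drop_peak, trough_idx = drop, peak, i
    if max_drop == 0.0:
        return 0.0, 0.0
    trough = series[trough_idx]
    rebound_high = max(series[trough_idx:])
    return max_drop, (rebound_high - trough) / (drop_peak - trough)


def activity_evaluation_points(first_block, drop_block, n_points, seed=0):
    """Sample distinct evaluation blocks in [first_block, drop_block)."""
    if drop_block <= first_block:
        raise ValueError("drop_block must come after first_block")
    rng = random.Random(seed)
    candidates = range(first_block, drop_block)
    return sorted(rng.sample(candidates, min(n_points, len(candidates))))


def early_evaluation_times(pool_created_at, hours=24):
    """Return one Unix timestamp per hour for the first `hours` hours."""
    return [pool_created_at + h * 3600 for h in range(1, hours + 1)]


# Hypothetical inputs, for illustration only.
prices = [1.0, 2.0, 4.0, 0.04, 0.05, 0.04]
max_drop, recovery = max_drop_and_recovery(prices)
points = activity_evaluation_points(first_block=100, drop_block=200, n_points=5)
hourly = early_evaluation_times(pool_created_at=1_600_000_000)
```

In this sketch, a token whose series shows a maximum drop close to 1 and a recovery close to 0 would be a candidate for the malicious label once it is also inactive. Features would then be computed only from data up to each sampled block or hourly timestamp, so that no information from after the drop leaks into training.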

Results

Most tokens are labelled as malicious. Indeed, labelling all of them as malicious would be enough to achieve an accuracy of 97.7%. Therefore, we used a data augmentation technique that consists of choosing more evaluation points for non-malicious tokens than for malicious tokens. In particular, we selected five evaluation points for each non-malicious token and one for each malicious token. In addition, we labelled the non-malicious tokens as 1 and the malicious tokens as 0 and tried to increase the performance in predicting non-malicious tokens. To validate both methods we used 5-fold cross-validation, so all results are presented as the mean and standard deviation across folds.
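
To see why accuracy alone is misleading on such an imbalanced dataset, consider this small Python sketch of a classifier that labels every token as malicious. The counts (977 malicious and 23 non-malicious out of 1,000) are hypothetical and chosen only to mirror the 97.7% figure; the helper name and the convention of returning 0.0 when a ratio is undefined are also assumptions of the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the given positive class.

    Precision and recall fall back to 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Labels follow the summary: 1 = non-malicious, 0 = malicious.
y_true = [0] * 977 + [1] * 23   # hypothetical class counts
y_pred = [0] * 1000             # baseline: call every token malicious

accuracy, precision, recall = classification_metrics(y_true, y_pred)
```

The baseline reaches an accuracy of 0.977 while never identifying a single non-malicious token, so its recall on class 1 is 0. This is why precision and recall on the non-malicious class are more informative than accuracy here.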

Activity based Method Results

  • Both XGBoost and FT-Transformer get high metrics for accuracy, recall, precision, and F1-Score. However, XGBoost outperforms FT-Transformer in all metrics.
  • XGBoost obtains an accuracy of 0.9936, recall of 0.9540 and precision of 0.9838 in distinguishing non-malicious tokens from scams. In contrast, FT-Transformer gets an accuracy of 0.9890, recall of 0.9180 and precision of 0.9752.
    Therefore, from now on we only analyse the XGBoost results.

24 Early Method Results

  • For each labelled token, we have computed its features in each of the 24 hours after its pool creation. In this case, we are training both models for each hour. Therefore, we only have one evaluation point for each token.

  • Our algorithm obtains a very high accuracy even in the first hours. However, the precision, recall and F1-score are lower than in the Activity based Method. In the best case, i.e. 20 hours after the creation of the pool, our best algorithm obtains a recall of 0.789. This could indicate that while malicious tokens are easily detectable in the first few hours, detecting non-malicious tokens requires more time.

Discussion and Key Takeaways

  • We provided a theoretical classification to understand the different ways of executing the scam, and through the process of identifying rug pulls we found new token smart contract vulnerabilities (composability attacks) and new ways of money laundering.
  • We provided a methodology to find rug pulls that had already been executed. Not surprisingly, we found that more than 97.7% of the labelled tokens were rug pulls.
  • We defined two methods that use ML models to distinguish non-malicious tokens from malicious ones. We also verified the high effectiveness of these models in both scenarios.

Implications and Follow-ups

  • In this paper we showed that different machine learning techniques can be used to detect scams before the malicious maneuver is executed, without the need for off-chain data.
  • The efficiency and accuracy of the results could be improved using novel techniques such as topological data analysis.
  • Because the market is shifting from Uniswap V2 to Uniswap V3, an obvious follow-up would be to study rug pulls in Uniswap V3 and develop new tools to detect them.

Applicability

  • The algorithm and the methodology provided in the paper could be developed further to help protect uninformed investors in blockchain distributed apps.
  • With the help of exchanges, the rug pull detection algorithm could provide useful forensic analysis to identify scammers.
9 Likes

Thank you so much for this fascinating summary. Gosh, 97.7% is a lot of sketchy tokens! Did you get a sense of what volume of activity revolved around scam tokens?

How do you envision this working? A pop-up in Metamask or something like that? How do you distinguish utility tokens or fund-raises?

2 Likes

We computed some heuristics, but it is very tricky to compute robust lower bounds on the actual market volume. We guess (and this actually happens in general) that “scammers” wash trade to maximize their profits.

We are thinking of something very similar. A pop-up in Metamask could be very useful for an uninformed user. We are also thinking about an API.

6 Likes

This is a fascinating and simultaneously worrying data set. With a percentage of rug pulls that high, was your team able to identify any red flags that were common occurrences in the lead-up to the rug pulls? For example, would a high volume on Uniswap without a presence on other exchanges be a red flag? I am speaking purely from observation, but in light of your data set, did you find any commonalities that traders might use as warning signs that a token was created with the intention to rug pull?

5 Likes

This is very helpful, @Bruno. What was your sample size? What percentage of the total tokens on Uniswap are rug pulls? Per your data.

4 Likes

I share the concerns of the two previous replies about the method’s “high accuracy” — maybe, dare I say, ML methods are fundamentally inappropriate for determining financial risk on AMMs? It’s a very cool method but equally a dangerous one, as such mistakes could be impactful.

4 Likes