TLDR
- The researcher proposes a framework for open-access, decentralized machine learning in which models and data sets are shared without centralizing data processing.
- He proposes a collaborative trainer, an incentive mechanism, and an additional automated data handler that can be stacked to create a superior final model for a machine learning process.
- The researcher suggests that blockchain-based incentives for contributing to data sets may improve the public’s access to high-quality data, and therefore to better machine learning models.
Core Research Question
Can a model for building decentralized machine learning (ML) algorithms that can help accelerate the evolution of AI be designed? What type of model for building decentralized ML algorithms would democratize artificial intelligence (AI), keep costs low, and keep its data updated and relevant?
Citation
Harris, J. D. (Senior Software Developer at Microsoft). (2021). Leveraging Blockchain for Greater Accessibility of Machine Learning Models. Stanford Journal of Blockchain Law & Policy. Retrieved from https://stanford-jblp.pubpub.org/pub/blockchain-machine-learning
Background
- The researcher proposes an open-access machine learning framework in which collaborators are rewarded for contributing data that is deemed “good” whereas those who have contributed data deemed “bad” lose their contribution fee.
- The researcher suggests that including a smart contract in the process of deploying machine learning training algorithms could be more secure than a purely open-source system which does not include incentive mechanisms in the design.
- He suggests that a small fee for contributing data combined with a reward mechanism for data that is validated as good can lead to improved ML algorithmic accuracy.
- The researcher uses the Internet Movie Database (IMDB) review data set to train and evaluate a sentiment classifier.
- The original framework was proposed in 2019 at an IEEE conference on Blockchain.
- Perceptron: a linear machine learning algorithm used for binary classification (e.g., whether a film was given a positive or negative review). The model computes a weighted sum of its input features and compares it against a threshold to assign each input to one of two classes.
- IMDB Sentiment Classification Data set: a set of 25,000 movie reviews labeled by sentiment (positive/negative). See: IMDB movie review sentiment classification dataset (keras.io)
- Demo of Incentive Mechanism: Decentralized & Collaborative AI on Blockchain Setup + Demo - YouTube
Summary
Adding data to a model in the Decentralized & Collaborative AI on Blockchain framework consists of three steps: (1) The incentive mechanism, designed to encourage the distribution of “good” data, validates the transaction, for instance, requiring a “stake” or monetary deposit. (2) The data handler stores data and metadata onto the blockchain. (3) The machine learning model is updated.
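The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the framework's actual smart-contract code: the class and function names (StakeIncentive, DataHandler, CountingModel, add_data) are hypothetical, an in-memory list stands in for blockchain storage, and the model is a placeholder that only counts updates.

```python
from dataclasses import dataclass, field


@dataclass
class StakeIncentive:
    """Step 1 (hypothetical): accept a contribution only with a sufficient deposit."""
    required_stake: float
    deposits: dict = field(default_factory=dict)

    def validate(self, contributor: str, stake: float) -> None:
        if stake < self.required_stake:
            raise ValueError("stake below the required deposit")
        self.deposits[contributor] = self.deposits.get(contributor, 0.0) + stake


@dataclass
class DataHandler:
    """Step 2 (hypothetical): record data and metadata; a list stands in for the chain."""
    records: list = field(default_factory=list)

    def store(self, contributor: str, features: list, label: int) -> None:
        self.records.append(
            {"contributor": contributor, "features": features, "label": label}
        )


class CountingModel:
    """Step 3 stand-in: a placeholder model that only counts its updates."""

    def __init__(self) -> None:
        self.updates = 0

    def update(self, features: list, label: int) -> None:
        self.updates += 1


def add_data(incentive, handler, model, contributor, features, label, stake):
    """Run the three steps in order; a rejected stake stops the whole update."""
    incentive.validate(contributor, stake)       # (1) incentive mechanism
    handler.store(contributor, features, label)  # (2) data handler
    model.update(features, label)                # (3) model update


incentive = StakeIncentive(required_stake=1.0)
handler = DataHandler()
model = CountingModel()
add_data(incentive, handler, model, "alice", [1, 0, 1], 1, stake=1.0)
```

Because validation runs first, a submission with an insufficient deposit never reaches storage or the model, which is the ordering the summary describes.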
Method
- A Perceptron (an algorithm used for supervised learning of binary classifiers) is used to train a model on IMDB data for sentiment classification.
- Approximately 8,000 training samples are used in the first simulation, and 33,000 total training samples are used in the second simulation.
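As a rough illustration of the kind of model used here, the sketch below trains a perceptron with the classic update rule on a toy bag-of-words data set. The four-word vocabulary, the samples, and the function names are assumptions made for this example; the study itself trains on IMDB reviews, and its exact features and hyperparameters are not reproduced here.

```python
def predict(weights, bias, features):
    """Return 1 (positive) if the weighted sum exceeds the threshold, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


def train(samples, n_features, epochs=10, learning_rate=1.0):
    """Classic perceptron rule: adjust the weights only on misclassified samples."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            if error != 0:
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
    return weights, bias


# Hypothetical vocabulary: ["great", "boring", "loved", "awful"]
# Each sample marks which words appear in a review; label 1 = positive.
samples = [
    ([1, 0, 1, 0], 1),
    ([1, 0, 0, 0], 1),
    ([0, 1, 0, 1], 0),
    ([0, 0, 0, 1], 0),
]
weights, bias = train(samples, n_features=4)
```

Each update touches only the weights of words present in the misclassified review, which keeps a single update small. That property matters in this setting because every model update is paid for as a blockchain transaction.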
Results
- The first simulation is conducted with the condition that users contributing data that is confirmed to be “bad” lose their submission fee.
- Lost submission fees are split among the contributors of “good” data.
- The balance represents starting funds contributed to the pool to be able to update the algorithm.
- The initial decline in balances represents the accumulated fees being paid into the pool, while the uptick in good balances is the accumulation of rewards from bad submissions paid out to users that submit good updates to the algorithm.
- In this simulation, a contributor that has been found to have made a “good” contribution has their fee returned and a point added to their reputation score.
- This simulation shows an increase in the accuracy of the sentiment analysis over time.
- The second simulation adds 25,000 more training samples to the data set.
- The second simulation operates under the premise that a nefarious actor is intentionally adding “bad” data to the data set and is actively paying for the attack.
- Even when an attacker is willing to pay to inject “bad” data, the “good” data contributions offset the attack, and the accuracy of the ML algorithm does not drop.
- The tests took place in August of 2020; the researcher estimated the cost of each update to the algorithm to be roughly $0.40 USD.
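The fee rules described in the first simulation can be illustrated with a small settlement function. This is a simplified sketch, not the researcher's simulation code: the function name settle, the equal split of forfeited fees, and the fee of 1.0 are assumptions for illustration, and how a submission is judged “good” or “bad” is not modeled.

```python
def settle(submissions, fee):
    """Refund good contributors and split forfeited fees equally among them.

    submissions: list of (contributor, is_good) pairs, one per submission.
    Returns each contributor's net balance change after settlement.
    """
    balances = {}
    for name, _ in submissions:
        # Every submission pays the fee up front.
        balances[name] = balances.get(name, 0.0) - fee

    good = [name for name, is_good in submissions if is_good]
    forfeited = fee * (len(submissions) - len(good))

    if good:
        bonus = forfeited / len(good)  # assumption: equal split per good submission
        for name in good:
            # Fee is returned, plus a share of the fees lost by bad submissions.
            balances[name] += fee + bonus
    return balances


result = settle(
    [("alice", True), ("bob", True), ("mallory", False), ("mallory", False)],
    fee=1.0,
)
```

In this toy run the two “good” contributors end with a net gain funded entirely by the attacker's forfeited fees, while the attacker's balance falls with every “bad” submission.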
Discussion and Key Takeaways
- The researcher suggests that open-access machine learning on blockchains has great potential to use incentive mechanisms to improve the efficiency of training algorithms.
- The researcher also suggests that the costs associated with updating the algorithms may dissuade nefarious attackers and may eventually stop them once they run out of money to attack a specific target.
Implications and Follow-ups
- Based on a rough estimate of the price change since the article was published, the current cost of updating the training algorithm is approximately $2.88 USD, assuming the researcher has not succeeded in decreasing the average cost of pushing an update.
- The researchers address frequently asked questions about their publication at the following link: https://github.com/microsoft/0xDeCA10B/blob/main/README.md#faqconcerns
Applicability
- Open-access machine learning algorithms may be beneficial for training AI assistants more quickly.
- AI depends on machine learning algorithms, and more accessible algorithms would translate into AI that learns more quickly.
- ML algorithms depend on perpetual training, which is part of the reason the researcher looks for ways to make long-term updating of a training algorithm as inexpensive as possible for contributors with a history of “good” data submissions.