TLDR
- This paper focuses on improving the efficiency of peer-to-peer file transport protocols. It proposes a Distributed Hash Table (DHT) system structured as a hypercube, a type of network topology that efficiently manages decentralized keyword-based queries executed on data stored in DFS.
- The authors provide a framework to govern the above network, based on implementing a Decentralized Autonomous Organization (DAO) and smart contracts. This enables organizational decision-making and rewards nodes that have actively contributed to the DHT.
Core Research Question
- How can we search data stored on novel decentralized storage technologies, such as DLT, DFS, IPFS, by content rather than identifier or location?
- How do node operators, who positively host and share information to manage rewards and organizational decisions with smart contracts, work in a more scalable and decentralized organization framework?
Citation
- Zichichi, M., Serena, L., Ferretti, S., & D’Angelo, G. (2021). Governing Decentralized Complex Queries Through a DAO. arXiv preprint arXiv:2107.06790.
Governing Decentralized Complex Queries Through a DAO
Background
-
Distributed Ledger Technologies (DLTs): A peer-to-peer (P2P) system consisting of nodes which maintain information with a consensus mechanism, allowing them to each have the same version of that information. DLT ensures that information stored on it features integrity, immutability, and authenticity.
-
Distributed Hash Table (DHT): a decentralized system for distributed storage, which distributes information by providing a routing mechanism to move resources between system nodes. Its notion is to distribute storage workload among the DHT nodes in a way that allows for fast identification of where the key/value pair resides. DHT nodes are obtained through the use of a hash function, a one-way function that maps any item into a binary sequence of 𝑛 bits. Its key is an 𝑛 bit string obtained after having applied the hash function. Specifically, each DHT is identified itself by an 𝑛 bit ID, which exists in the same ID space used to identify contents. Each node maintains information on corresponding contents that are in a specific ID space interval. DHTs were proposed by “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”.
Hypercube Structure
This paper proposes a DHT system that is structured as a hypercube. The hypercube is a logical layout where there are 2𝑟 nodes, each one labeled with an 𝑟 bits ID and connected to the 𝑟 nodes whose ID differs by only one bit. Each node is responsible for a specific keywords set, derived from their ID.
-
Decentralized Autonomous Organization (DAO): An organization that “(1) participants maintain direct real-time control of contributed funds and (2) governance rules are formalized, automated and enforced using software”. However, its definition may vary as time goes on, as relevant industrial development may amplify or reconstruct its concept. DAOs aim to be governed by democratic or highly participatory processes or algorithms, the form of DAOs is community-driven using software and following the rule of code.
-
Inter Planetary File System (IPFS): Stores and shares files in a peer-to-peer network. It’s one of the most widely used DFS protocols where files and data are replicated globally on hundreds of nodes in the network. It combines a DHT, an incentivized block exchange, and a self-certifying namespace.
-
IPFS users can only search files stored in the system when owning them or knowing their CID because IPFS files are stored and shared in the form of IPFS objects that are identified by a CID (Content IDentifier), which comes from the outcome of applying that file to a hash function.
-
Content Identifiers (CIDs): Are labels used to point to material in IPFS as objects. CIDs are products of a hash function when applied to a data and used to retrieve the referenced IPFS object.
-
IPFS search engine: A search engine for the IFPS used to overcome the limitation above. It’s used to sniff the DHT gossip, index files, and directory hashes. However, this paper argues that it is too centralized.
-
A decentralized solution called Siva had been proposed that would allow users to search through an inverted index of keywords. However, it does not feature any optimization for a keyword storage structure apart from the use of caching.
-
The Graph network is a “Decentralized Query Protocol”, a system built on the Ethereum blockchain and IPFS, which allows users to search data stored in these two technologies. The organization of the network is like a DAO but has different storage indexes, a method this paper proposes.
-
Decentralized finance (DeFi): Refers to smart-contract-based novel P2P financial infrastructures, that are non-custodial, permissionless, openly verifiable and composable. Anyone can conduct a non-custodial exchange of on-chain digital assets with DeFi protocols. Most DeFi protocol liquidity is provided algorithmically through a simple pricing rule within a smart contract, instead of the bid- and-ask-order prices which traditional finance’s asset liquidity is based on. .
-
On Uniswap, when a new ERC20 token such as the DAO Token is created and initially distributed to its creators, users can provide liquidity by locking up their newly created tokens in a liquidity pool. As an incentive they receive Liquidity Pool (LP) tokens representing their stake. The creators may redeem them at any time by burning the LP tokens, which makes the value of the DAO token highly unstable.
-
The Factory pattern: is an Object-oriented programming design pattern that implements the Factory notion.
-
The EIP-1167 Minimal Proxy pattern: aims to provide a simple and cheap clone contract functionality in an immutable way, by specifying a minimal bytecode implementation that delegates all calls to a known, fixed address.
Summary
-
In a DHT structured as a hypercube topology, let 𝐾 ⊆ 𝑊 represent a keyword set in a keyword space 𝑊. Such set 𝐾 can be used to search for data contents characterized by keywords contained in 𝐾. Specifically, let 𝑂 be a set of data objects referenced in the DHT which are distributed on all the network nodes based on the associated keywords. The node responsible for such a keyword set 𝐾𝑜 maintains all the objects 𝑜 ∈ 𝑂 mapped to a keywords set 𝐾𝑜 ⊆ 𝑊. Then we deem 𝑂 as the set of all the CIDs in IPFS and leverage the DHT to map keywords to IPFS Objects.
-
This paper proposes a system that multiple keywords are stored in IPFS which is structured as a 𝑟-dimensional hypercube 𝐻𝑟 (𝑉 , 𝐸) DHT, where 𝑉 is a set of vertices representing logical network nodes and 𝐸 is a set of edges that connect pairs of neighbor nodes. The node’s ID comes from an 𝑟-bit string associated with the keyword set that the node is responsible for. Each bit position represents a specific keyword. A given keyword set 𝐾 contains a given keyword 𝑘, which is assigned to the 𝑖-th bit of the string by a function. Then, keywords are given in input to an uniform hash function ℎ : 𝑊 → {0, 1, . . . , 𝑟 − 1} that returns positions to set to 1 in 𝑟-bit string, i.e. one(𝑢) = {ℎ(𝑘) | 𝑘 ∈ 𝐾}.
-
Every node will locally store an index table that stores CIDs of all the IPFS Objects associated with the keyword set that the node is responsible for. The associated 𝑟-bit string for a query for a keywords set 𝐾 is used to reach the responsible logical node through a routing mechanism based on the hypercube form.
-
There are two query types: 1) Pin Search, getting all and only the objects that are exactly associated with the keywords set 𝐾, {𝑜 ∈ 𝑂 | 𝐾𝑜 = 𝐾}, that is data objects are retrieved only from one node; 2) Superset Search, getting objects that can be described by keyword sets that include 𝐾, i.e., {𝑜 ∈ 𝑂 | 𝐾𝑜 ⊇ 𝐾}, data objects are retrieved from all nodes that are responsible for a superset of 𝐾. Then, since its possible outcomes can be quite large, a limit 𝑙 on the results is set, that is, search until the number of objects is equal to 𝑙 or no more nodes shall be contacted.
-
The DHT network nodes operators can leverage Ethereum smart contracts to form a DAO characterized following :
- Token economy - The DAO is built around the use of a unique token that is used for transferring value or for staking purposes. The ERC20 interface presents the functions.
- Members Registry - Use smart contracts to develop a members registry. It allows token holders to time-lock DAO Tokens and become DAO members.
- General Voting - Use smart contracts to develop a mechanism that allows all DAO members to make a proposal, submit a suggestion vote, and then decide on a proposal. Each proposal has its own debate period. Any member can vote within that time period. Every member’s vote weight is proportional to the number of tokens locked until a date after the debate period ends.
- Value Transfer Voting - Using another smart contract, a DAO can develop an extension of the previous voting smart contract to allow a decision that directly executes on-chain.
-
This paper envisions three use cases of a DAO. These use cases are possibly overlapping:
- DAO use case 1): To prevent instability caused by DAO tokens creators redeeming LP tokens, this paper envisions a use case where the DAO is based on the timelock of the LP tokens obtained by locking DAO tokens in liquidity pools, instead of based on the timelock of the DAO tokens directly. This allows the stability of the DAO to be directly proportional to the value that the DAO tokens can take, and the power of the DAO’s members is directly proportional to the gains/losses they want to make through their behavior.
- DAO use case 2): With the purpose of assisting “browsable sister networks to the internet”, this paper envisions a use case where different networks implement their own DHT, generating a large number of “islands” where keywords-based queries are possible. They could have different rules but all share a similar querying protocol. They could be built on several DLTs or only one DLT.
- DAO use case 3): This paper envisions a use case that combines IPFS-search with keyword-based hypercube DHT where “several IPFS nodes crawl the network by monitoring the IPFS logs for new files; these nodes download the files added through the ‘sniffed’ CID; for each file the metadata are extracted and transformed into keywords; the association between the keywords obtained and the CID is stored in the hypercube DHT.”
-
This paper provides experimental validation of implementation results showing how the size of the hypercube and the number of objects stored in the DHT affect the search procedures.
Method
- By mathematical proof, this paper proposes an optimized routing that is structured as a hypercube DHT for DLT stored data to be searched by its content.
- Multiple keywords search: The hypercube geometric form has been leveraged to organize the topological structure of a DHT in this paper.
- By scientific experiment, this paper proposes a test statistic of the efficiency of a DHT software implemented in Python which manages keyword-based queries for contents stored in IPFS, through the use of a hypercube DHT.
Results
- This paper proposes a five layered (bottom-up) structure. Layers 1 to 4 provide a method for keyword-based search that can be adapted to any type of DFS or DLT for data storage: “(1) First layer: the nodes of the IPFS public network running the standard IPFS protocol. (2) Second layer: all the files that the IPFS nodes keep in their storage. These are indexed using CIDs. (3) Third layer: the files can be described by keywords, which are then used to execute queries and find the files. (4) Fourth layer: these keywords are saved, together with the file association via the CID, using the Hypercube DHT.” The 5th layer is DAO, the governance of the hypercube DHT network.
- This paper develops smart contracts that implement the proposed framework in Solidity and stores it as Open Source code in Zenodo. Its cost execution in terms of gas for the main operations shows in Table 1. The most expensive operation in the proposed framework is the lockTokens() function. This is because, according to the OpenZeppelin library, each lock request creates a new smart contract that locks tokens for a unique account. This paper uses the EIP-1167 Minimal Proxy pattern to avoid the enormous amount of gas (at least 501818 gas units) the Factory pattern requires.
- The researchers ran two types of tests, i.e. the Pin Search and the Superset Search, with the network configuration for 8, 16, 32, 64, and 128 nodes and populated the network each time with 10, 100, and 1000 objects generated randomly. Then they repeated 50 random queries for Pin Search and 50 for Superset Search with objects limit set to 10, with a random node and a random keyword set every time. The result shows respectively in Figure 2.
- Pin Search
- The test result shows that when the number of objects varies, there will be similar average hops, and when increasing the number of nodes, there will be an increase from 1.28 to 3.52
- The result corresponds to the researchers’ expectation because “the Pin Search average number of hops should theoretically be with the order of the logarithm of the number of logical nodes, i.e. log(𝑛)/2 or 𝑟/2.”
- Superset Search
- The test result shows that when the number of objects increases, the average number of hops decreases, and it increases when the number of nodes increases.
- The minimum value is 1.36 for 1000 objects and 8 nodes.
- The maximum is 20.36 for 10 objects and 128 nodes.
- “Theoretically, the average number of hops should be equal to the average hops required to get to the node responsible for query keywords set 𝐾, i.e. Pin Search log(𝑛)/2, plus the average hops to get from that node to all the nodes that include 𝐾, until the limit of objects 𝑙 (or nodes including 𝐾) is reached.”
Discussion & Key Takeaways
- The four-layered keyword-based query system this paper proposes can apply to any DLT system and improve its efficiency: “(1) First layer: the nodes of the IPFS public network running the standard IPFS protocol. (2) Second layer: all the files that the IPFS nodes keep in their storage. These are indexed using CIDs. (3) Third layer: the files can be described by keywords, which are then used to execute queries and find the files. (4) Fourth layer: these keywords are saved, together with the file association via the CID, using the Hypercube DHT.”
- A fifth layer can also add to the above-mentioned four-layer system to enact a DAO, which leverages smart contracts to build functions of token economy, members’ registry, general voting, and value transfer voting.
Implications & Follow-ups
- The researchers will focus on two future works: 1) Research on the feasibility of a “pay-per-query” model, i.e. the node operators will be rewarded by query granularity; 2) The load balancing issues.
Applicability
- The hypercube structure routing framework is applicable to any DFS system, such as IFPS, to build a decentralized keyword-based query data storage system.
- The DAO mechanism which is developed by smart contracts can add to the above hypercube structure routing framework, for node operators who are interested in keeping the network operational and healthy, and, in general, where node operators act in the context of a sharing economy, e.g. Wikipedia editors, to orchestrate their operational decisions and rewards.