Research Summary: Governing Decentralized Complex Queries Through a DAO

TLDR

  • This paper focuses on improving the efficiency of peer-to-peer file transport protocols. It proposes a Distributed Hash Table (DHT) system structured as a hypercube, a type of network topology that efficiently manages decentralized keyword-based queries executed on data stored in DFS.
  • The authors provide a framework to govern the above network, based on implementing a Decentralized Autonomous Organization (DAO) and smart contracts. This enables organizational decision-making and rewards nodes that have actively contributed to the DHT.

Core Research Question

  • How can we search data stored on novel decentralized storage technologies, such as DLT, DFS, IPFS, by content rather than identifier or location?
  • How do node operators, who positively host and share information to manage rewards and organizational decisions with smart contracts, work in a more scalable and decentralized organization framework?

Citation

Background

  • Distributed Ledger Technologies (DLTs): A peer-to-peer (P2P) system consisting of nodes which maintain information with a consensus mechanism, allowing them to each have the same version of that information. DLT ensures that information stored on it features integrity, immutability, and authenticity.

  • Distributed Hash Table (DHT): a decentralized system for distributed storage, which distributes information by providing a routing mechanism to move resources between system nodes. Its notion is to distribute storage workload among the DHT nodes in a way that allows for fast identification of where the key/value pair resides. DHT nodes are obtained through the use of a hash function, a one-way function that maps any item into a binary sequence of 𝑛 bits. Its key is an 𝑛 bit string obtained after having applied the hash function. Specifically, each DHT is identified itself by an 𝑛 bit ID, which exists in the same ID space used to identify contents. Each node maintains information on corresponding contents that are in a specific ID space interval. DHTs were proposed by “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”.

File:Hypercube.svg - Wikipedia

Hypercube Structure

This paper proposes a DHT system that is structured as a hypercube. The hypercube is a logical layout where there are 2𝑟 nodes, each one labeled with an 𝑟 bits ID and connected to the 𝑟 nodes whose ID differs by only one bit. Each node is responsible for a specific keywords set, derived from their ID.

  • Decentralized Autonomous Organization (DAO): An organization that “(1) participants maintain direct real-time control of contributed funds and (2) governance rules are formalized, automated and enforced using software”. However, its definition may vary as time goes on, as relevant industrial development may amplify or reconstruct its concept. DAOs aim to be governed by democratic or highly participatory processes or algorithms, the form of DAOs is community-driven using software and following the rule of code.

  • Inter Planetary File System (IPFS): Stores and shares files in a peer-to-peer network. It’s one of the most widely used DFS protocols where files and data are replicated globally on hundreds of nodes in the network. It combines a DHT, an incentivized block exchange, and a self-certifying namespace.

  • IPFS users can only search files stored in the system when owning them or knowing their CID because IPFS files are stored and shared in the form of IPFS objects that are identified by a CID (Content IDentifier), which comes from the outcome of applying that file to a hash function.

  • Content Identifiers (CIDs): Are labels used to point to material in IPFS as objects. CIDs are products of a hash function when applied to a data and used to retrieve the referenced IPFS object.

  • IPFS search engine: A search engine for the IFPS used to overcome the limitation above. It’s used to sniff the DHT gossip, index files, and directory hashes. However, this paper argues that it is too centralized.

  • A decentralized solution called Siva had been proposed that would allow users to search through an inverted index of keywords. However, it does not feature any optimization for a keyword storage structure apart from the use of caching.

  • The Graph network is a “Decentralized Query Protocol”, a system built on the Ethereum blockchain and IPFS, which allows users to search data stored in these two technologies. The organization of the network is like a DAO but has different storage indexes, a method this paper proposes.

  • Decentralized finance (DeFi): Refers to smart-contract-based novel P2P financial infrastructures, that are non-custodial, permissionless, openly verifiable and composable. Anyone can conduct a non-custodial exchange of on-chain digital assets with DeFi protocols. Most DeFi protocol liquidity is provided algorithmically through a simple pricing rule within a smart contract, instead of the bid- and-ask-order prices which traditional finance’s asset liquidity is based on. .

  • On Uniswap, when a new ERC20 token such as the DAO Token is created and initially distributed to its creators, users can provide liquidity by locking up their newly created tokens in a liquidity pool. As an incentive they receive Liquidity Pool (LP) tokens representing their stake. The creators may redeem them at any time by burning the LP tokens, which makes the value of the DAO token highly unstable.

  • The Factory pattern: is an Object-oriented programming design pattern that implements the Factory notion.

  • The EIP-1167 Minimal Proxy pattern: aims to provide a simple and cheap clone contract functionality in an immutable way, by specifying a minimal bytecode implementation that delegates all calls to a known, fixed address.

Summary

  • In a DHT structured as a hypercube topology, let 𝐾 ⊆ 𝑊 represent a keyword set in a keyword space 𝑊. Such set 𝐾 can be used to search for data contents characterized by keywords contained in 𝐾. Specifically, let 𝑂 be a set of data objects referenced in the DHT which are distributed on all the network nodes based on the associated keywords. The node responsible for such a keyword set 𝐾𝑜 maintains all the objects 𝑜 ∈ 𝑂 mapped to a keywords set 𝐾𝑜 ⊆ 𝑊. Then we deem 𝑂 as the set of all the CIDs in IPFS and leverage the DHT to map keywords to IPFS Objects.

  • This paper proposes a system that multiple keywords are stored in IPFS which is structured as a 𝑟-dimensional hypercube 𝐻𝑟 (𝑉 , 𝐸) DHT, where 𝑉 is a set of vertices representing logical network nodes and 𝐸 is a set of edges that connect pairs of neighbor nodes. The node’s ID comes from an 𝑟-bit string associated with the keyword set that the node is responsible for. Each bit position represents a specific keyword. A given keyword set 𝐾 contains a given keyword 𝑘, which is assigned to the 𝑖-th bit of the string by a function. Then, keywords are given in input to an uniform hash function ℎ : 𝑊 → {0, 1, . . . , 𝑟 − 1} that returns positions to set to 1 in 𝑟-bit string, i.e. one(𝑢) = {ℎ(𝑘) | 𝑘 ∈ 𝐾}.

  • Every node will locally store an index table that stores CIDs of all the IPFS Objects associated with the keyword set that the node is responsible for. The associated 𝑟-bit string for a query for a keywords set 𝐾 is used to reach the responsible logical node through a routing mechanism based on the hypercube form.

  • There are two query types: 1) Pin Search, getting all and only the objects that are exactly associated with the keywords set 𝐾, {𝑜 ∈ 𝑂 | 𝐾𝑜 = 𝐾}, that is data objects are retrieved only from one node; 2) Superset Search, getting objects that can be described by keyword sets that include 𝐾, i.e., {𝑜 ∈ 𝑂 | 𝐾𝑜 ⊇ 𝐾}, data objects are retrieved from all nodes that are responsible for a superset of 𝐾. Then, since its possible outcomes can be quite large, a limit 𝑙 on the results is set, that is, search until the number of objects is equal to 𝑙 or no more nodes shall be contacted.

  • The DHT network nodes operators can leverage Ethereum smart contracts to form a DAO characterized following :

    • Token economy - The DAO is built around the use of a unique token that is used for transferring value or for staking purposes. The ERC20 interface presents the functions.
    • Members Registry - Use smart contracts to develop a members registry. It allows token holders to time-lock DAO Tokens and become DAO members.
    • General Voting - Use smart contracts to develop a mechanism that allows all DAO members to make a proposal, submit a suggestion vote, and then decide on a proposal. Each proposal has its own debate period. Any member can vote within that time period. Every member’s vote weight is proportional to the number of tokens locked until a date after the debate period ends.
    • Value Transfer Voting - Using another smart contract, a DAO can develop an extension of the previous voting smart contract to allow a decision that directly executes on-chain.
  • This paper envisions three use cases of a DAO. These use cases are possibly overlapping:

    • DAO use case 1): To prevent instability caused by DAO tokens creators redeeming LP tokens, this paper envisions a use case where the DAO is based on the timelock of the LP tokens obtained by locking DAO tokens in liquidity pools, instead of based on the timelock of the DAO tokens directly. This allows the stability of the DAO to be directly proportional to the value that the DAO tokens can take, and the power of the DAO’s members is directly proportional to the gains/losses they want to make through their behavior.
    • DAO use case 2): With the purpose of assisting “browsable sister networks to the internet”, this paper envisions a use case where different networks implement their own DHT, generating a large number of “islands” where keywords-based queries are possible. They could have different rules but all share a similar querying protocol. They could be built on several DLTs or only one DLT.
    • DAO use case 3): This paper envisions a use case that combines IPFS-search with keyword-based hypercube DHT where “several IPFS nodes crawl the network by monitoring the IPFS logs for new files; these nodes download the files added through the ‘sniffed’ CID; for each file the metadata are extracted and transformed into keywords; the association between the keywords obtained and the CID is stored in the hypercube DHT.”
  • This paper provides experimental validation of implementation results showing how the size of the hypercube and the number of objects stored in the DHT affect the search procedures.

Method

  • By mathematical proof, this paper proposes an optimized routing that is structured as a hypercube DHT for DLT stored data to be searched by its content.
    • Multiple keywords search: The hypercube geometric form has been leveraged to organize the topological structure of a DHT in this paper.
  • By scientific experiment, this paper proposes a test statistic of the efficiency of a DHT software implemented in Python which manages keyword-based queries for contents stored in IPFS, through the use of a hypercube DHT.

Results

  • This paper proposes a five layered (bottom-up) structure. Layers 1 to 4 provide a method for keyword-based search that can be adapted to any type of DFS or DLT for data storage: “(1) First layer: the nodes of the IPFS public network running the standard IPFS protocol. (2) Second layer: all the files that the IPFS nodes keep in their storage. These are indexed using CIDs. (3) Third layer: the files can be described by keywords, which are then used to execute queries and find the files. (4) Fourth layer: these keywords are saved, together with the file association via the CID, using the Hypercube DHT.” The 5th layer is DAO, the governance of the hypercube DHT network.

  • This paper develops smart contracts that implement the proposed framework in Solidity and stores it as Open Source code in Zenodo. Its cost execution in terms of gas for the main operations shows in Table 1. The most expensive operation in the proposed framework is the lockTokens() function. This is because, according to the OpenZeppelin library, each lock request creates a new smart contract that locks tokens for a unique account. This paper uses the EIP-1167 Minimal Proxy pattern to avoid the enormous amount of gas (at least 501818 gas units) the Factory pattern requires.

  • The researchers ran two types of tests, i.e. the Pin Search and the Superset Search, with the network configuration for 8, 16, 32, 64, and 128 nodes and populated the network each time with 10, 100, and 1000 objects generated randomly. Then they repeated 50 random queries for Pin Search and 50 for Superset Search with objects limit set to 10, with a random node and a random keyword set every time. The result shows respectively in Figure 2.

  • Pin Search
    • The test result shows that when the number of objects varies, there will be similar average hops, and when increasing the number of nodes, there will be an increase from 1.28 to 3.52
    • The result corresponds to the researchers’ expectation because “the Pin Search average number of hops should theoretically be with the order of the logarithm of the number of logical nodes, i.e. log(𝑛)/2 or 𝑟/2.”
  • Superset Search
    • The test result shows that when the number of objects increases, the average number of hops decreases, and it increases when the number of nodes increases.
    • The minimum value is 1.36 for 1000 objects and 8 nodes.
    • The maximum is 20.36 for 10 objects and 128 nodes.
    • “Theoretically, the average number of hops should be equal to the average hops required to get to the node responsible for query keywords set 𝐾, i.e. Pin Search log(𝑛)/2, plus the average hops to get from that node to all the nodes that include 𝐾, until the limit of objects 𝑙 (or nodes including 𝐾) is reached.”

Discussion & Key Takeaways

  • The four-layered keyword-based query system this paper proposes can apply to any DLT system and improve its efficiency: “(1) First layer: the nodes of the IPFS public network running the standard IPFS protocol. (2) Second layer: all the files that the IPFS nodes keep in their storage. These are indexed using CIDs. (3) Third layer: the files can be described by keywords, which are then used to execute queries and find the files. (4) Fourth layer: these keywords are saved, together with the file association via the CID, using the Hypercube DHT.”
  • A fifth layer can also add to the above-mentioned four-layer system to enact a DAO, which leverages smart contracts to build functions of token economy, members’ registry, general voting, and value transfer voting.

Implications & Follow-ups

  • The researchers will focus on two future works: 1) Research on the feasibility of a “pay-per-query” model, i.e. the node operators will be rewarded by query granularity; 2) The load balancing issues.

Applicability

  • The hypercube structure routing framework is applicable to any DFS system, such as IFPS, to build a decentralized keyword-based query data storage system.
  • The DAO mechanism which is developed by smart contracts can add to the above hypercube structure routing framework, for node operators who are interested in keeping the network operational and healthy, and, in general, where node operators act in the context of a sharing economy, e.g. Wikipedia editors, to orchestrate their operational decisions and rewards.
3 Likes

@Astrid_CH I’d like to ask a question publicly that you might recognize from one of our conversations on DM, but I thought it might be interesting for others to hear your reply. What was it about this research that you found most compelling?

1 Like

Thanks for the helpful summary of this paper. After reading your summary and skimming the paper, I’m left wondering what advantage a hypercube representation offers? Maybe the hypercube has some search efficiencies or compactness?

We leverage such a protocol and make use of these 𝑟 -bit strings to identify logical nodes in our system, e.g. for 𝑟 = 4, a node that has ID 0100 handles all those the objects 𝑜 ∈ 𝑂 associated to keywords sets whose the one function returns 0100. In the DHT, connections between nodes, i.e. edges, are created among those nodes whose IDs differ of only one bit, e.g. 1011 and 1010

But does this do any better than graph or tree transversal? What about ALL the other requirements, like security?

Worse still, it’s not at all clear what the DAO is supposed to be doing here. They seem to claim some innovation that there is a DAO for governance, but this is clearly just a bolt on and the purpose of this specific DAO is completely undeveloped. (there’s a deep literature on governance that the authors seem unaware of).

Maybe I just don’t get it.

4 Likes

I got the same impression, as I initially assumed the hypercube was being introduced as some form of reference to lattice-based cryptography, but then realized it seemed to be trying to recreate the layer structure of the Internet into a hypercube. This structure seems to be explained as a way to increase the number of edges while trying to optimize the number of hops between edges within the network; while combining the technologies into a novel arrangement.

On the one hand, I can see the desire for the experimentation, but on the other hand I’m not sure the system could be deployed without qualitative research and research concerning the feasibility of this type of incentive mechanism. Beyond the pay-per-query model, as @quinndupont pointed out I’m not certain that the increase in complexity has a correlating increase in security fitness.

I am just trying to glean if they are building oracles and trying to make them seem more futuristic by arbitrarily giving it the hypercube structure in an attempt to reduce the number of hops between queries, while also attempting to add monetized incentive mechanisms within those streamlined steps. There seem to be conflicting interests in the design that don’t seem to necessarily get reconciled, but that might be something that is more apparent in the paper?

3 Likes

@jmcgirk @quinndupont @Larry_Bates Thanks for your insightful comments and questions.

I think the most compelling part I found in this paper is that the authors tried to construct a 𝑟-dimensional hypercube structure to implement complex queries and storage. The authors mentioned in the paper, “The hypercube structure allows to optimize the routing of the queries, by reducing the number of hops needed to locate contents.” When the dimension could be extended, the routing of the queries may be optimized to a higher level. The authors also conducted an experiment that applied this structure to the data marketplace (see Towards Decentralized Complex Queries over Distributed Ledgers: a Data Marketplace Use-case). But maybe this theory needs more experiments and use cases to prove its rightness. Also, I didn’t see they take security issues into the paper’s consideration/discussion.

Second, the idea to govern these nodes by DAO is impressive as well. As an incentive mechanism, nodes earn their DAO tokens according to the level of granularity of the query. To resolve the interest conflicts problem in DeFi fields (but not in this project’s scenario), the authors raised a possible use case where the DAO is based on the timelock of the LP tokens obtained by locking DAO tokens in liquidity pools, rather than the timelock of the DAO token directly. For this project’s scenario, the authors proposed a DAO use case where different networks implement their own DHT with their own organization and rules. I think the interest conflicts in the DAO governance may be raised from the divergence among these DHT organizations’ rules.

2 Likes

Thank you for this response! I think one of the reasons this type of experimentation needs further exploration is that it’s not really clear whether these type of incentive mechanisms actually increase participation. Have you come across any studies that show increased community participation due to the DAO structure? I would not preclude that possibility, but it seems like there is a movement towards this specific structure without the validation that this type of structure is being perceived in the same way as the incentive mechanisms are being presented. In other words, is there any proof that these types of incentive mechanisms have a superior chance of increasing participation than other types of structures?

It is in this context that the potentially decreased security and monetized searches seem to have some conflicting aspects that might completely offset the incentives of payment. Am I misinterpreting somewhere?

2 Likes

I’m genuinely surprised by the fortuitous discussion to have emerged from this particular paper!

@Astrid_CH in your second point, it seems that you are suggesting a kind of quasi-oracle? I’m not sure I understand what these queries are actually supposed to be doing?

3 Likes

Thanks for your question. I seldom saw papers talk about this incentive issue from empirical analysis, I’m also looking for it. Because I’m not an expert in the security domain, I’m not quite sure what specific security issue scenarios would happen. The DAO proposed by the authors aims to reward the nodes, given the lack of details, let me infer the situation… nodes who are currently may not get anything from dealing with the saving and searching process would be more likely to take this job, more nodes may also join as the rewards could be expected. However, the potentially decreased security may reduce the number of people who want to save information, so there may not be so much information to be saved and processed. Consequently, I’m guessing that the number and growth of this DAO/DHT may fall on a balance point, which represents the acceptability of the security and the rewards. Or maybe it’s not the case, I’d really love to discuss this issue, it’s interesting. I think my discussion would be more accurate if you can give me more suspected situations about the security issues.

2 Likes

Thanks for your question. According to the authors’ argument, queries are more effective and efficient when they are being worked in this type of searching process (if you think keywords-based searching, such as a piece of content in the file, is more effective and efficient than searching by respective identifier or location):
A file is saved in the node which is responsible for it, where nodes are structured as a hypercube. “In particular each logical node will locally store an index table where the CIDs of all the IPFS Objects associated to the keywords set it is responsible for are stored.” When a node receives a query request, the node will pass the request to its neighbor node which is nearer the destination node. By iterating the process, the request will end up reaching the destination node.
I’m not quite sure what you are referring to as “quasi-oracle”, does the process above fit the quasi-oracle you mentioned?

2 Likes

So just a quick scenario I can think of based on their structure would be a malicious attacker spamming their network with searches both to simultaneously prevent actual users from being able to search while making the search results less accurate. A sustained attack for 24 hours that made the platform unusable could have a massive impact on user engagement. The immediate impacts would be either the cost of searches going up dramatically or the search engine coming to a halt because of the attacker hogging all the search bandwidth (assuming there is an upper limit that can be hit). In that there is a 51% attack calculator that shows the cost of this type of attack on the networks covered, having a payment-associated validating mechanism that is also attempting to decentralize the node validation makes this system susceptible to a 51% attack in theory. Is there anything to suggest this stack is not going to be vulnerable to a 51% or malicious spam searches?

2 Likes

I think you raised a very insightful issue about this structure, I would not defend it as I think what you said makes sense and I didn’t see any defense material yet. Also, I’m guessing that this issue mainly comes from the node structure, but not the DAO’s or its reward mechanism. Do you think this problem less occurs in IPFS without applying the hypercube structure this paper proposed?

2 Likes