Research Summary - Chainlink: A Decentralized Oracle Network

Zach · December 18, 2020, 11:51pm

TL;DR

Chainlink is a decentralized oracle network that connects smart contracts and their underlying blockchain network to off-chain data resources.
The whitepaper also covers on-chain and off-chain data aggregation, a reputation and monitoring framework, a security strategy, and proposed future improvements including oracle programming, data source infrastructure modifications, and confidential smart contract execution.

Citation

Ellis, S., Juels, A., & Nazarov, S. (2017, September 4). ChainLink A Decentralized Oracle Network.

Link

Blockchain Oracles for Hybrid Smart Contracts | Chainlink

Core Research Question

How can smart contract applications operating on decentralized blockchain networks connect to external systems and data resources without relying on a centralized oracle service?

Background

Blockchains are distributed networks of nodes that generate consensus on a set of transactions using mechanisms such as Proof of Work or Proof of Stake. Blockchains aim to prevent the “double spending problem” where the same funds are used multiple times.
Smart contracts are pieces of programmatic logic created by users that are hosted on a blockchain network. Every state change is initiated by a private key holder and is both executed and verified across every node in the blockchain network.
Blockchain oracles are off-chain agents that connect on-chain smart contracts to external resources such as data APIs that reside outside of the blockchain network. Such data is often required in the execution of automated smart contract applications.

Summary

Smart contracts replace the need for traditional legal agreements and centrally automated digital agreements. Performance verification and execution of smart contracts rely on manual actions from one of the counterparties or an automated system that programmatically triggers state changes.
Due to their underlying consensus protocols, the blockchain networks that smart contracts operate on cannot natively communicate with external systems, requiring the use of an “oracle” mechanism.
Traditional oracles are centralized services that present a single point of failure in the execution of smart contracts. The authors of this paper present an alternative in the form of Chainlink, a decentralized oracle network.
Chainlink’s components include a simple on-chain data aggregation system, a more efficient off-chain consensus mechanism, the decentralization of data sources and oracles, a validation system, a reputation system, certification service, contract upgrade service, LINK token usage, as well as a security strategy.
Future improvements described include richly featured oracle programming, data-source infrastructure modifications, and confidential smart-contract execution. The paper also describes the technology required for off-chain aggregation as well as the trust assumption of Intel’s SGX.

Method

The on-chain components of Chainlink consist of smart contracts that provide an interface for users to interact with the network. This includes a reputation contract, an order-matching contract, and an aggregating contract. Users choose oracles based on Service Level Agreements (SLAs) that detail the query parameters and the number of oracles needed by the purchaser. The purchaser also specifies the reputation and aggregating contracts to be used.
Once an SLA has been created and the oracles have been selected, each oracle node delivers their response on-chain to the Aggregation contract. This contract tallies the collective results and calculates a weighted answer. The specific aggregation methodology used can be customized by users and can include taking the median, mean, mode, or another method desired.
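As a rough illustration of customizable aggregation, the sketch below applies a user-selected method to a list of oracle responses. It is plain Python rather than contract code, and the `AGGREGATORS` registry, the `aggregate` function, and the sample values are all assumptions made for illustration, not part of Chainlink's contracts.

```python
from statistics import mean, median, mode

# Hypothetical registry of aggregation methods a user might choose from.
AGGREGATORS = {"median": median, "mean": mean, "mode": mode}

def aggregate(responses, method="median"):
    """Combine oracle responses into one answer using the chosen method."""
    if not responses:
        raise ValueError("no oracle responses to aggregate")
    return AGGREGATORS[method](responses)

# Five oracles report a price; the last one is an outlier (made-up values).
responses = [101.0, 100.5, 100.7, 100.5, 250.0]
median_answer = aggregate(responses, "median")
mean_answer = aggregate(responses, "mean")
mode_answer = aggregate(responses, "mode")
```

With these sample values the median and mode stay close to the honest reports, while the mean is pulled upward by the single outlier, which is one reason a median is a common choice when some reporters may be faulty.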
The off-chain components of Chainlink consist of the oracle node software that is connected to the blockchain and independently listens for queries and responds by generating a transaction to be published on-chain. Chainlink oracle nodes are made of “Chainlink Core” which is the software responsible for interacting with the blockchain, scheduling, and balancing work across various external services. Nodes use job specifications, consisting of subtasks which are processed in a pipeline. Default subtasks include HTTP requests, JSON parsing, and conversion to various blockchain data formats. Additional subtasks can be added through modular external adapters, which connect to external services (such as serverless adapters) via a REST API.
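The subtask pipeline can be pictured as a chain of small functions, each consuming the previous one's output. The following is a minimal sketch under stated assumptions: the function names (`http_get`, `json_parse`, `multiply`, `to_uint256`, `run_job`), the URL, and the canned response are hypothetical, and the HTTP step is stubbed so the example runs offline. It does not reproduce the actual Chainlink Core job specification format.

```python
import json
from decimal import Decimal

def http_get(url):
    """Stand-in for a real HTTP request; returns a canned JSON body."""
    return '{"data": {"price": "412.37"}}'

def json_parse(path):
    """Build a subtask that parses JSON and walks the given key path."""
    def task(body):
        value = json.loads(body)
        for key in path:
            value = value[key]
        return value
    return task

def multiply(factor):
    """Build a subtask that scales a decimal string and returns an integer."""
    def task(value):
        return int(Decimal(value) * factor)
    return task

def to_uint256(value):
    """Encode a non-negative integer as a 32-byte big-endian word."""
    return value.to_bytes(32, "big")

def run_job(subtasks, initial_input):
    """Feed each subtask's output into the next one, in order."""
    result = initial_input
    for subtask in subtasks:
        result = subtask(result)
    return result

# Hypothetical job: fetch, extract the price, scale to cents, encode.
job = [http_get, json_parse(["data", "price"]), multiply(100), to_uint256]
encoded = run_job(job, "https://example.com/price")
```

An external adapter would slot into this picture as one more function in the list, which is the modularity the paragraph above describes.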

Chainlink oracles perform three key steps: accepting a request (ingesting request details from a user smart contract), obtaining the data (fetching it from an off-chain API), and returning the data (once data from an API is received, creating a transaction to deliver it on-chain). While these steps are easy to perform, doing so in a secure and tamper-resistant manner is more involved, as data sources or oracle nodes might be corrupted. Chainlink addresses this issue with decentralization.
Decentralization is achieved at multiple levels. The first is by distributing data sources, meaning each node fetches data from multiple APIs to prevent any single source of truth. Nodes then aggregate this data to generate a single refined data point. The second approach is distributing oracles, where instead of consuming data from one node, data requests are sent to multiple nodes and the data from each is aggregated. This aggregation can occur either on-chain for ease of implementation or off-chain for additional cost-efficiency. Both forms of data source and node decentralization can be used at the same time.
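The two levels of decentralization can be combined as a two-stage aggregation: each node first reduces its own data sources to one value, then the network reduces the per-node values. The sketch below assumes a median at both stages; the function names, node labels, and sample values are hypothetical.

```python
from statistics import median

def node_answer(source_values):
    """Stage 1: a node aggregates the values it fetched from several APIs."""
    return median(source_values)

def network_answer(per_node_sources):
    """Stage 2: aggregate the per-node answers across the oracle network."""
    return median([node_answer(values) for values in per_node_sources.values()])

# Made-up data: node-b sees one corrupted source (999.0).
per_node_sources = {
    "node-a": [100.4, 100.5, 100.6],
    "node-b": [100.6, 100.7, 999.0],
    "node-c": [100.5, 100.9, 100.8],
}
final_answer = network_answer(per_node_sources)
```

In this example the corrupted source is filtered out inside node-b before it can influence the network-level answer.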

In on-chain aggregation, each node delivers its response to an aggregation contract, which then compiles a single reference data point. This aggregation process is conceptually simple (it involves a single contract), trustworthy (all actions are fully viewable on-chain), and flexible (any aggregation logic can be used). Nodes can also use a commit and reveal scheme to prevent freeloading.
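A commit and reveal scheme can be sketched with a salted hash: nodes first publish only a commitment, and reveal the value and salt once all commitments are in. The version below is an illustrative assumption rather than the paper's exact construction. It binds each commitment to a node identifier so that a copied commitment cannot be reused, and the names (`commit`, `verify_reveal`, `node-a`, `freeloader`) are hypothetical.

```python
import hashlib
import secrets

def commit(node_id: str, value: str, salt: bytes) -> str:
    """Return a hex commitment binding a node to a value without revealing it."""
    id_digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return hashlib.sha256(id_digest + salt + value.encode("utf-8")).hexdigest()

def verify_reveal(commitment: str, node_id: str, value: str, salt: bytes) -> bool:
    """Check that a revealed (value, salt) pair matches the earlier commitment."""
    return commit(node_id, value, salt) == commitment

# Phase 1 (commit): the honest node publishes only the hash of its answer.
honest_salt = secrets.token_bytes(32)
honest_commitment = commit("node-a", "100.7", honest_salt)

# Phase 2 (reveal): the honest node discloses its value and salt.
honest_ok = verify_reveal(honest_commitment, "node-a", "100.7", honest_salt)

# A freeloader that copied node-a's commitment and later replays the reveal
# fails, because the commitment is bound to node-a's identifier.
copied_ok = verify_reveal(honest_commitment, "freeloader", "100.7", honest_salt)
```

Because answers stay hidden until every commitment has been posted, a node cannot simply wait and copy another node's value.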
Off-chain aggregation of data saves costs over on-chain aggregation by delivering a single data point on-chain. This is achieved through the use of threshold signatures (Schnorr signatures) where nodes combine their reports off-chain to create a single signature representing every node in the oracle network. Each oracle holds only a partial piece of the private key, meaning the decentralization and tamper-resistance properties are preserved. To prevent freeloading, a consensus algorithm can be used to ensure the user does not pay freeloading oracles.
The paper describes multiple security services Chainlink can employ including a validation system, a reputation system, a certification service, contract-upgrade service, and the LINK token.
- The validation system ensures availability and correctness through the aggregation of data which can occur on-chain or off-chain as described above. Because each node signs their response, it generates non-repudiable evidence of their answer. Nodes who deviate from the aggregated result can then be refused payment.
- The reputation system utilizes the on-chain data each node creates to generate a framework for the reliability of nodes. This includes the total number of assigned/accepted/completed requests, average time to response, and the amount of penalty payments accrued when staking collateral. Users can use this on-chain data to select nodes that have been historically reliable.
- The certification service aims to prevent Sybil and mirroring attacks by issuing endorsements of high-quality oracle providers. This service does not prevent others from running nodes, but provides additional information to users on which nodes have been security reviewed. Fraud detection can be an automated on-chain or off-chain process.
- The contract-upgrade service aims to allow developers to improve their code to mitigate bugs and vulnerabilities. Chainlink supports an additional optional service where nodes can be redirected to delivering their data to a new upgraded user contract if a specific flag is raised. This provides an “escape hatch” but is entirely optional, allowing for immutable contracts as well.
- The Chainlink Network utilizes the LINK token to pay node operators for their services. The amount of LINK required to generate a request is set on a node by node basis, which can be based on supply and demand for oracle services. The LINK token is an ERC20 with additional ERC223 TransferAndCall capabilities, allowing tokens to be received and processed by contracts within a single transaction.
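As a rough picture of how the validation system could withhold payment from deviating nodes, the sketch below pays only nodes whose answer lies within a tolerance of the aggregated median. The 1% tolerance, the flat fee, and the `settle_payments` function are assumptions for illustration; the whitepaper does not prescribe this particular rule.

```python
from statistics import median

def settle_payments(responses, fee, tolerance=0.01):
    """Pay nodes within `tolerance` (a fraction) of the median; withhold otherwise."""
    answer = median(responses.values())
    payouts = {}
    for node, value in responses.items():
        within = abs(value - answer) <= tolerance * abs(answer)
        payouts[node] = fee if within else 0
    return answer, payouts

# Made-up responses; node-d deviates strongly from the others.
responses = {
    "node-a": 100.5,
    "node-b": 100.7,
    "node-c": 101.0,
    "node-d": 250.0,
    "node-e": 100.6,
}
answer, payouts = settle_payments(responses, fee=10)
```

The same per-node outcomes (paid or withheld) are the kind of on-chain record a reputation system could later draw on.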
The paper also describes a longer-term technical strategy based on trusted hardware such as Intel SGX, enabling confidentiality and off-chain computation. One such approach is Town Crier, a trusted-hardware-based oracle (which Chainlink acquired in 2018). Intel SGX and other forms of Trusted Execution Environments provide a black box where oracle nodes can fetch and compute on data without revealing the data to the node operator. Town Crier is compatible with existing versions of HTTPS, requiring no server-side modifications for web servers.
Existing oracle solutions at the time of release were centralized oracle providers, which do not provide the tamper-resistant qualities smart contracts need to remain trustless in nature. Often notarization is used, but this cannot be verified on-chain, requiring further recursive validation. Manual input oracles also exist for prediction markets, and provide a large amount of flexibility for hard-to-find information or for tasks that require natural language processing. However, human cognition is costly and slow, meaning manual input oracles are resource-intensive, not real-time, and can only handle a limited set of questions at a time.

Results

This paper introduced Chainlink, a decentralized oracle network for smart contracts to securely interact with resources external to the blockchain. The various components of the network’s construction were described, as well as security models and newly proposed features that could be implemented in the future. Specific design principles were defined, including decentralization for secure and open systems, modularity for flexible system design, and open source for secure, extensible systems.

Discussion and Key Takeaways

The Chainlink whitepaper was originally written three years ago. Since then, we have seen the blockchain ecosystem evolve, billions of dollars locked up in Decentralized Finance applications, and the mainnet launch of the Chainlink Network. A review of this whitepaper provides context for the data infrastructure that has shaped the current blockchain landscape and the oracle mechanisms that enable more universally connected smart contracts…

Implications and Follow-ups

For future improvements, the whitepaper describes the creation of an off-chain aggregation mechanism using a Threshold Schnorr signature scheme and the algorithm required for its implementation, particularly in regards to off-chain communication between nodes, along with the accompanying proofs and additional considerations.
Many of the original ideas in the Chainlink whitepaper have been implemented (such as the decentralization of nodes and data sources), while other features have evolved, partially to meet the needs of the Decentralized Finance (DeFi) ecosystem and the demand for shared financial price feeds.

Applicability

Chainlink’s decentralized network of oracle nodes provides smart contracts with enhanced capabilities that enable the creation of a wide range of decentralized applications. Current use cases include secure price feeds for the DeFi ecosystem, a verifiable randomness function for on-chain gaming applications, fair sequencing services to prevent miner extractable value, proof of reserve to bring transparency to DeFi collateral, validation of layer 2 Rollup chains like Arbitrum, keeper bots to trigger smart contract functions at regular intervals, and more. Additional applications of Chainlink oracles can be found in “77 Smart Contract Use Cases Enabled By Chainlink”.

Eric · December 19, 2020, 5:27pm

What are some of the ways that Chainlink has developed that went as predicted/proposed in the Whitepaper, and what are some ways it has deviated or changed the proposed approach?

Zach · December 20, 2020, 12:19am

@Eric Good questions. Most of Chainlink’s proposed functionality rolled out as expected, while a few parts have evolved to meet user needs as the blockchain space has changed quite a bit over the years. DeFi wasn’t a thing when this paper was released in 2017.

The approach to utilizing decentralization at the node operator and data source levels has stayed the same and is now a key piece of Chainlink’s security. The network also operates today using the on-chain data aggregation method, but with Ethereum’s rising gas costs, there’s been a large development push towards off-chain data aggregation through Off-Chain Reporting, which is the first step towards the threshold signatures (signature aggregation) described in the paper. The development of threshold signatures has also shifted from Schnorr to BLS signatures, which is a nice improvement since BLS lowers the off-chain communication overhead and is what ETH2 is using as well.

I think the biggest change of note is that instead of the request and receive SLA model, Chainlink now focuses on the reference network model, where multiple users fund and share a common in-demand oracle network like the ETH/USD price feed. This is largely due to the rise of the DeFi ecosystem and its growing demand for market data common to many decentralized applications. I imagine the SLA model will eventually be rolled into the reference feed model at some point as well, given their complementary nature.

I’ve also noticed there’s been much less of a focus on trusted hardware like Intel SGX, likely due to the side-channel attack vectors that have been discovered since Intel launched it, which have prevented industry adoption. Not a fault of Chainlink but unfortunate nonetheless. Instead, Chainlink is now focusing on a different technology known as DECO, which was created by the same academics as Town Crier at Cornell with IC3. Instead of trusted hardware, DECO uses zero-knowledge proof cryptography to provide data privacy. It’ll be interesting to see how the network evolves going into the future and what the Chainlink whitepaper v2 will bring.

Barry · December 23, 2020, 5:42pm

Thanks for the summary. How much exploration has been done into collusion resistance among the node operators? In a scenario where the on-chain value secured by the oracle is greater than the fees paid to the node operator network, what prevents the operators from publishing a false state?

A feed like ETH/USD might be used by many protocols outside the original sponsor where the value being secured ends up being significantly larger than originally planned. What exploration has been done into risks from free riding protocols?

In general, what research has been done to determine what the optimal compensation should be to the oracle network?

Zach · December 29, 2020, 11:29pm

@Barry I can give my perspective from my analysis of how the Chainlink network operates today and the resources that have been made available. I think this can be broken down into two separate but interrelated subjects: what is the Sybil resistance of the Chainlink network, and what are the cryptoeconomic incentives that ensure nodes post honest answers.

Sybil resistance in the Chainlink network today is achieved through the onboarding of security-reviewed node operators who make their identity publicly known (these can be seen on feeds.chain.link and market.link). The Chainlink team provides these infrastructure and identity reviews, though such security reviews can be performed by anyone. This lets users verify, through each node’s public identity, that nodes are independent entities, which prevents a single entity that creates many nodes from causing havoc. While running a Chainlink node is permissionless, it is up to users themselves to choose which nodes to include in their oracle network, and security-reviewed nodes will be preferred for the Sybil resistance they provide.

The cryptoeconomic incentives of the Chainlink network are an extension of this, but protect against both Sybil attacks and other forms of malicious behavior by node operators, whether acting alone or in collusion. Malicious nodes need to take into consideration that they would not only be removed from the oracle network they fed false data to (and thus lose all future revenue in that network), but would also be removed from any other oracle network on Chainlink they serve (and thus lose all future revenue in the Chainlink network as a whole). This is because the signed data they deliver to contracts is recorded on-chain in an immutable manner, and this performance data can be used by users to determine when a node within their network is no longer trustworthy. Nodes are also paid exclusively in LINK tokens and accumulate them from their oracle services, meaning malicious nodes need to take into consideration that a large enough attack would cause the value of the LINK token to collapse, thus financially harming themselves in the process. This creates what is known as implicit staking, as nodes have a financial interest in ensuring the long-term health of the Chainlink network as a whole.

Additionally, because each node’s reputation is made publicly available and most node operators are experienced blockchain infrastructure teams or professional data providers, their off-chain reputation is key to their entire business model. Malicious activity in the Chainlink Network would mean they would lose their off-chain business’s revenue through the decrease of user trust, further raising the cost of attack/bribery within the Chainlink network. However, that’s not to say the cryptoeconomic security can’t be improved. Through the addition of staking LINK tokens as collateral, additional skin in the game known as explicit staking is created, where stake can be slashed for unwanted behavior such as outlier data or non-responsiveness. Not only would nodes have a direct financial stake in the accuracy and reliability of their oracle services, but they would also have a strong incentive to maintain the value of their LINK tokens while staked.

The opportunity cost of malicious nodes losing future revenue provides long-term financial punishment, while the slashing of stake provides short-term financial punishment, creating a strong all-around cryptoeconomic solution. More information on this subject can be found in a recent presentation from Sergey Nazarov, “The Evolution of Smart Contracts and Cryptoeconomic Security” (YouTube).
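To make the argument concrete, here is a toy model, not something taken from Chainlink's materials, that treats a node's cost of misbehaving as the present value of the future revenue it would forfeit plus any slashed stake. The function names, the discounting approach, and every number are hypothetical.

```python
def present_value(revenue_per_period, discount_rate, periods):
    """Discounted sum of the future revenue a node expects to earn."""
    return sum(
        revenue_per_period / (1 + discount_rate) ** t
        for t in range(1, periods + 1)
    )

def attack_is_profitable(attack_gain, revenue_per_period, discount_rate,
                         periods, stake):
    """Compare a one-off gain against forfeited revenue plus slashed stake."""
    cost = present_value(revenue_per_period, discount_rate, periods) + stake
    return attack_gain > cost

# Hypothetical node: 100 units of revenue per period over 12 periods, with
# no discounting and 500 units of stake, so misbehaving costs 1700 units.
small_bribe = attack_is_profitable(1000, 100, 0.0, 12, 500)
large_bribe = attack_is_profitable(2000, 100, 0.0, 12, 500)
```

Under these made-up numbers a 1000-unit bribe does not pay but a 2000-unit bribe does, which is the intuition behind raising the cost through both implicit and explicit staking. The model ignores off-chain reputation and the effect on the LINK price, both of which would add to the cost side.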

As for the optimal amount of compensation per oracle network, this really depends, case by case, on a couple of different factors, like how often the oracle network generates updates, how many nodes and data sources are used, the blockchain transaction fees that need to be covered even during network congestion, the amount of node profit and cryptoeconomic security that is needed, etc. Metrics for the amount of revenue generated by nodes can be found on reputation.link and other resources. As staking rolls out in production, Chainlink’s cryptoeconomic security can be quantified further so users can know that x amount of payment and y amount of stake can create z amount of cryptoeconomic security.

Barry · February 6, 2021, 2:28pm

Off-chain aggregation seems like an important area of research not just for oracle networks but any kind of network that uses off-chain consensus but requires strong on-chain guarantees. Governance, network keepers, indexing networks and cross network asset bridging are a few that immediately come to mind.

Is there more extensive research going into it other than moving from Schnorr to BLS-signatures?

The Ethereum community has alluded to eventually moving from BLS to STARK-based aggregation for quantum resistance purposes in this primer on BLS signatures: Pragmatic signature aggregation with BLS - Sharding - Ethereum Research

They also proposed an alternative scheme to BLS which does not require a setup phase among participants: Cryptoeconomic signature aggregation - Sharding - Ethereum Research

Zach · February 6, 2021, 11:03pm

Yes agreed, I think off-chain aggregation has wide reaching implications in terms of not only lowering the gas costs of oracle networks which deliver data to contracts, but also enabling more efficient forms of off-chain computation providing additional scalability advantages.

Recent developments from Chainlink on off-chain aggregation have taken the form of Off-Chain Reporting (OCR), where instead of each node making its own on-chain transaction when an oracle network requires an update, nodes first communicate off-chain using a P2P network to combine their responses into a single transaction. This single transaction contains every node’s individual response and signature, lowering the on-chain gas costs of oracle network updates by 80-90%.

As an example, the Chainlink ETH-USD price feed today with 21 oracle nodes requires 21 transactions per update, one per node, with each costing on average 100k gas, leading to a total cost of around 2.1M gas per update. The Chainlink ETH-USD OCR price feed with 31 oracle nodes requires only a single transaction per update, with a total gas cost of around 290k.
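The arithmetic behind these figures can be checked directly. The snippet below uses only the numbers quoted above (21 nodes at roughly 100k gas each versus a single roughly 290k gas OCR transaction for 31 nodes); the variable names are just labels for this calculation.

```python
# Figures quoted in the post above (approximate averages).
legacy_nodes = 21
gas_per_legacy_tx = 100_000
ocr_nodes = 31
ocr_total_gas = 290_000

# One transaction per node in the non-OCR feed.
legacy_total_gas = legacy_nodes * gas_per_legacy_tx  # 2,100,000 gas

# Fractional reduction in gas per update when moving to OCR.
savings = 1 - ocr_total_gas / legacy_total_gas  # about 0.862

# Gas attributable to each node's report in the single OCR transaction.
ocr_gas_per_node = ocr_total_gas / ocr_nodes  # about 9,355 gas
```

The roughly 86% reduction per update sits inside the 80-90% range mentioned earlier, and it is achieved with ten more nodes in the network.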

OCR doesn’t currently use threshold signatures, meaning each signature within the transaction is separate, but this can be added down the line to further reduce on-chain costs, allowing the addition of more oracle nodes without increasing on-chain validation costs. However, considering that OCR already lowers the on-chain gas costs by an order of magnitude, I think threshold signatures may not be as large of a focus for the Chainlink team compared to using these cost savings to launch additional oracle networks.

Barry · February 8, 2021, 1:26pm

This is interesting. It seems the current implementations of BLS verifiers on Ethereum consume 140K gas, which could come down with a precompile. However, this is a constant-cost operation, so the number of signatures will not change the verification cost.

On the surface, it seems having more nodes will increase decentralization and the desirable properties that come with it. What additional costs, if any, would the validator network incur as more nodes participate in a feed that uses a signature aggregation scheme?

Zach · February 8, 2021, 7:13pm

The primary advantage of threshold signatures is indeed the static gas cost of signature verification regardless of the number of nodes in a network. However, the tricky part is maintaining the same level of transparency into each node’s individual performance that separate signatures provide. It becomes more difficult to detect unresponsive nodes or nodes providing outlier data because the threshold signature would be used to sign the final aggregated data point and could have been created by any combination of nodes that meet the threshold.

As a result, there has to be an off-chain reputation system of some kind to track each node’s individual performance. So while the on-chain gas costs would be static, adding more nodes would mean more strain on the reputation system. Additionally, creating a threshold signature has communication overhead. The more nodes that participate, the more communication occurs between nodes off-chain (this cost is far lower than the on-chain costs, but it can introduce delays). BLS provides a huge advantage in this regard over Schnorr, but it’s still non-negligible.
