A Deep Dive into ORAO’s Data Rating System
One of the central pillars of ORAO’s oracle platform is the proactive rating system, which judges the trustworthiness of data as it comes in, rather than only looking at past performance. In this article we will give a detailed overview of how this system works, what it relies on, and how it plays out for data providers and buyers in a few situations.
The Basics — Comparing New Data to Past Data
Almost every oracle platform has some kind of reputation system. Exceptions exist, though they make other tradeoffs such as requiring multiple data providers for each data input, raising costs and severely limiting the kinds of data that their platforms can support.
For those platforms that do have a reputation system, the mechanics are quite straightforward. When you first start supplying data, you are only trusted to the extent that you have a large stake of tokens put down, tokens you lose if you get caught supplying bad data. This gets rid of most bad actors: you make only a modest profit for providing good data, but you lose a lot of money if you are caught misbehaving, easily wiping out months of profit in a few seconds.
However, this leaves one glaring attack vector. If you are a savvy would-be attacker, you can build up a solid reputation score and actively look for opportunities to supply bad data to a particular victim. For example, you could have a DEX that adjusts prices based not just on the proportion of tokens in its liquidity pools, but also based on oracle price feeds. An attacker can then wait until there is a genuine sudden spike up or down in price and provide data saying the opposite happened, or even that price stood largely still. As soon as that data is accepted by the DEX they can then execute an arbitrage trade, trading tokens with the victim at the wrong price, then selling at fair market value elsewhere. For DEXes with large liquidity pools, the profit from such a fraud can be much greater than the cost of having your tokens confiscated for providing false data.
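The economics of this attack can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name and every number in it (stake size, monthly fee income, arbitrage profit) are made-up assumptions, not figures from ORAO or any real DEX.

```python
def attack_payoff(arbitrage_profit: float, stake_at_risk: float,
                  forfeited_future_income: float = 0.0) -> float:
    """Net gain for an attacker who is caught and loses everything staked.

    All inputs are hypothetical values in the same currency unit.
    """
    return arbitrage_profit - stake_at_risk - forfeited_future_income


# Hypothetical provider: 50,000 staked, earning 2,000 per month in honest fees.
stake = 50_000.0
a_year_of_honest_income = 12 * 2_000.0

# Against a small pool the stake is an effective deterrent...
small_pool = attack_payoff(10_000.0, stake, a_year_of_honest_income)

# ...but against a DEX with deep liquidity the same stake is not.
large_pool = attack_payoff(400_000.0, stake, a_year_of_honest_income)

print(f"small pool payoff: {small_pool:,.0f}")
print(f"large pool payoff: {large_pool:,.0f}")
```

The point is that a fixed stake only deters attacks whose payoff is smaller than the stake, and the payoff scales with the victim's liquidity rather than with anything the provider has put down.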
Developers know about this vulnerability, of course, and so some smart contracts tap into multiple data feeds (though many don’t, because paying multiple transaction and data fees gets expensive). However, even honest data providers disagree on price by a little bit, especially on time frames that are often shorter than a second, so different data feeds rarely agree perfectly. Smart contracts can’t simply lock up and wait whenever two signals point in different directions or lag each other, or they would be locked up constantly. As a result, the vulnerability persists when there is a genuine sudden price change.
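To see why a multi-feed consumer ends up tolerating disagreement, here is a minimal sketch of the kind of aggregation such a contract might do, written in Python for readability. The function name and the 0.5% tolerance are assumptions for illustration, not part of any particular contract.

```python
from statistics import median


def aggregate_price(quotes: list[float], tolerance: float = 0.005) -> tuple[float, bool]:
    """Return the median quote and whether all feeds sit within `tolerance` of it.

    `tolerance` is a relative deviation (0.005 = 0.5%), a hypothetical setting.
    """
    mid = median(quotes)
    worst_deviation = max(abs(q - mid) / mid for q in quotes)
    return mid, worst_deviation <= tolerance


# Normal conditions: honest feeds differ slightly, and the check passes.
calm = aggregate_price([100.0, 100.1, 99.9])

# Sudden move: one feed reports a sharp drop, the others have not caught up.
# From the numbers alone, the contract cannot tell a fast honest feed
# from a malicious one.
spike = aggregate_price([100.0, 100.1, 92.0])

print(calm, spike)
```

If the tolerance is tight, the contract stalls during ordinary noise; if it is loose, a dishonest quote slips through during a real move. That tradeoff is what leaves the gap described above.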
ORAO’s Proactive Data Rating
ORAO does require data providers to lock in a stake, and it does look at their past performance. However, for the reasons given above, we decided that this is not enough. We have therefore developed a rating system built on neural nets (machine learning), trained with TensorFlow, that judges data as it is fed into the ORAO network. These neural nets compare your data with data that you have provided in the past, as well as with data that other providers are producing now and have delivered in the past.
If you have historically had low signal latency, for example, you don’t get to hide behind “Oh I had bad latency” in the future. Your original high trust score was based in part on your consistent low latency. If your price feed didn’t account for the latest spike when, historically, you should have been among the first to know about it, the rating on your data feed goes way, way down. In clearly malicious or especially egregious cases, the data doesn’t even get delivered, or is published along with a warning not to trust it. The ORAO network is not able to publish data in your name that is not yours — your data is signed by you using your private key, which can’t be faked. But what we can do is attach the data rating to each instance of data being put out.
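The idea of holding a provider to its own latency record can be illustrated with a much simpler stand-in than a neural net. The sketch below uses a plain z-score against the provider's history; it is not ORAO's model, and the function name, the threshold of 2 standard deviations, and the scaling factor are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def latency_penalty(history_ms: list[float], observed_ms: float,
                    z_threshold: float = 2.0, scale: float = 10.0) -> float:
    """Return a penalty in [0, 1] for a delivery that is slow by the provider's own standards.

    A simple z-score heuristic standing in for a learned model.
    `z_threshold` and `scale` are hypothetical tuning values.
    """
    mu = mean(history_ms)
    sigma = stdev(history_ms) or 1e-9  # guard against a perfectly flat history
    z = (observed_ms - mu) / sigma
    if z <= z_threshold:
        return 0.0  # within the provider's normal range
    return min(1.0, (z - z_threshold) / scale)


# Hypothetical provider that has consistently delivered in about 40 ms.
history = [40.0, 42.0, 38.0, 41.0, 39.0]

print(latency_penalty(history, 41.0))   # an ordinary delivery
print(latency_penalty(history, 400.0))  # missed the spike by a wide margin
```

Note that the penalty is relative to the provider's own record: the same 400 ms delivery from a provider that has always been slow would score very differently.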
We have focused on price feeds so far in this article because we expect most readers to be more familiar with price feeds than with other data products. However, the same principles apply to other kinds of data. An insurance contract requesting data on flight cancellations is not as time sensitive as a price feed, but it must still get good data, and our neural nets will compare the flight cancellation data supplied by a seller to the data offered by other providers. To prevent a Sybil attack, we take even a small percentage of suppliers disagreeing as a sign that something is up, and bring down the trust score of the data accordingly. Bad suppliers lose their staked tokens quickly enough. In the meantime, by aggressively marking down trust scores at the first sign of trouble, we protect data buyers who would rather be told “Data can’t be fully trusted” than risk getting misled. When purchasing data in our data markets, buyers will be able to specify the minimum required trust score for them to accept a data product.
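A toy version of this markdown-on-dissent rule, together with the buyer's minimum trust score, might look like the following. This is a hand-written heuristic for illustration, not the neural-net rating itself; the function names, the penalty factor of 5, and the 0.9 threshold are all assumptions.

```python
from collections import Counter


def consensus_trust(reports: list[str], penalty_factor: float = 5.0) -> float:
    """Trust score in [0, 1] that drops sharply as soon as any providers dissent.

    `penalty_factor` is hypothetical: at 5.0, a 5% dissent costs 25% of the score.
    """
    majority_count = Counter(reports).most_common(1)[0][1]
    dissent = 1 - majority_count / len(reports)
    return max(0.0, 1.0 - penalty_factor * dissent)


def buyer_accepts(trust: float, minimum_trust: float) -> bool:
    """A buyer only accepts data at or above the minimum score they specified."""
    return trust >= minimum_trust


# Hypothetical flight-status reports: 19 providers agree, 1 does not.
reports = ["cancelled"] * 19 + ["on_time"]
trust = consensus_trust(reports)

print(round(trust, 2), buyer_accepts(trust, minimum_trust=0.9))
```

Making the penalty steeper than the dissent itself is the deliberate part: a handful of disagreeing suppliers is enough to push the data below a cautious buyer's threshold.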
Lastly, as ORAO is a general data oracle platform, there will be some kinds of data products where a provider’s data cannot meaningfully be compared to past data. The outcome of a hockey match or an e-sports tournament, for example, is a one-off event and doesn’t flow naturally from past data. For those kinds of data products, ratings are based on whether your data agrees with the data from other providers. For your data to be trusted enough to be considered in the first place, though, you will have had to build a long track record of trustworthiness on other data products where we can analyze you more thoroughly.
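The two-step logic for one-off events (a track-record gate first, then agreement with peers) can be sketched as below. The function name, the 0.8 track-record threshold, and the use of a simple agreement fraction are assumptions made for the example, not a description of ORAO's actual scoring.

```python
from typing import Optional


def rate_one_off_report(report: str, peer_reports: list[str],
                        track_record: float,
                        min_track_record: float = 0.8) -> Optional[float]:
    """Rate a report on a one-off event by how many peers agree with it.

    Returns None when the provider's track record on other data products
    (a hypothetical score in [0, 1]) is too low for the report to be considered.
    """
    if track_record < min_track_record:
        return None  # not yet trusted enough to take part
    if not peer_reports:
        return 0.0  # nothing to compare against, so no trust can be established
    agreeing = sum(1 for peer in peer_reports if peer == report)
    return agreeing / len(peer_reports)


# Hypothetical match result reported by four other providers.
peers = ["home_win", "home_win", "away_win", "home_win"]

print(rate_one_off_report("home_win", peers, track_record=0.9))
print(rate_one_off_report("home_win", peers, track_record=0.5))
```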