SCRF Interviews | Computational Science in Web3 - Danilo Bernardineli and Jeff Emmett (Ep. 15)

Part 3 of our 7-part series with the team at BlockScience features a conversation with Senior Engineer Danilo Bernardineli and Communications Lead Jeff Emmett. They delve into:

  • The scientific method
  • The role of models in data
  • The computational (or generative) approaches to scientific investigations
  • When to use computational methods versus analytical and empirical methods
  • Building computational simulations themselves, and
  • Using computational simulations to explore uncertainties and help inform decisions

Danilo Bernardineli (DB) is a Senior Engineer at BlockScience. He studied atmospheric physics at the University of São Paulo in Brazil, and is the in-house expert at BlockScience in computational (or generative) science. A veteran modeler, he quickly saw that many of the things he knew about meteorological dynamics were relevant to the intricacies of token engineering.

Jeff Emmett (JE) is the Communications Lead at BlockScience. He is an electrical engineer by training who became fascinated by token engineering and realized, like Danilo, that many of the theories and techniques of “real-world” engineering could be ported into the new design paradigm of token engineering in web3 and beyond.

The interview was conducted by Eugene Leventhal, Executive Director of SCRF.

Audio: Spotify, Apple, Spreaker

Key takeaways from the interview:

Let’s start with the scientific method and the role of models and data. How does the scientific method look to you in the context of web3? Are there any interesting quirks that come up when you port your knowledge of physics to the world of token engineering?

DB : Let’s first unpack what computational science means. In order to create a hypothesis, the scientist needs formal models of how something works. These models often exist in a hierarchy: simple physical models at the bottom and much more abstract models as you go up the hierarchy.

In the world of web3, things are on-chain and there is a permanent ledger that provides solid data availability. Does that make going about these computations any different than in other domains, or is there always a similar challenge when building out the models?

DB : The world of web3 and crypto has peculiarities compared to other domains. The first one is the public blockchain. Most of the interaction data is available, and this is a huge differentiator. For example, if you look at research in macroeconomics, people always get worried when they see how GDP is calculated. It’s hard work and you need to make a lot of assumptions. Crypto is different. There is no socio-economic system that has the level of data that crypto offers.

Second, not only is the data available in crypto systems, but also the source code of the smart contracts is a huge dataset on its own. Very few people are capable of exploring that, and thus they don’t understand the processes that are taking place. Everybody knows that a specific transaction occurs, but the dataset used to transform the elements of that transaction remains a mystery.

Third, the number of people doing research on crypto is relatively small compared to other fields. Serious research in crypto, especially in terms of economics, probably got started around the time that BlockScience was founded, so that’s only four or five years ago. Compare that with other fields in economics where research has been going on for centuries, or climate science where thousands of people have been working on the problems for decades. And yet when you make those comparisons, research in crypto has been surprisingly productive.

Finally, most of the math used in crypto is similar to that used in traditional finance or climate science. But because the crypto field is so new and dynamic, the problem statements and the set of ingredients researchers work with tend to change in a short period of time. Whereas in climate science, the problems being worked on have taken a long time to develop and aren’t going to change overnight. The issue of how much carbon is going to be in the atmosphere has been a frozen problem for decades now and will remain so for decades to come. So having a frozen problem definitely helps.

What does it actually take to build one of these large-scale computational simulations?

DB : This is a question with many potential answers. First, we have to understand why we would want to build a simulation. There are two approaches to that. One is the exploratory approach, and the other is the scientific approach.

In the scientific approach, the researcher starts with a hypothesis, but either there isn’t any data, or the data is not directly related to the problem. So they build simulations that include their beliefs, and then generate data that does or does not support the hypothesis.

In the exploratory approach, there is no specific hypothesis. The researcher simply describes the system. In order to build a simulation there has to be a mental model. How detailed is the model? What are the specific features of it? It might be inspired by pure intuition, which is comparable to going on an unplanned hike where you might get lost in the woods. Or it might be a case where the researcher plans their route. They know their starting point, where they want to end up, and the simulation guides them to their destination.

Those are very general answers. Sometimes people ask that question because they’re looking for specific tools. Often these people are coders who want to know what particular coding style they need to use. They’re looking for a model to inspire them. Maybe it’s a specific mathematical description, maybe it’s just a vague shape in their minds. This is a problem that can have several layers of answers, but the most common approach is to start by copying an existing model and adapting it to create something totally new.

JE : This field is in the very early stages of modeling and exploring these types of systems. What BlockScience is aiming toward is building a library of models that will enable a whole community of people, not all of them engineers like Danilo and Dr. Michael Zargham (founder and CEO of BlockScience), to take a model off the shelf, tweak the parameters, and run it for their specific context. That is orders of magnitude easier than building a model from the ground up. Creating tools that don’t exist is a very different thing from simply choosing the right tool from an existing toolset.

When is it good to use computational methods versus analytical or empirical methods? Are there any particular pros and cons to going one way or the other?

DB : This gets us back to the question of “how does one do science?” Science begins with a model of the world and a hypothesis. In order to arrive at a conclusion, the scientist needs data. Traditionally, the methods for doing that are analytical and empirical. The computational method is relatively new, maybe three or four decades old.

The empirical method is still good. Researchers conduct experiments and gather data directly from the real world. The analytical method is math-based, and the researcher works by deduction. They stay with what is feasible, and there is usually an obvious option. And finally, the computational method generates the data itself from an underlying model.

The reason we didn’t have the computational approach before was simply that it was too expensive to generate data that way. The computational method does contain elements of the analytical method, because in order to use that approach, the researcher has to be able to formalize.

The crypto space lends itself to the computational approach. For most questions, you can’t use the empirical approach because it’s very expensive. Analytical approaches, on the other hand, require expert labor; they’re very time-consuming, and they have no guarantee of success. Using the computational approach, the researcher formalizes their model and runs many simulations. They may not get a perfect final answer, but they’ll get an answer that is very, very close. And for many problems, this is all that is needed.
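The idea of formalizing a model and running many simulations can be sketched in a few lines of Python. This is only a toy illustration, not a BlockScience model: the token supply rules, the function names, and every number below are assumptions chosen so that an analytical answer also exists for comparison.

```python
import random


def simulate_final_supply(rng, steps=50, mint=10.0, burn_prob=0.3, burn=20.0):
    """Run one trajectory of a toy token supply (all rules are hypothetical).

    Each step mints a fixed amount; with probability burn_prob a burn event
    removes a fixed amount.
    """
    supply = 1000.0
    for _ in range(steps):
        supply += mint
        if rng.random() < burn_prob:
            supply -= burn
    return supply


def estimate_mean_supply(n_runs=20000, seed=0):
    """Average the final supply over many independent simulated trajectories."""
    rng = random.Random(seed)
    total = sum(simulate_final_supply(rng) for _ in range(n_runs))
    return total / n_runs


estimate = estimate_mean_supply()
# For this toy model the expected value can be derived by hand:
# 1000 + 50 * (10 - 0.3 * 20) = 1200
analytical = 1000.0 + 50 * (10.0 - 0.3 * 20.0)
print(f"simulated mean: {estimate:.2f}, analytical mean: {analytical:.2f}")
```

The simulated mean is not exact, but it lands close to the hand-derived value, and the same code keeps working if the rules become too complicated to solve by hand.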

Computational methods clearly have benefits when there isn’t sufficient data but researchers can generate “close to perfect” data from a model. But that implies the existence of a mature model. What is the dependence on project maturity?

DB : Maturity is always a big question when using models. Speaking of web3 applications, there is a big difference between using computational models to design new projects, validate projects, or diagnose problems with existing projects. When designing new projects, it really amounts to conducting a sanity check, making sure that certain laws are never broken. Computational models allow researchers to express what they know about a system, to create fictional scenarios that they can test a priori to allow them to know if they should change the design or not.

Validation is a different thing. With validation, researchers believe they know the system and how it should work. Since it is already deployed, they can easily compare what they believe to what is actually happening. This is important, because in complex systems, there are plenty of opportunities for bugs.

The third scenario, diagnosing problems with existing projects, is very interesting. Having computational models allows researchers to do “grey-box learning.” They have the present trajectory and want to know the future trajectory. Given that the variables are not completely independent, by having the computational model and explicitly representing the relationships between variables, they can generate future trajectories that are much more accurate.

Sometimes in a token project you have a numerical parameter that you want to change two months from now, or you want to put a new mechanism in place that’s going to increase or decrease the lock-up of something. The grey-box model allows researchers to compute what’s going to change when they change a mechanism inside it. It’s especially useful for governance problems.
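A minimal sketch of that kind of what-if question, assuming a made-up two-variable model in which tokens move between a circulating pool and a locked pool. The function `run_lockup_model`, its parameters, and the rates are hypothetical, and the sketch covers only the structural part of a grey-box model (the explicit relationships between variables), not the fitting of parameters to on-chain data.

```python
def run_lockup_model(steps, lock_rate, unlock_rate,
                     change_at=None, new_lock_rate=None):
    """Return the locked supply at each step of a toy lock-up model.

    If change_at is given, the lock rate switches to new_lock_rate from that
    step onward, which stands in for a governance parameter change.
    """
    circulating, locked = 900.0, 100.0
    history = []
    for t in range(steps):
        changed = change_at is not None and t >= change_at
        rate = new_lock_rate if changed else lock_rate
        to_lock = rate * circulating      # tokens moving into lock-up
        to_unlock = unlock_rate * locked  # tokens being released
        circulating += to_unlock - to_lock
        locked += to_lock - to_unlock
        history.append(locked)
    return history


baseline = run_lockup_model(steps=120, lock_rate=0.02, unlock_rate=0.05)
proposal = run_lockup_model(steps=120, lock_rate=0.02, unlock_rate=0.05,
                            change_at=60, new_lock_rate=0.04)

print(f"locked supply after 120 steps, no change:   {baseline[-1]:.1f}")
print(f"locked supply after 120 steps, with change: {proposal[-1]:.1f}")
```

Both runs share the same history up to the change, so any difference afterward is attributable to the parameter change alone.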

Can you provide a concrete example that would help the audience understand what this computational science looks like when it’s actually applied?

DB : I’m always biased toward FileCoin (a distributed cloud storage network for data), which BlockScience has been supporting for three or four years. Outsiders always comment that FileCoin seems so complex, but it’s not. It’s just a very big project made up of very simple pieces. There are several very innovative concepts that they use, like the math behind deciding how much they should lock up in collateral. We had a computational model for FileCoin where we extrapolated some of those elements out 10 or 15 years into the future. There were several KPIs that were design goals, such as the incentive to participate in the network, or the ability to resist network attacks.

Once FileCoin was launched, the key concerns were things like: What would happen if we added a new mechanism? What would happen if we changed a certain parameter’s value? How would we know if the effect was going to be good or bad? Because we had simulations for the future, we were able to extrapolate answers to those kinds of questions. With FileCoin we were talking about millions of simulations for all kinds of scenarios: price scenarios, supply and demand scenarios.

When you’re working with simulations, you’re actually creating fictional worlds. You’re imagining hypothetical futures. But how do you define if these futures are good or bad? This is important, because in the end we must select something. With FileCoin we had a group of stakeholders with different goals, so we mapped each goal to various KPIs, and each KPI was computed for various trajectories.

Sometimes the possible trajectories are anti-correlated. Sometimes you want to map for robustness, other times for a specific outcome. Sometimes you want to maximize for security, other times for convenience. Non-linearity plays a role. Is a given trajectory going to be four times more inconvenient, or just 10% more inconvenient?

When you have a complex system, there may be dozens of parameters that you can vary and set up differently. In one scenario, for example, doubling the security may also double the inconvenience. But in another scenario, doubling the security doesn’t change the inconvenience at all. In one scenario, increasing robustness may mean that you’re putting the user near a cliff of some kind. But in another scenario, adjusting the levers a different way can increase the robustness without threatening the user at all.
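The trade-off described above can be illustrated with a small parameter sweep. Everything here is an assumption made for illustration: a single `collateral` lever, a security KPI defined as the share of simulated attacks that fail, and an inconvenience KPI that grows with the square of the collateral. None of it comes from the Filecoin model.

```python
import random


def security_kpi(collateral, n_runs=5000, seed=1):
    """Share of simulated attacks that fail at a given collateral level.

    An attack fails when the attacker's random budget does not exceed the
    collateral. The fixed seed means every setting faces the same attackers.
    """
    rng = random.Random(seed)
    failed = sum(1 for _ in range(n_runs)
                 if rng.expovariate(1.0) <= collateral)
    return failed / n_runs


def inconvenience_kpi(collateral):
    """Hypothetical participation cost, non-linear in the collateral."""
    return collateral ** 2


# Evaluate both KPIs at each setting of the lever.
sweep = {c: (security_kpi(c), inconvenience_kpi(c))
         for c in (0.5, 1.0, 2.0, 4.0)}

for c, (security, inconvenience) in sweep.items():
    print(f"collateral={c:>4}: security={security:.3f}, "
          f"inconvenience={inconvenience:.2f}")
```

In this toy, moving the collateral from 2.0 to 4.0 makes participation four times more inconvenient while adding comparatively little security, which is the kind of non-linearity a sweep is meant to expose.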

How do you know that you didn’t miss a variable? And what happens if you do miss a key element somewhere?

DB : One thing that helps a lot is the coupling between what data is being generated, like a model of the world, and the KPIs themselves. The model generates synthetic data. Analyzing data generated by the model is quite similar to analyzing data from the blockchain itself. The difference, of course, is that the blockchain has one trajectory, and the simulation could have millions of trajectories. So there is a bit of a challenge there.

How do we know that we’re not missing anything? First, the generating process is a feedback-heavy process. You’re not working in a dark room. You’re in constant conversation with the stakeholders. Most of the time, what you want is to know that your KPIs are actually measuring people’s desires. Of course, it’s very hard to pare down people’s desires, so sometimes we have multiple numbers for the same desires. Let’s say we have a goal like “the convenience of participating in the network.” Sometimes, the KPI reflects how profitable you can be at a certain setting. Other times, it reflects the likelihood that something bad will happen and users will lose something. You have to be able to put weights on these things, but also have uncertainty in the weights themselves.
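One way to put weights on KPIs while staying uncertain about the weights themselves is to sample the weights many times and count how often each candidate design comes out on top. The sketch below assumes two hypothetical designs with made-up, already-normalised KPI scores; the names and numbers are illustrative only.

```python
import random

# Hypothetical KPI scores (higher is better) for two candidate designs.
designs = {
    "design_a": {"profitability": 0.8, "safety": 0.4},
    "design_b": {"profitability": 0.5, "safety": 0.7},
}


def sample_weights(rng, centre=0.5, spread=0.2):
    """Draw one plausible weighting; the two weights always sum to 1."""
    w_profit = rng.uniform(centre - spread, centre + spread)
    return {"profitability": w_profit, "safety": 1.0 - w_profit}


def win_frequencies(n_samples=10000, seed=2):
    """Fraction of sampled weightings under which each design scores best."""
    rng = random.Random(seed)
    wins = {name: 0 for name in designs}
    for _ in range(n_samples):
        weights = sample_weights(rng)
        scores = {
            name: sum(weights[kpi] * value for kpi, value in kpis.items())
            for name, kpis in designs.items()
        }
        wins[max(scores, key=scores.get)] += 1
    return {name: count / n_samples for name, count in wins.items()}


frequencies = win_frequencies()
for name, share in frequencies.items():
    print(f"{name} ranks first under {share:.0%} of sampled weightings")
```

If one design wins under nearly every weighting, the choice is robust to disagreement about priorities; if the wins are split, as they are with these made-up numbers, the weights deserve another conversation with the stakeholders.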

JE : At this point, I’d like to double down on the recursive nature, or the iterative nature, of engineering design. The process that Danilo has been discussing is actually a multi-stage process. There are several feedback loops that “step you back” to the previous loop to make sure that the assumptions you made are correct.

As Danilo said, this means going back to the stakeholders and making sure that the KPIs that were originally identified are the ones that are truly important. There may be new ones to add along the way, there may be other complications that come up that make stakeholders realize there’s more to the story than they originally assumed. Engineering anything has always been a circular process with many iterative loops.