Weeknote 13: Deterministic Simulation Testing

This Tuesday was the day of the first Systems from HEL meetup. It’s a meetup about systems programming – there’s a bunch of systems meetups in the US and now there’s one in Helsinki, too.

Pekka Enberg, the founder of the meetup, was also the first person to give a presentation. He talked about deterministic simulation testing (DST). He gave an overview of the technique and demoed his prototype implementation penberg/hiisi. Pekka’s company Turso has recently started using DST in anger and he shared some lessons they had learned. Pekka’s talk can be viewed on YouTube.

A problem with systems like databases is that some bugs are really difficult to trigger. Sometimes you need to do things in just the right order to trigger a bug, but in real-world systems there are many sources of non-determinism: file and network I/O can be slow or fail, threads can get scheduled in a different order, etc. These kinds of bugs may get triggered in production, but it’s difficult to debug them because you can’t reliably reproduce them.

DST’s answer to the problem is simple: take control of all sources of non-determinism and make them deterministic in the testing environment. If you abstract away the calls to a file system, the network, or a time source, you can create a simulator runtime that can deterministically mock the results and inject faults.
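To make that concrete, here is a minimal sketch of the idea in Python. It is not taken from Pekka’s talk or from hiisi; the names `SimClock`, `SimDisk`, and `write_with_retry` are hypothetical and only illustrate how code that talks to a clock and a disk through small interfaces can be run against simulated versions that inject faults.

```python
import random
import time


class RealClock:
    """Production clock: sleeping really blocks the thread."""

    def now(self) -> float:
        return time.monotonic()

    def sleep(self, seconds: float) -> None:
        time.sleep(seconds)


class SimClock:
    """Simulated clock: time only moves when someone 'sleeps'."""

    def __init__(self) -> None:
        self._now = 0.0

    def now(self) -> float:
        return self._now

    def sleep(self, seconds: float) -> None:
        self._now += seconds  # no real waiting, so tests stay fast


class SimDisk:
    """In-memory disk whose writes fail when the seeded PRNG says so."""

    def __init__(self, rng: random.Random, fail_rate: float) -> None:
        self._rng = rng
        self._fail_rate = fail_rate
        self.blocks: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        if self._rng.random() < self._fail_rate:
            raise OSError("injected write fault")
        self.blocks[block] = data


def write_with_retry(disk, clock, block: int, data: bytes, retries: int = 3) -> bool:
    """Code under test: it only sees the disk and clock it is handed."""
    for _ in range(retries):
        try:
            disk.write(block, data)
            return True
        except OSError:
            clock.sleep(0.1)  # back off before the next attempt
    return False
```

In production you would pass in `RealClock` and a real disk wrapper; in the simulator you pass in `SimClock` and `SimDisk(random.Random(seed), fail_rate)`, and the retry logic runs unchanged while every fault and every tick of time is under the test’s control.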

Just like in property-based testing, you can use a pseudo-random number generator (PRNG) to generate the results and the faults. You can also use it to generate the inputs to the system, such as client calls. If you re-run the test with the same PRNG seed, you should get the same results – now you can debug it. By running the system with a lot of random PRNG seeds, you get a good chance of triggering rare bugs.
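The seed-driven loop can be sketched like this. It is a toy under stated assumptions, not anyone’s real simulator: `run_simulation` and `find_failing_seed` are made-up names, the "system" is just a counter, and the bug (a lost write that is reported as a success) is planted on purpose.

```python
import random


def run_simulation(seed: int, steps: int = 20) -> tuple[bool, list[str]]:
    """Run one simulated scenario; everything random comes from the seed."""
    rng = random.Random(seed)
    stored = 0    # what the simulated storage actually holds
    expected = 0  # what a correct system would hold
    trace: list[str] = []
    for _ in range(steps):
        amount = rng.randint(1, 9)        # generated client input
        write_lost = rng.random() < 0.01  # generated fault
        expected += amount
        if write_lost:
            # Planted bug: the write is dropped but still acknowledged.
            trace.append(f"add {amount}: write lost")
        else:
            stored += amount
            trace.append(f"add {amount}: ok")
    return stored == expected, trace


def find_failing_seed(seeds):
    """Try many seeds and return the first one that breaks the invariant."""
    for seed in seeds:
        ok, _ = run_simulation(seed)
        if not ok:
            return seed
    return None
```

Calling `find_failing_seed(range(1000))` searches for a seed that triggers the bug, and calling `run_simulation` again with that seed replays exactly the same trace, which is the property that makes the failure debuggable.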

DST in practice

The trouble with DST is that it’s difficult to pull off. The biggest takeaway for me from Pekka’s experiences was that you don’t have to go all in to get benefits. Controlling every source of non-determinism is a lot of work, but tackling even some of them lets you find bugs. At Turso, their experience was that every time they taught the simulator new tricks, they found new bugs.

If you think it sounds like fuzzing and chaos testing in addition to property-based testing, yup, you’re right. It combines ideas from all of them.

Historically, FoundationDB pioneered the technique about a decade ago. Right now some of the same people are pushing the envelope with Antithesis, a general-purpose simulation testing platform. They have gone as far as developing a deterministic hypervisor. Another well-known implementer of DST is TigerBeetle, a financial database.

In conclusion

The presentation was interesting and there were plenty of questions during and after it. Great discussion altogether! I’m looking forward to the next meetup.

Photo: Overripe berries of lily-of-the-valley in autumn sun. I was going to take a photo of the presentation for this post but I was following it so closely that I forgot!