Weeknote 1: Schema Evolution

Preface: I’m trying out weeknotes

I believe writing regularly in public about your ideas is valuable. Writing helps you clarify your thinking, and sharing it lets you get feedback. You also get something you can refer back to and link to.

I’ve been writing regularly. However, as regular readers may have noticed, I haven’t published much. I’d like to change that. Instead of just trying harder, I’d like to try a different approach. A friend suggested posting weeknotes, so here goes.

Weeknotes are weekly updates about what you’re working on. Each week, I’m going to post about a software engineering topic that has been on my mind. I can’t write publicly about the details of my job, but I can at least write about the concepts. I’ll also include a non-engineering tidbit or recommendation.

This week: schema evolution

I’ve been thinking about how data models can be changed in a system where you cannot update all the participants at once. A typical example is a backend service that is called by a mobile app. When you change the backend API schema, the already-in-use versions of the mobile app should continue to work. To make it work, your changes have to be backwards and forwards compatible:

Backwards compatible: data written with an old version of the schema can be read with the new version of the schema.
Forwards compatible: data written with the new version of the schema can be read with old versions of the schema.

Backwards compatibility is required so that the backend service accepts requests from the old app versions. Forwards compatibility is required so that the old app versions accept responses from the backend service.
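To make the two definitions concrete, here is a toy model: a schema is a map from field names to whether the field is required, and readers are assumed to ignore fields they don’t know. The function names and fields are hypothetical, and the model ignores field types and nullability entirely; it’s a sketch of the definitions, not a real compatibility checker.

```python
# Hypothetical model: a schema maps each field name to True (required: always
# written, and must be present when read) or False (optional).
Schema = dict[str, bool]


def can_read(writer: Schema, reader: Schema) -> bool:
    """Can data written with `writer` be read with `reader`?

    Assumes the reader ignores unknown fields, so the only failure mode is a
    field the reader requires that the writer might not write.
    """
    return all(
        writer.get(field, False)  # the writer must always write this field
        for field, required in reader.items()
        if required
    )


def backwards_compatible(old: Schema, new: Schema) -> bool:
    # Old data (e.g. requests from old app versions) read with the new schema.
    return can_read(writer=old, reader=new)


def forwards_compatible(old: Schema, new: Schema) -> bool:
    # New data (e.g. responses from the new backend) read with the old schema.
    return can_read(writer=new, reader=old)


old = {"id": True, "name": False}
new = {"id": True}  # the optional field "name" was removed
print(backwards_compatible(old, new), forwards_compatible(old, new))  # True True

stricter = {"id": True, "name": True}  # "name" made required instead
print(backwards_compatible(old, stricter), forwards_compatible(old, stricter))  # False True
```

Making name required breaks backwards compatibility because old app versions may not send it, while removing it passes both checks under the ignore-unknown-fields assumption.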

What exactly this means depends on how you have implemented everything. For example, maybe your API schema includes a JSON object that contains an optional field called name. Can you remove the field?

From the backwards compatibility perspective, it’s okay as long as your deserialization code ignores unknown fields. If it doesn’t, you’ll get errors about the unknown field name when old app versions keep sending it.
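Here is a standard-library sketch of the backend’s side of that. User and decode_user are hypothetical names, and the strict flag stands in for whatever your deserializer does with unknown fields (Pydantic, for instance, has a model-level extra setting that controls this).

```python
import json
from dataclasses import dataclass, fields


@dataclass
class User:
    # New schema: the optional field "name" has been removed.
    id: int


def decode_user(raw: str, *, strict: bool) -> User:
    data = json.loads(raw)
    known = {f.name for f in fields(User)}
    unknown = set(data) - known
    if strict and unknown:
        # A strict reader rejects payloads containing fields it doesn't know.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # A tolerant reader silently drops unknown fields.
    return User(**{k: v for k, v in data.items() if k in known})


old_request = '{"id": 1, "name": "Ada"}'  # sent by an old app version
print(decode_user(old_request, strict=False))  # User(id=1)
try:
    decode_user(old_request, strict=True)
except ValueError as e:
    print(e)  # unknown fields: ['name']
```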

From the forwards compatibility perspective, you need to ask what an optional field means. Does it mean that the field can be omitted entirely, or does it mean that the field is nullable, i.e. {"name": null} is acceptable? Do you accept both, and do they mean the same thing? If the old app versions accept responses where the field is omitted, then the change is okay; if they only accept {"name": null}, removing the field breaks them.

If you just think about JSON, saying that nullable and optional are the same may sound silly. But if you consider how you’d model an optional field with a data validation library like Pydantic in Python, the sensible way is to use a nullable field:

from pydantic import BaseModel

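class MyModel(BaseModel):
    # In Pydantic v2, `str | None` alone makes the field required but nullable;
    # the `= None` default also lets the field be omitted from the input.
    name: str | None = None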

Technically, you could leave an attribute unset on a Python object instance, but that would be strange and un-Pythonic.

You could, of course, avoid the whole problem by building a system that translates the data between schema versions. The most ambitious take in this space is Ink & Switch’s Cambria. If anyone is running a system like that at scale, I’d love to hear about it.
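For a rough idea of what translating between versions involves, here is a toy version built from a chain of per-version upgrade and downgrade functions. This is not how Cambria works or what its API looks like; the migrations, including the version 2 to 3 rename of id to user_id, are made up for illustration.

```python
from collections.abc import Callable
from typing import Any

Doc = dict[str, Any]

# Hypothetical migrations: MIGRATIONS[v - 1] converts between version v and
# v + 1. Each entry is an (upgrade, downgrade) pair of pure functions.
MIGRATIONS: list[tuple[Callable[[Doc], Doc], Callable[[Doc], Doc]]] = [
    # v1 <-> v2: the optional field "name" was removed.
    (
        lambda d: {k: v for k, v in d.items() if k != "name"},
        lambda d: {**d, "name": None},  # old readers get an explicit null
    ),
    # v2 <-> v3: "id" was renamed to "user_id".
    (
        lambda d: {("user_id" if k == "id" else k): v for k, v in d.items()},
        lambda d: {("id" if k == "user_id" else k): v for k, v in d.items()},
    ),
]


def translate(doc: Doc, from_version: int, to_version: int) -> Doc:
    """Walk the migration chain one version at a time, in either direction."""
    v = from_version
    while v < to_version:
        doc = MIGRATIONS[v - 1][0](doc)  # upgrade v -> v + 1
        v += 1
    while v > to_version:
        doc = MIGRATIONS[v - 2][1](doc)  # downgrade v -> v - 1
        v -= 1
    return doc


print(translate({"id": 1, "name": "Ada"}, 1, 3))  # {'user_id': 1}
print(translate({"user_id": 1}, 3, 1))  # {'id': 1, 'name': None}
```

Note that the round trip is lossy: a version 1 document’s name does not survive going to version 3 and back. Deciding what to do about that kind of information loss is presumably a big part of what makes a real translation system hard.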

Recommendation: Ghosts by Hania Rani

A few years back, I moved from Spotify to buying albums, which means I now listen to the same albums again and again. Ghosts by Hania Rani is a recent favorite. I like how her soundscapes are rather abstract while her singing brings the music back to something concrete. It’s a good album for the morning, as the songs have energy without being in your face about it.

Picture: A view from the top of Wank, a mountain near Garmisch-Partenkirchen.