Essential features of data specification libraries

A look towards the sky through the pipes of Sibelius Monument

Last week I took a look at three data specification libraries in Clojure: Schema, Spec, and Malli. This week, let’s talk about the essential features of these libraries.

Data specification language. This is the foundation of every data specification library: a way to describe the data with a schema. You could use an existing language such as JSON Schema, but in practice that’s clunky and everybody develops their own language.

;; Let's model users. We want to know the user's name and the year of birth.

;; Schema
(require '[schema.core :as schema])
(def User {:name schema/Str, :year-of-birth schema/Int})

;; Spec
(require '[clojure.spec.alpha :as spec])
(spec/def ::year-of-birth int?)
(spec/def ::name string?)
(spec/def ::user (spec/keys :req-un [::name ::year-of-birth]))

Validation. The basic operation is to validate data against a schema. What is interesting is what happens when the validation fails. Easy-to-read error messages are essential for developers, but having errors as data is a useful building block, too. For example, you could build front-end form validation on top of a data specification library, and then you’d know what the problem was and in which field.

;; Our user data example is missing the year of birth. Who gives out their
;; real year of birth, anyway, to the services we (the software industry) build?
(def a-user {:name "", :year-of-birth nil})

(schema/validate User a-user)
;; Execution error (ExceptionInfo) at schema.core/validator$fn (core.clj:155).
;; Value does not match schema: {:year-of-birth (not (integer? nil))}

(spec/valid? ::user a-user)
;; => false

(spec/explain ::user a-user)
;; nil - failed: int? in: [:year-of-birth] at: [:year-of-birth] spec: :user/year-of-birth
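To make the "errors as data" point concrete, here is a minimal sketch using Spec's explain-data, which returns the problems as plain maps instead of printing them. The :example/... spec names and the field-errors helper are made up for this illustration; a real form validator would also need to handle missing keys, which Spec reports at the map level rather than at the field.

```clojure
(require '[clojure.spec.alpha :as spec])

;; Hypothetical specs, fully qualified so they do not depend on the current namespace.
(spec/def :example/name string?)
(spec/def :example/year-of-birth int?)
(spec/def :example/user (spec/keys :req-un [:example/name :example/year-of-birth]))

(defn field-errors
  "Returns a map from the path of each failing value to the predicate it failed.
  Returns an empty map when the value is valid."
  [spec-name value]
  (->> (spec/explain-data spec-name value)
       :clojure.spec.alpha/problems
       (map (fn [{:keys [in pred]}] [in pred]))
       (into {})))

(field-errors :example/user {:name "", :year-of-birth nil})
;; => {[:year-of-birth] clojure.core/int?}
```

A front end could look up [:year-of-birth] in this map and show a message next to the matching input.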

Conforming. This feature is only implemented by Spec, but the idea is more general. A data specification language allows you to declare a grammar for your data structure. Conforming is parsing your data against that grammar. The result is akin to an abstract syntax tree, and Spec calls the conversion in the other direction “unforming”.

;; The regular expression specs and spec/or reveal the power of conforming.
;; Let's model hiccup-style HTML data.
(spec/def ::hiccup
  (spec/cat :tag keyword?
            :options (spec/? map?),
            :children (spec/* (spec/or :tag ::hiccup, :text string?))))

;; Now let's parse an anchor tag into a neat map.
(spec/conform ::hiccup
  [:a {:href ""} "Check out " [:i ""]])
;; {:tag :a,
;;  :options {:href ""},
;;  :children [[:text "Check out "]
;;             [:tag {:tag :i,
;;                    :children [[:text ""]]}]]}
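The other direction, unforming, can be sketched the same way. This example re-declares the hiccup spec under a made-up :example/hiccup name and uses made-up sample data so that it runs on its own. As far as I can tell, unform returns sequences rather than vectors, but since Clojure compares sequential collections by content, the round trip should still compare equal to the input.

```clojure
(require '[clojure.spec.alpha :as spec])

;; Same grammar as above, under a hypothetical fully qualified name.
(spec/def :example/hiccup
  (spec/cat :tag keyword?
            :options (spec/? map?)
            :children (spec/* (spec/or :tag :example/hiccup, :text string?))))

;; Made-up sample data.
(def anchor [:a {:href "/about"} "Read " [:i "more"]])

;; Parse into the tagged, map-based form...
(def parsed (spec/conform :example/hiccup anchor))

;; ...and turn the parsed form back into hiccup-style data.
(def round-tripped (spec/unform :example/hiccup parsed))

(= anchor round-tripped)
```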

Coercion. Coercion, or more generally schema-driven transformation of data, means that you walk your data and schema together and apply transformations to build new data. This allows you to do things like converting date strings in JSON to java.time instants and vice versa.
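Here is a library-free sketch of the idea, not how any of the three libraries implement it. The schema format (a map from key to type tag), the decoders and encoders tables, and the transform function are all assumptions made up for this example.

```clojure
(import 'java.time.Instant)

;; Hypothetical mini-schema: each key is tagged with a type.
(def event-schema {:name :string, :created-at :instant})

;; One transformation per type tag, for each direction.
(def decoders {:string identity, :instant #(Instant/parse %)})
(def encoders {:string identity, :instant str})

(defn transform
  "Walks data and schema together, applying the transformation
  registered for each key's type tag. Unknown keys pass through as-is."
  [schema transformers data]
  (reduce-kv (fn [acc k v]
               (let [f (get transformers (get schema k) identity)]
                 (assoc acc k (f v))))
             {}
             data))

;; JSON-ish input with a date string.
(def decoded
  (transform event-schema decoders
             {:name "launch", :created-at "2020-01-01T00:00:00Z"}))

(instance? Instant (:created-at decoded))
;; => true

(transform event-schema encoders decoded)
;; => {:name "launch", :created-at "2020-01-01T00:00:00Z"}
```

The same schema drives both directions; only the table of transformations changes.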

Program instrumentation. Yet another use case for data validation is to define a contract for a function: what kind of inputs it takes, what kind of data it returns, and how these are related. This is what Spec’s fdef does.

(require '[clojure.spec.test.alpha :as stest])

(defn plus [x y] (+ x y))
(spec/fdef plus :args (spec/cat :x int? :y int?) :ret int?)
(stest/instrument `plus)

(plus 1 2)
;; => 3

(plus 1 "2")
;; Execution error - invalid arguments to user/plus at (REPL:1).
;; "2" - failed: int? at: [:y]

Schema introspection. The ability to inspect the schema enables features such as generating JSON Schema from your schemas, as shown by spec-tools.
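As a rough illustration of what introspection makes possible, the sketch below reads a spec back as data with spec/form and turns it into a JSON-Schema-like map. It is a toy, not how spec-tools works: the pred->json-type table and keys-spec->json-schema are hypothetical names, and the function only understands (spec/keys :req-un [...]) with simple predicate specs.

```clojure
(require '[clojure.spec.alpha :as spec])

(spec/def :example/name string?)
(spec/def :example/year-of-birth int?)
(spec/def :example/user (spec/keys :req-un [:example/name :example/year-of-birth]))

;; spec/form returns the spec as data, with symbols fully qualified.
(spec/form :example/name)
;; => clojure.core/string?

;; Hypothetical mapping from predicate symbols to JSON Schema types.
(def pred->json-type
  {'clojure.core/string? "string"
   'clojure.core/int?    "integer"})

(defn keys-spec->json-schema
  "Toy converter: handles only (spec/keys :req-un [...]) whose keys
  are specced with predicates listed in pred->json-type."
  [spec-name]
  (let [[_ & {:keys [req-un]}] (spec/form spec-name)]
    {:type "object"
     :required (mapv name req-un)
     :properties (into {}
                       (for [k req-un]
                         [(name k) {:type (pred->json-type (spec/form k))}]))}))

(keys-spec->json-schema :example/user)
;; => {:type "object",
;;     :required ["name" "year-of-birth"],
;;     :properties {"name" {:type "string"},
;;                  "year-of-birth" {:type "integer"}}}
```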

Performance. Data specification libraries can end up playing a pretty important role in your application. For REST APIs, JSON coercion is part of every request, and if you use instrumentation, validation is literally everywhere.

In my experience, indiscriminate use of Spec’s instrumentation makes programs crawl. Michiel Borkent shared a similar experience on The REPL podcast:

If you have an application and you have like 100 core specs and then you start instrumenting those, the application becomes really really slow, even in just in development. So the overhead of calling spec validations on every function call in Clojure program becomes too much for core functions. That is what I had found and I was a little bit disappointed that it didn’t work out, so, at least for dev purpose.

Performance matters: your data specification library can become a bottleneck.
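One way to keep the cost in check, sketched here under the assumption that you only need the checks for a few functions at a time, is to instrument selectively and switch the checks off again with unstrument. The plus function is the same toy example as above, repeated so that the snippet runs on its own.

```clojure
(require '[clojure.spec.alpha :as spec]
         '[clojure.spec.test.alpha :as stest])

(defn plus [x y] (+ x y))
(spec/fdef plus :args (spec/cat :x int? :y int?) :ret int?)

;; Instrument just this one function instead of everything with an fdef.
(stest/instrument `plus)

;; While instrumented, invalid arguments throw before the body runs.
(def instrumented-result
  (try
    (plus 1 "2")
    (catch clojure.lang.ExceptionInfo e
      :spec-error)))

;; Turn the checks off again; calls go straight to the original function.
(stest/unstrument `plus)

(plus 1 2)
;; => 3
```

Calling instrument with no arguments instruments every function that has an fdef, which is the indiscriminate case described above.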

As far as I know, no data specification library for Clojure has all of the features mentioned above. Hopefully this list helps you to choose one – or to decide to roll your own!

Comments or questions? Send me an e-mail.