This is a weblog about computing, culture et cetera, by . Read more.

SQL, Clojure, and editor support

When you’re writing code with a modern editor or an IDE, you can count on having a number of convenient features such as syntax highlighting, autocomplete, and code navigation. Unless you’re writing SQL, of course!

A lot of editors and IDEs, such as IntelliJ IDEA, have really nice support for SQL. However, it’s not so easy to benefit from it in practice. If you’re writing Clojure and you want to use SQL to query a database, you have a few options:

  1. Embed SQL into strings in your Clojure code.
  2. Put SQL into .sql files and import them with HugSQL or similar.
  3. Use a low-level query builder such as HoneySQL.
  4. Use a high-level query builder or an ORM such as Toucan.

Let’s look into each of them in more detail.

Embed SQL into strings. A basic query would look like this:

(jdbc/query db ["SELECT title, rating FROM movies WHERE movie_id = ?" movie-id]))

This is simple, but most likely you aren’t getting syntax highlighting or other features for the string. For example, IntelliJ IDEA does support SQL inside Java strings, but it does not work in Cursive. Furthermore, if you’d need to parametrize query in a way that is not allowed by the ? placeholder, you’ll have to resort to string templating, which is prone to SQL injection bugs.

Put SQL into .sql files. With HugSQL, the SQL file would look something like this:

-- :name get-movie-by-id
SELECT title, rating FROM movies WHERE movie_id = :id

When you load this with HugSQL, it defines a function called get-movie-by-id which executes the query.

(hugsql/def-db-fns "my_queries.sql")
(get-movie-by-id db {:id movie-id})

What is great about this approach is that now you get the full editor support for SQL. In Cursive, however, you can’t jump from Clojure code that refers to get-movie-by-id to its definition in SQL. Jumping to definition works with normal Clojure functions, but Cursive does not know how to deal with functions defined by HugSQL.1

HugSQL has advanced support for parametrizing your queries using snippets and Clojure expressions. If you use Clojure expressions, though, you now have the opposite problem: there’s Clojure code embedded into your SQL comments and there’s no editor support for it.

Use a low-level query builder. With HoneySQL, the query would be built like this:

(->> (sql/format {:select [:title :rating]
                  :from   [:movies]
                  :where  [:= :movies.movie_id movie-id]})
     (jdbc/execute db))

There’s a lot to like about this. Your editor’s Clojure support works with this. You won’t get autocomplete for database identifiers, but completion for commonly used keywords can be good enough. Queries are Clojure data, so you can use the full power of Clojure to generate them.

The main problem is that you need to learn a new, non-canonical syntax for your SQL queries. You know how to write the query you want in SQL, but now you need to figure out how to map it to HoneySQL. It shouldn’t be too hard, but over the years me and my colleagues have struggled with it.

Use a high-level query builder or an ORM. With Toucan, the query would look like this:

(defmodel Movie :movies)
(db/select Movie :movie-id movie-id)

From the editor support perspective, this is about the same as using HoneySQL. It does save you quite a bit of boilerplate. Toucan relies on HoneySQL for advanced paremetrization, so the syntax problem remains.

None of the approaches seems like an obvious winner. In practice, every big project I’ve seen has used a mix of them.

Contrast this with Datomic and MongoDB: the query languages of both can be represented cleanly enough as Clojure data and so that’s what you use. This assortment of options does not exist for them because it’s not needed.

In his article Against SQL, Jamie Brandon argues that SQL’s drawbacks cause “a massive drag on quality and innovation in runtime and tooling”. He does not mention editor support, but I can’t help but think that it’s an example of the effects of that drag.

  1. I think jumping to definition works in CIDER and/or Calva, but have not verified it. ↩︎

What does `identical?` do?

Dear fellow Clojure enthusiasts, do you know what the following two code snippets evaluate to? And why is that the result?

(identical? ##NaN ##NaN)
(let [x ##NaN] (identical? x x))

I didn’t know it a couple of days ago and it took me a while to understand it. Now I want to share my understanding.

Go on, make a guess and check it with a REPL. When you’re ready – or if you already saw me post about it on Twitter – scroll past the photos below for an explanation.

The photos in this post are from Lill-Skorvan island near Porkkalanniemi, Finland.

Here are the results on Clojure 1.11.1:

(identical? ##NaN ##NaN)          ; => true
(let [x ##NaN] (identical? x x))  ; => false

At first I thought that this is the NaN != NaN feature1 of IEEE 754 floats, but that is not the case.

clojure.core/identical? checks the reference equality of its arguments. Its docstring says:

Tests if 2 arguments are the same object

##NaN refers to the constant Double/NaN, which is a primitive double. That is, it’s not an object. When a primitive value is passed to a Clojure function as an argument, it gets wrapped into an object. This is called boxing. Concretely this means calling Double/valueOf, which converts the primitive double into a java.lang.Double object.

The two snippets evaluate to different values because in the first snippet ##NaN gets boxed only once, but in the second snippet each function argument is boxed separately. This comes down to implementation details of Clojure. You can see the behavior in the disassembled byte code I posted on Ask Clojure.

When the reference equality of two boxed doubles is compared, they’re considered not equal even if they wrap the same value. This explains the results we saw.

Here’s a bonus exercise: what does this evaluate to and why?

(identical? 1.0 1.0)

I’ve used Clojure for a decade and there are still nooks and crannies I’m not familiar with. I guess it just takes a while to learn a programming language properly.

  1. There are good reasons for the feature, or at least they were good back in the day. A lot of programmers dislike floats, but my hot take is that they’re actually successful solution to a complicated problem. What we should do is to start using decimal floats, which would match programmer intuitions better than binary floats. ↩︎

Tasks of a schema migration tool

The last time I wrote about how database schema migration tools could do more to help us. The way I see it, any schema migration solution has to cover three tasks: creating, managing, and executing migrations.

Creating migrations. The first task is to create a migration script. If your scripts consist of DDL commands in SQL files, you’ll probably write them by hand. A linter like Squawk can help you to ensure the migrations do not cause unnecessary downtime.

Things get more interesting if you can describe your target schema in another programming language or as data. For example, when using Django’s ORM or automigrate for Clojure, the tool generates the migration automatically by comparing the target schema to the current schema. Sometimes you may need to edit the generated migration by hand, but mostly the tool does the job for you.1

Managing migrations. You need to keep track of which migrations have been applied to each of your databases (local, staging, production). When you want to migrate a database, the tool is able to give you a list of required migrations and the order in which they should be performed. The tools tackling this problem end up having essentially same features.

One problem that migration managers need to solve is how to support multiple developers creating migrations at once. If you number your migrations incrementally like Flyway does, migrations in concurrenty-developed branches may end up using the same number.

To avoid this, some tools such as dbmate and Migratus use timestamps in migration names. Ben Kuhn’s sketch of a migration tool has an alternative solution: it uses incrementing numbers, but it provides a rebase command to automatically renumber the migrations based on Git history. That’s a nice touch!

Executing migrations. At the basic level this means running the DDL commands from your SQL files. However, as we learned the last time, you need to be careful to avoid downtime and do things like set lock_timeout in PostgreSQL.

While every tool that manages migrations can also execute them, there are specialist tools such as gh-ost for MySQL which focus only on executing migrations in a robust way without touching any of the other tasks.

Again, not using DDL for migrations seems to open the door for innovation. For example, Reshape for PostgreSQL has a novel way of executing migrations: by using schemas, views, and triggers, it allows you to use both the old and the new schema at the same time. This means that your application does not have to be backwards-and-forwards compatible with database changes.

As seen, there are plenty of tools out there for dealing with these tasks. There’s no one-size-fits-all solution. If your needs are simple, a simple tool will suffice. However, if you’re looking to ensure safety and efficiency, you should look further.

  1. You could describe the target schema with a CREATE TABLE command and have a tool generate the migration by comparing that against the current schema. I’m not aware of such tool, however. ↩︎

For more posts, see archive.