Web applications have a couple of common use cases for random tokens. For example, e-mail confirmation or password reset e-mails usually have a link that contains a random token. The same goes for “share this item” links in case the item does not have a canonical URL.
The token should be:
- Unpredictable. It would be bad if an attacker was able to guess the token for a password reset e-mail.
- URL-safe. You’re going to embed it in a URL.
How do you generate such tokens in Clojure?
Random data
An easy and secure way to generate random data on the JVM is to use java.security.SecureRandom. On UNIX-y operating systems, SecureRandom uses /dev/urandom by default, which is great, at least on Linux.
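If you want to see which implementation your JVM actually picked, you can ask a SecureRandom instance for its algorithm name. The result depends on your JDK, OS, and java.security configuration; "NativePRNG" is typical on Linux JDKs, but treat that as an example rather than a guarantee. The names random and sample are just for this sketch.

```clojure
(import 'java.security.SecureRandom)

;; A default-constructed SecureRandom; the JVM chooses the provider/algorithm.
(def random (SecureRandom.))

;; Print the algorithm in use on this JVM/OS (varies by platform and config).
(println "SecureRandom algorithm:" (.getAlgorithm random))

;; Fill a 16-byte buffer with random data.
(def sample (byte-array 16))
(.nextBytes random sample)
```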
How much random data do you need? In other words, how long should the token be?
- Long enough to not run out of tokens. I’m stating the obvious, but if you’re going to generate 100k 16-bit tokens, you will generate duplicates, as 2^16 = 65536 < 100000.
- Long enough to avoid collisions. Due to the birthday paradox, you have roughly a 50% chance of generating a duplicate n-bit token after generating about 2^(n/2) tokens. If you use 16-bit tokens, this means you have a significant chance of collisions already after generating 2^8 = 256 tokens.
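To get a feel for these numbers, here is a small sketch using the standard birthday-bound approximation p ≈ 1 − e^(−k(k−1)/(2N)), where N = 2^n is the number of possible n-bit tokens and k is how many you generate. It is an approximation, not an exact count, and the function name collision-probability is just for illustration. By this estimate, 256 random 16-bit tokens collide with probability around 0.39, and the odds pass 50% at roughly 300 tokens.

```clojure
(defn collision-probability
  "Approximate probability that at least two of k uniformly random n-bit
  tokens are equal, using the birthday bound 1 - e^(-k(k-1)/(2N))."
  [n k]
  (let [space (Math/pow 2.0 n)             ; N = 2^n possible tokens
        k     (double k)
        x     (/ (* k (- k 1.0)) (* 2.0 space))]
    ;; -expm1(-x) = 1 - e^(-x), computed without losing precision for tiny x
    (- (Math/expm1 (- x)))))

;; Usage:
(collision-probability 16 256)   ; roughly 0.39
(collision-probability 256 1e12) ; vanishingly small
```

Using Math/expm1 matters for large n: with 256-bit tokens, x is so small that computing 1 − e^(−x) directly would round to zero.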
You will probably store the tokens in a database with a uniqueness constraint. Having a high chance of collisions will degrade the performance of your systems, because you have to regularly retry the generation. Worse, an attacker can generate random tokens themselves and try them, and they will have a high chance of finding one that works.
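A minimal sketch of that retry loop, assuming a hypothetical insert! function that tries to store the token and returns false when the database rejects it for violating the uniqueness constraint (how you detect that depends on your database and driver). Attempting the insert and retrying avoids the race you would get from checking for the token first and inserting afterwards. The helper name insert-with-retry is hypothetical.

```clojure
(defn insert-with-retry
  "Hypothetical helper. Calls (generate) to produce a candidate token and
  (insert! token) to store it. insert! is assumed to return true on success
  and false when the token already exists. Gives up after max-attempts."
  [generate insert! max-attempts]
  (loop [attempt 1]
    (let [token (generate)]
      (cond
        (insert! token)          token
        (< attempt max-attempts) (recur (inc attempt))
        :else (throw (ex-info "Could not store a unique token"
                              {:attempts attempt}))))))
```

With 256-bit tokens the retry branch should essentially never run, so a small max-attempts mostly serves to turn a broken generator (say, one that keeps returning the same value) into a loud error.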
In 2009, Colin Percival recommended using 256-bit random IDs in his Cryptographic Right Answers. He wrote:
I doubt any application thus far has come close to selecting 2^64 random values; but if computers continue to scale exponentially, this could occur in the upcoming decade. In most applications, using 256-bit random values instead of 128-bit random values carries no significant increase in cost; but it puts randomly finding a collision safely into the realm of “not going to happen with all the computers on Earth in the lifetime of the solar system” problems.
This seems like good advice[1], and Latacora concurred in 2018 in their version of Cryptographic Right Answers. Go for 256 bits (32 bytes).
URL-safety
By URL-safe, I mean that you should be able to embed the token into a URL and it should come out intact after all the encoding, decoding, and parsing involved in handling URLs. My favorite answer for making data URL-safe is the URL-safe variant of Base64 encoding (RFC 4648, section 5) without padding. It encodes arbitrary byte data using lower-case and upper-case letters, numbers, - (minus), and _ (underscore). Conveniently, Java comes with java.util.Base64.
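To see the difference the URL-safe variant makes, the sketch below encodes the two bytes 0xFB 0xFF, which happen to hit the last two characters of the Base64 alphabet. The standard encoder produces + and /, plus = padding, all of which need escaping or careful handling in URLs; the URL-safe encoder without padding avoids them. The names sample-bytes and url-encoder are just for this example.

```clojure
(import 'java.util.Base64)

(def sample-bytes (byte-array [(unchecked-byte 0xfb) (unchecked-byte 0xff)]))
(def url-encoder (.withoutPadding (Base64/getUrlEncoder)))

;; Standard alphabet: uses + and / and pads with =.
(.encodeToString (Base64/getEncoder) sample-bytes) ; => "+/8="

;; URL-safe alphabet swaps + and / for - and _; withoutPadding drops the =.
(.encodeToString url-encoder sample-bytes)         ; => "-_8"

;; 32 bytes = 256 bits; at 6 bits per character that is ceil(256/6) = 43 chars.
(count (.encodeToString url-encoder (byte-array 32))) ; => 43
```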
Let’s put it all together
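(import 'java.security.SecureRandom 'java.util.Base64)

(let [random (SecureRandom.)                            ; cryptographically strong RNG
      base64 (.withoutPadding (Base64/getUrlEncoder))]  ; URL-safe alphabet, no "=" padding
  (defn generate-token []
    (let [buffer (byte-array 32)]                       ; 32 bytes = 256 bits
      (.nextBytes random buffer)                        ; fill the buffer with random bytes
      (.encodeToString base64 buffer))))                ; 43-character URL-safe string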
Calling (generate-token) returns tokens like this:
"EE_jyfwk78cQgCcXkO8CAslDhZOL9T-8v9tHXLadenk"
"ANB9bv2D_jhYZJVoYk0NQvXNSWrrWisKEGEUdeuosIo"
"72mAcjEXWUALSxdmXc0A4jwd51s8t6r-JMmWkFdW868"
"Z3ek1rKEJLexyqx9rwZAmIXEBHphRBFLIK5I1zBhC3s"
"pknMoF8qZFNsq8nu-8Zfv5WOlaejEkvTM2xxSV6tSis"
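Every token has the same shape: 43 characters from the URL-safe alphabet. If you look tokens up from incoming requests, a cheap shape check like the hypothetical token-format-ok? below can reject obviously malformed input before it reaches the database. It is only a format check, not a validity check; you still need the database lookup.

```clojure
(def ^:private token-pattern
  ;; 43 characters from the URL-safe Base64 alphabet: A-Z, a-z, 0-9, - and _.
  #"[A-Za-z0-9_-]{43}")

(defn token-format-ok?
  "Hypothetical pre-check: true if s has the shape of a token produced by
  generate-token (32 random bytes, URL-safe Base64, no padding)."
  [s]
  (boolean (and (string? s) (re-matches token-pattern s))))
```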
For something like a URL shortener, you may want something shorter and without ambiguous character combinations like iI1l or oO0Q. Otherwise, this should be a good starting point for your random token needs.
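For the URL-shortener case, one option is to pick each character uniformly from a custom alphabet that leaves out the look-alikes. The sketch below assumes a hypothetical 54-character alphabet (digits and letters minus i, I, 1, l, o, O, 0, Q) and relies on SecureRandom.nextInt(bound), which returns a uniform value in [0, bound), so there is no modulo bias. Each character carries log2(54), about 5.75 bits, so a 10-character token has roughly 57.5 bits: fine for short links, far below the 256 bits recommended for secrets.

```clojure
(import 'java.security.SecureRandom)

;; Hypothetical alphabet: digits and letters without the look-alikes
;; i I 1 l o O 0 Q. 8 digits + 23 lower-case + 23 upper-case = 54 characters.
(def ^:private unambiguous-alphabet
  "23456789abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPRSTUVWXYZ")

(def ^:private short-random (SecureRandom.))

(defn generate-short-token
  "Returns a string of n characters, each drawn uniformly at random from
  unambiguous-alphabet."
  [n]
  (let [size (count unambiguous-alphabet)]
    (apply str
           (repeatedly n #(.charAt ^String unambiguous-alphabet
                                   (.nextInt ^SecureRandom short-random size))))))

;; Usage:
(generate-short-token 10)
```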
Update: If you want to make your tokens even more secure, take a moment to learn about split tokens.
[1] Counterpoint: Many programming languages and databases have built-in support for random UUIDs (128 bits, of which 122 are random), but they do not have an equally convenient way of handling 256-bit IDs.