At the beginning there was the problem

Clojure being a dynamically typed language, I often find myself in trouble with subjectively more complex applications. In my experience writing one in the first place is fairly easy compared to other programming languages due to the way one can rapidly prototype new features. And while that certainly is a big plus, it does nurture growth of complexity. Multiple new functions chewing away at your data is just asking for bugs to appear somewhere down the stack.

Writing tests can certainly increase the stability of a complex system, it does however not help a client (i.e. you) in using it. So, while I do enjoy clojure for being easy to develop in, I sometimes wish it had more support for enforcing types upon the user of any function (again: myself!) – very much like every statically typed language does.

I’ve worked around this limitation by writing assertions on input arguments to a critical function. With critical I really mean those functions which get called a lot all over the place and are thus prone to get faulty data in easily due to bugs. Writing assertions myself is tedious though. The hardest part is thinking about a descriptive error message – which often times leads me to use the terrible built-in assert macro. Whoever included that one has been bitten by C one too many times. And there is also the esoteric issue with imperative assertion code cluttering your pure functional code.

What to do? Argument type deduction and macros to the rescue. Read on.

Then: Some space action!

Before talking about deduction, we need an example. Imagine yourself writing an action packed space shooter. Surely you have become a programmer because you started dreaming about writing your own games, no? In any case everything space related should be there, but for this example, lets stick with space ships. They should have awesome weapons capable to wreak havok on your screen blinding any player with their awesomeness. Of course we need a way to model a ship actually shooting by generating a projectile:

(defn gen-projectile
  "Given a ship and a weapon id, generate a projectile and return
  [ship' projectile]. Where ship' is the modified entity."
  [ship weapon-id]
  [ship {:dmg 10 :weapon-id weapon-id :type :projectile}])

While it is obvious from the argument list that the function is supposed to be passed some ship and a weapon identification we still have to check for correctness ourselves:

(defn gen-projectile
  "Given a ship and a weapon id, generate a projectile and return
  [ship' projectile]. Where ship' is the modified entity."
  [ship weapon-id]
  (assert (ship? entity))
  (assert (weapon-id? weapon-id)) 
  [ship {:dmg 10 :weapon-id weapon-id :type :projectile}])

Its obvious though. To us at least. Of course the parameter should be a ship, thats why it is named that way, no? Same goes for the weapon-id. So why not have it be checked automatically?

Argument type deduction

So how is it that we know the type of the ship argument? It is because we deduce it from the name. Consider the following deductions from argument names (as symbols) to types:

Symbol Type Symbol Type
num Long1 speed Double1
ship Ship transform-fn Function
seconds Double1 my-map Map
image-count Long1 obj Object

If there is a way to specify a mapping from argument names to types, then we can automate the process of adding validations.

Thus, what we need are two things:

  1. A way to map argument names to types.
  2. A way to automatically add validations to user code.

Mapping arguments to types

Mapping argument names to types is fairly straightforward when using clojure multimethods:

(defrecord Ship [])

(defmulti deduce-argument-type-from-symbol identity)

(defmethod deduce-argument-type-from-symbol 'ship [t]

That was easy enough. Pass it the symbol ‘ship and you get the corresponding class:

user> (deduce-argument-type-from-symbol 'ship)

The above is fairly static though. It would not allow us to deduce arguments based on a suffix, like in “image-count“. A better way is to use regular expressions. In order to still be able to use multimethods, we need to do the dispatch ourselves however.

(defrecord Ship [])

(defmulti deduce-argument-type-from-symbol
  ;; Our custom dispatch function
  (fn [sym]
    ;; we take all of the defined methods and look at which dispatch
    ;; value (a regex) matches the symbol name.
      (fn [[regex method-fn]]
        (re-matches regex (name sym)))
      (methods deduce-argument-type-from-symbol)))

(defmethod deduce-argument-type-from-symbol #"ship" [_]

(defmethod deduce-argument-type-from-symbol #"^[a-z]+-count" [_]

Try it out with:

user> (deduce-argument-type-from-symbol 'ship)
user> (deduce-argument-type-from-symbol 'image-count)

So far so good. But as it is now, we would clutter the deduce-argument-type-from-symbol multimethod with deductions from every place that defines its own deductions. The very reason why namespaces exists, is to allow library writers to use their own global variables without having to worry about name collisions.

What we need is another parameter for disambiguating namespaces. But it would be nice to keep the possibility of adding global deductions – which is why we’ll add another multimethod. It will dispatch both on the namespace and the symbol:

(defmulti deduce-argument-type-from-symbol-on-ns
  (fn [ns sym]
    ;; we take all of the defined methods and look at which dispatch
    ;; value (a regex) matches the symbol name.
      (fn [[[method-ns regex] method-fn]]
        (and (= method-ns ns) (re-matches regex (name sym))))
      (methods deduce-argument-type-from-symbol-on-ns)))

(defmethod deduce-argument-type-from-symbol-on-ns [*ns* #"ship"] [_ _]

(defmethod deduce-argument-type-from-symbol-on-ns [*ns* #"image-count"] [_ _]

Lets try that out again:

user> (deduce-argument-type-from-symbol-on-ns *ns* 'ship)
user> (deduce-argument-type-from-symbol-on-ns (find-ns 'clojure.core) 'ship)
No method in multimethod 'deduce-argument-type-from-symbol-on-ns' for dispatch value: null
  [Thrown class java.lang.IllegalArgumentException]

Okay that worked. But look at that ugly syntax for defining deductions now. The user has to specify the namespace manually per *ns*. In a system where we want to reduce redunance, this is a shortcoming. We have two ways to fix this:

  1. Don’t have the user use defmethod, but instead use a macro to fill in the *ns* parameter for the user.
  2. Try to deduce the namespace from the function name. This is possible because any function object is always fully qualified when converting it to a string:
    user> (str first)

    This however is a hack – mainly because it depends on a feature which may very well change in the future. Although in the case of it changing, there may very well come up another hack.

Of course I’ll go with 2. simply because it fits this blog.

First a function to get the ns of a function object:

(defn ns-of
  "Return the namespace object of a function object by looking at the
  stringified function name."
  (-> #"^([^$]+)"
      (re-find (str f))
      (first )
      (symbol )
      (find-ns )))

Now that we can figure out the namespace of a function object, we can rewrite the above definition for deduce-argument-type-from-symbol-on-ns to:

(defmulti deduce-argument-type-from-symbol-on-ns
  (fn [ns sym]
    ;; we take all of the defined methods and look at which dispatch
    ;; value (a regex) matches the symbol name.
      (fn [[regex method-fn]]
        (and (= (ns-of method-fn) ns) (re-matches regex (name sym))))
      (methods deduce-argument-type-from-symbol-on-ns)))))

(defrecord Ship [])
(defmethod deduce-argument-type-from-symbol-on-ns #"ship" [_ _]

(defmethod deduce-argument-type-from-symbol-on-ns #"image-count" [_ _]

Thats better. Though we still have the non-used arguments “[_ _]” at the end of every definition. The only way to get rid of that, is to use macros or a different deduction definition scheme altogether. This I will leave for later however.

Pluggable defn

In order for us to be able to use the above invisible to the end-user we need to make modifications to the way functions are defined. Currently whenever you define a function, be that through defn, defn- or any other variant, what is being called is the fn macro. We could define our own variants of defn & co. to insert argument deduction into user code, but even though it is common practice I don’t want to do that. Instead I’ll propose something entirely different: one defn Form into which one can plug-in different behaviors as to how the argument list & body are to be transformed.

Imagine the following:

  [ ;; adds type hints to argument metadata
    ;; locally scoped deduce-map
    :deduce-map {#"ship" Ship
                 #".*-id" Long})

   ;; adds validations for type hinted arguments

  (defn gen-projectile
    "Given a ship and a weapon id, generate a projectile and return
      [ship' projectile]. Where ship' is the modified entity."
    [ship weapon-id]
    [ship {:dmg 10 :weapon-id weapon-id :type :projectile}]))

gen-projectile would become the following after macro expansion:

(defn gen-projectile
  "Given a ship and a weapon id, generate a projectile and return
          [ship' projectile]. Where ship' is the modified entity."
  [^Ship ship ^Long weapon-id]
  (assert (instance? Ship ship))
  (assert (instance? Long weapon-id))
  [ship {:dmg 10 :weapon-id weapon-id :type :projectile}])

Notice the addition of type hints to the arguments besides the asserts.

For the above to work, we need a few more things:

  1. A macro called with-pluggable which defines a new macro locally that supports plugging in of behaviors into a base macro: in our case the defn macro.
  2. Plugins which deduce & add assertions.

Plugging stuff with with-pluggable

Lets start with 1. The with-pluggable macro takes at least two parameters. The first is a symbol naming the new macro M‘ and the second a vector containing the plugins to be used where the last value inside the vector is again a symbol to the macro which is to be made pluggable (i.e. the macro which provides the base functionality when not using any plugins). Any further arguments make up the code which is to be executed with M‘ bound within the local scope.

Before we take it apart, first the complete definition:

(use '[clojure.contrib.macro-utils :only (macrolet)])
(defmacro with-pluggable
  "Macro which defines a new macro within its body with the specified
  NAME. The second argument is a vector of N elements, where the first
  N-1 elements are argument transformer functions & the N'th element
  is a base macro which will finally be called with the transformed
  arguments. The argument transformer functions must all take the same
  arguments as the base macro. They may change the arguments in any
  way, but must return a list of the (possibly transformed) arguments
  when they're done.

          [;; a custom function which transforms the body
           (fn [name arglist body] 
                ;; Wrap the body of any function inside a:
                ;; (do (println \"...\") <original body>)
                  (println \"called pluggable with arguments:\" ~@arglist) 

           ;; the base macro is the function definition macro: defn

        ;; by using the custom defn,
        ;; my-identity will now have a transformed body.
        (defn my-identity [x] x))

  ;; Try it!
  (my-identity 1)"
  [name plugin-list & body]
  (let [args (gensym "args")]
    `(macrolet [(~name [& ~args]       ;;  
                       ;; the last element inside the plugin-list is a symbol
                       `(~'~(last plugin-list)
                         ;; any elements before that within the
                         ;; plugin-list are being reduced by applying
                         ;; them in-order on the argument list of the
                         ;; new macro. This way they can transform the
                         ;; arguments however they want.
                          ~@(reduce (fn [a# f#] (apply f# a#))
                                    ~args ~(vec (butlast plugin-list)))))]

Taking it apart

This one is a little bit tricky to understand and it lacks any kind of checking on proper argument format to keep it easy. If you want anyone to keep their sanity when using macros though, you should absolutely add checks for whether name actually is a symbol when evaluated and whether the plugin list is properly shaped. Compare the final version against this one to find out how I realized that.

So first of all, you’ll notice me using macrolet (from clojure.contrib.macro-utils) which works like letfn but on macros. Because macros do not evaluate their arguments, but instead their return value, we can do all sorts of transformations on them. In this case, the macro simply returns the following (slightly simplified) list of stuff:

(macrolet [(<NAME> [& args]
                    ~@(reduce (fn [a# f#] (apply f# a#))
                              ~args <PLUGINS>)))]

Where the placeholder <NAME> is the symbol naming your new macro, <WRAPEE> is the last element of the plugin-list vector: the base macro. And <PLUGINS> is the list of actual plugins. Note how I said list of stuff: the macro returns a list of symbols and vectors. This list of stuff is then being evaluated. Since macrolet is again a macro, it returns a list of stuff itself. Its definition is more complex however, so I’ll skip it here. What we need to know is:

  • It defines a new macro called <NAME> within a local context just like letfn does for functions.

The interesting thing is the newly created macro <NAME>. When called it expands into the following:

 ~@(reduce (fn [a# f#] (apply f# a#))
           <ARGS> <PLUGINS>))

Again, this is just a list of stuff which will be evaluated once returned from the macro. <WRAPEE> which is being called, with an argument list that is being filled in from within the <NAME> macro. Basically what happens is (reduce written out in layman’s terms):

  1. take the initial argument list <ARGS> (passed to <NAME>) and one element F from <PLUGINS>
  2. If F exists
    • goto 1. with F(<ARGS>) being the new initial argument list <ARGS>.
  3. Else return <ARGS>.

Effectively this will go over all plugins and let each of them transform the arguments to <WRAPEE> in any way. Finally the returned macros list of stuff will be evaluated. If we have for example:

  • <NAME> = defn
  • <WRAPEE> = clojure.core/defn
  • <PLUGINS> = (argument-type-deducer-plugin)

We get:

(clojure.core/defn <A>)

With <A> being the transformed arguments to your custom defn macro!

The Plugins

The deducer plugin only really does one thing: look at the argument names and for each of those check whether there is a regex matching the name. If so, it adds :type metadata to it with the correspondingly deduced type:

(defn argument-type-deducer-plugin
  "Plugin for with-pluggable. Returns a function which transforms the
  argument list by adding :type metadata to symbols which match
  regular expressions inside (optional) kw arg DEDUCE-MAP. It does so
  according to the with-pluggable specs when used on a defn form of
  structure \"(defn NAME ARGLIST BODY)\" (note: no docstring/multiple
  bodies possible as of now).

  The DEDUCE-MAP must be a map of the form {R_1 T_1, R_2 T_2, ..., R_n
  T_n}, n \\in [0,inf]. With R_i being regular expressions and T_i any
  type/class for i \\in [0,n]. "
  [& {:keys [deduce-map]}]
  (fn [name arglist body]
    (let [arglist (-> (fn [sym]
                        (if-let [deduced-type
                                  (fn [[regex t]] 
                                    (if (re-matches regex (str sym))
                           (fn [m] (merge m {:type deduced-type})))
                      (map arglist)
      [name arglist body])))

The assertion plugin is similar to the deducer plugin, in that it also goes over the list of arguments. This time however only to look whether there is :type metadata set (either by the deducer plugin or the user). If so, it adds assertions to the body for those arguments:

(defn argument-type-assertion-plugin
  "Return a function which takes three arguments (NAME, ARGLIST &
  BODY) just like a DEFN form (without docstring) and adds assertions
  for any type hinted symbol argument to the BODY argument."
  (fn [name arglist body]
    (let [assertable-args (filter #(:type (meta %)) arglist)
          (map #(do `(assert
                      (and (not (nil? ~%))
                           (instance? ~(:type (meta %)) ~%))))
      [name arglist `(do

Now we can use that:

  [ ;; adds type hints to argument metadata
    ;; locally scoped deduce-map
    :deduce-map {#"ship" Ship
                 #".*-id" Long})

   ;; adds validations for type hinted arguments

  (defn gen-projectile
    ;; Docstring omitted, as not supported (yet!)
    [ship weapon-id]
    [ship {:dmg 10 :weapon-id weapon-id :type :projectile}]))

Lets try it out:

user> (gen-projectile (Ship.) nil)
Assert failed: (clojure.core/and (clojure.core/not (clojure.core/nil? weapon-id)) (clojure.core/instance? java.lang.Long weapon-id))
[Thrown class java.lang.AssertionError]

Worked! And we have locally scoped deduce maps – adding the global & namespace scoped deduce maps to that is left as an exercise to the reader.

Still, look at the error message… did I mention assert being terrible? There you have the reason – you have to read & understand the assertion in order to make anything out of it. I for one however don’t like to be distracted with even more code when trying to debug something. But since everything is automated anyway, we can easily add in generated messages.

Proper validation messages

Since there is only really one kind of assertion error, which is the type not matching properly, we can easily hardcode that into a function:

(defn validate-arg-type-from-meta
  "Given a symbol SYM containing :type metadata and a value ARG, check
  whether ARG is an instance of (:type (meta SYM)). If not, throw an
  [sym arg]
  (if (not (and (not (nil? arg))
                (instance? (:type (meta sym)) arg)))
      (fstr "Expected argument `~A` to be of type `~S` but got type `~S` instead." sym (:type (meta sym)) (type arg))))))

Instead of using assert we can now use the above function to throw a proper error message.

Putting it together

Using the with-pluggable macro we can do more than just deduce argument types from argument names. The following example uses clojure.core.logic to define conversion routines from centimeter \Leftrightarrow meter \Leftrightarrow kilometer. This is then used together with metadata to have input arguments to a function automatically be converted into whatever the library author wants. The user of the library may use his own dimensions and the conversion will be done behind the scenes:

;; Logic routines for figuring out a path
(use '(clojure.core.logic [prelude :only (matche)]
                          [minikanren :only (conde run*)]))
(defn converto*
  "Bind RESULT to any F inside ENV forall [FROM-DIM F TO-DIM]."
  [result env from-dim to-dim]
  (matche [env]
          ([[[from-dim . ?factor . [to-dim]] . _]] 
             (== ?factor result))
          ([[_ . ?more]] (converto* result ?more from-dim to-dim))))
(defn converto
  "Bind RESULT to any F_n inside ENV forall [FROM-DIM F_0 TO-DIM] or
  alternatively forall [FROM-DIM F_1 X_1], [X_1 F_2 X_2] ... 
  [X_n F_n TO-DIM]."
  [result env from-dim to-dim]
   ((converto* result env from-dim to-dim)) 
   ((exist [intermediate-dim a b]
           (converto* a env from-dim intermediate-dim)
           (converto b env intermediate-dim to-dim)
           ;; bind a & b to result
            ((== result a))
            ((== result b)))))))

;; The graph + conversion fn
(def ^{:doc "A list of lists. Each of the lists contains 3 arguments A
     F B such that A = F*B."}
     '[[kilometer 1000 meter]
       [meter 10 dezimeter]
       [dezimeter 10 centimeter]])

(defn convert-fn
  "Return a conversion function from one dimension to
  another. Possible FROM & TO args are the outer list symbols from the
  global RULES var. The returned function will take one argument and
  multiply/divide it according to the factor as returned by the RULES using
  the specified MULTIPLY-FN/DIVIDE-FN.

   ((convert-fn 'meter 'centimeter) 10) => 1000"
  [from to & {:keys [multiply-fn divide-fn] :or
              [multiply-fn * divide-fn /]}]
  (if (= from to)
    (if-let [m (seq (run* [q] (converto q rules from to)))]
      (fn [x]
        (multiply-fn x (reduce * 1 m)))
      (if-let [m (seq (run* [q] (converto q rules to from)))]
        (fn [x]
          (divide-fn x (reduce * 1 m)))
          (str "No conversion rules defined from "
               from " to " to)))))))

;; The example data structures
(defrecord Vec2 [x y])
(defn vec-mul
  "Scalar vector multiplication."
  [v s]
  (let [f (partial * s)]
   (-> v
       (update-in [:x] f)
       (update-in [:y] f))))
(defn vec-div
  "Scalar vector division."
  [v s]
  (vec-mul v (/ 1 s)))

(defrecord Ship [position])
(defn gen-ship
  "Return a Ship instance."
  [] (Ship. {:x 0 :y 0}))

;; The conversion plugin
(defn dimension-converter-plugin
  "Plugin to insert code which converts input arguments according to
  their dimension."
  [& {:keys [multiply-fn divide-fn]}]
  (fn [name arglist body]
    (let [args-with-dimension (filter #(:dimension (meta %)) arglist)
          let-overrides (apply
                          (fn [arg-sym]
                            (do `(~arg-sym
                                  ((convert-fn (:dimension (meta ~arg-sym))
                                               ~(:dimension (meta arg-sym))
                                               :multiply-fn ~multiply-fn
                                               :divide-fn ~divide-fn)
      (println let-overrides)
      [name arglist `(let [~@let-overrides]

;; Everything put together
  [(dimension-converter-plugin :multiply-fn vec-mul :divide-fn vec-div)

  (defn ship-move-by [ship ^{:dimension 'meter} offset]
    (-> ship
        (update-in [:position :x] (partial + (:x offset)))
        (update-in [:position :y] (partial + (:y offset))))))

(defn as-dim [dimension obj]
  (vary-meta obj (fn [m] (merge m {:dimension dimension}))))

;; Finally: the test
(ship-move-by (gen-ship) (as-dim 'kilometer (Vec2. 1 1)))

The result is:

user.Ship{:position {:y 1000, :x 1000}}

Noticed the automatic kilometer to meter conversion? This enables the developer of a function to use whatever dimension for his values that he wants. And the user can do the same for his code, so long as there is a conversion from user dimensions to library dimensions.

The library

The above is how sanity has been born. There’s still much to do, and it’ll probably change a lot in the next few weeks though.


1 I am using Long and Double here because of clojure 1.3. In clojure 1.2 you would be using Integer and Float for this and all following examples.