Notes on Clojure

I've been playing with Clojure a lot lately in my freetime. It's a functional language that runs on the JVM that prioritizes simplicity and values (immutable data) as core design principals and is currently one of the more popular lisp dialects.

When i first started writing Clojure i didn't think i would like it as much as i eventually did. On the surface there was a lot that just didn't jive well with me. It was a dynamically typed lisp running on the JVM. At least in my head there appeared to be a severe impedance mismatch between the runtime and the language. Also, the thought of writing reliable code without a type checker brought up recollections of nightmarishly debugging javascript soup in production code bases.

The coolest part about my experience with clojure though was how fast i was able to get over these biases of mine. I think one of the biggest reasons behind this for me was how simple it was to tap into some of the more powerful features of Clojure without severe hoop jumping or yak shaving. This was a huge spectral shift for me as i had been doing lots of Scala previously with my free time.

The amazing thing about Clojure is that somehow while maintaining to be a relatively simple language, it ships with some pretty powerful features out of the box that are very accessible to new developers. These features are what i want to focus on for the rest of this post as they along with the simplicity of the language have been what has largerly converted me.

Persistent Data Structures

A common argument against functional programming languages is that they're inherently inneffecient. The common addage against lisp programmers specifically is as follows.

Lisp programmers know the value of everything and the cost of nothing - Alan Perlis

For the most part this inherent view of ineffeciency has to do with traditionaly lisp dialects only able to represent all data structures via linked lists (ouch).

One of the most innovative bits about clojure is that unlike most lisp dialects, you can represent data directly in code as maps, sets and vectors as well as in the more traditional cons cell style linked lists.

The coolest part about the way Clojure represents these immutable maps, sets and vectors internally is that they are all stored as persistent data structure called a 32-way Hash Array Mapped Tries (HAMTs). Phil Bagwell originally described this effecient immutable datastructure in his paper Ideal Hash Trees and inspired by this paper, Rich Hickey, the author of clojure, took this research and implemented all the native clojure data structures as HAMTs.

The TL;DR benefits behind having immutable maps, sets and vectors implemented as HAMTs are as follows:

  • You get effecient immutable copies of data structures via effecient path copying (Think of it as a diff)
    • This results in significantly less memory consumed by algorithims that do transforms over immutable data.
    • This also results in less CPU time that would be otherwise spent creating defensive immutable copies on transformations.
  • You get effecient data access. Lookups are O(Log32 N) which is damn near constant time.
    • HAMTs being 32 way means that they are very shallow trees. You would need ~35 billion elements to get the tree to depth 8.

For a more detailed read on how clojure implements it's datastructures check out this great blog post by Jean Niklas L'orange.

For additional reading on persistent data structures in general check out Crish Okasaki's Purely Functional Data Structures

Records

One of my original gripes against clojure was that i didn't want to program everything in terms of maps, sets, vectors and lists. I spent a chunk of my career doing this previously with PHP and Javascript and I definately did not miss it.

Thankfully Clojure allows you to define native objects in terms of what they call records. They are analogous to one of my favorite features from Scala, case classes!

Heres a side by side example of how you might represent a user object in Scala via case classes and then again in Clojure via records.

Scala:

case class Email(str: String)

case class Password(str: String)

case class User(email: Email, password: Password)  

Clojure:

(defrecord [email password])

But this kind of sucks as we lost all that super useful type information :( . Fear not though, we can get them back using a very powerful library from the kind folks at Prismatic called Schema.

(ns blog.example
    (:import [schema.core :as s]))

(def Email
  (s/pred #(re-matches #".*@.*" %)))

(def Password
  (s/pred #(> (count %) 7)))

(s/defecord User
  [email    :- Email
   password :- Password])

Not only does this example with the Schema library gain back the type information we lost, it also adds validation for the data at the same time. This is a mega win for the applications i tend to write and adding the equivelant validation to the case classes in Scala becomes messy really fast.

Protocols

Clojure protocols should be nothing new to you if you've ever seriously took a stab at writing code in Haskell, Rust or Scala (they're even supported in C++ via SFINAE and templates). Haskell calls them typeclasses, Rust calls them traits and Scala calls them implicit views and of course Clojure calls them protocols (oh boy, naming must be really hard >_< )

If you haven't programmed in any of the above languages and used their equivelant of Clojure protocols before it can be a bit awkward to understand the significance of protocols or just even understand them at all.

The thing that really helped me understand protocols/typeclasses was Philip Wadler's Expression Problem

The expression problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts). - Philip Wadler

For the OOP brained programmers out there, when he says datatype think interface and when he says cases think classes.

So in OO terms you can reword the problem to be something along the lines of:

Given: An unmodifiable interface.

  1. Create new classes that implement that interface without modifying the interfaces source and without casting.
  2. Add a new methods to the interface without modifying it's source and without casting.

People more familiar with OOP languages will look probably at this problem and be like "herp derp, solving #1 is ezpz but srslsy, wtf #2!?". Likewise the opposite may be more intuitive with people more inclined to functional languages.

Lets give some extreme examples here and demonstrate how you solve #1 in an OO language (Java) and #2 in an FP language (Haskell).

Solving #1 in Java:

// Unmodifiable interface (datatype)
interface Ridable {  
  void ride();
}

// Cases/Classes implementing the interface without
// modifying the interface (datatype) and without casting.
class Car implements Ridable {  
  void ride() { /* ... */}
}

class Bicycle implements Ridable {  
  void ride() { /* ... */}
}

Solving #2 in Haskell

-- Unmoddifiable datatype
data Transport = Car | Bicycle

-- New functions added over the datatype without
-- modifying the datatype and without casting.
instance Ride Transport where  
  ride Car = Driving Car
  ride Bicycle = Peddling Bicycle

instance Park Transport where  
  park Car = ParkingBrake Car
  park Bicycle = KickStand Bicycle

What we can see from these two examples in Java and Haskell is that each of the languages is optimized to solve opposing sides of the expression problem.

Java and OO languages in the large are optimized for adding cases/classes to interfaces. While Haskell and Rust are more optimized for adding functions over datatypes.

The key differentiator between the two cases is the shape of where the functionality gets defined. In the OO case, functionality typically gets defined within datatypes and cases where as in the FP case functionality typically gets defined outside of the data type.

By now you might be saying "WTF, i thought this was a blog post about clojure!?" Well, the interesting thing about Clojure (like Scala), is that it runs on the JVM and interops with Java. This makes Clojure one of the few languages that have an answer to both sides of the expression problem out of the box.

What follows is an example of clojure using protocol to implement two interfaces from outside of the definition of the User datatype.

(defprotocol Persistable
  (persist [entity]))

(defprotocol Deletable
  (delete [entity]))

(s/defecord User
  [email    :- Email
   password :- Password])

(extend User
  Persistable
  {:persist (s/fn :- User [user :- User]
    ;; persist user here
    )})

(extend User
  Deletable
  {:delete (s/fn [user :- User]
    ;; delete user here
    )})

This is pretty awesome stuff and in my musings with Clojure thus far i've not had the need to extend a Java class from clojure so while it's doable via the proxy function, i'm going to omit it from this post because you should probably avoid that anyways.