personal website and blog of

Johannes Staffans

Fun with Specter

19.11.2015

Specter is a new library by Nathan Marz that makes it easier to deal with nested data structures, such as lists of maps of maps. This post explores some less-documented parts of the library.

For this post, we'll be dealing with a simple movie dataset:

(def movies
  [{:name "First Blood", :director "Ted Kotcheff", :rating 7.6}
   {:name "Lethal Weapon 3", :director "Richard Donner", :rating 6.6}
   {:name "Predator", :director "John McTiernan", :rating 7.8}
   {:name "Mad Max Beyond Thunderdome", :director "George Miller", :rating 6.1}
   {:name "The Terminator", :director "James Cameron", :rating 8.1}
   ... ])

Transforming the sequence is simple enough:

(require '[com.rpl.specter :as s]
         '[clojure.string :as string])
=> nil
(s/transform [s/ALL :name] string/upper-case movies)
=> [{:name "FIRST BLOOD", :director "Ted Kotcheff", :rating 7.6}, ... ]

In general, transformation is Specter's forte and is covered very well by the documentation. I was, however, interested in using Specter for analysis and aggregation of data stored in a sequence of nested maps. That is possible with standard Clojure functions, but I like Specter's declarative style and wanted to give it a shot.
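For comparison, here is a rough plain-Clojure version of the kind of query used below, written with `filterv` and run only on the five rows shown above (the names `sample-movies` and `cameron-above-8` are made up for this sketch). It keeps the whole map, which is what the Specter attempts below are working towards:

```clojure
;; The five sample rows shown at the top of the post.
(def sample-movies
  [{:name "First Blood", :director "Ted Kotcheff", :rating 7.6}
   {:name "Lethal Weapon 3", :director "Richard Donner", :rating 6.6}
   {:name "Predator", :director "John McTiernan", :rating 7.8}
   {:name "Mad Max Beyond Thunderdome", :director "George Miller", :rating 6.1}
   {:name "The Terminator", :director "James Cameron", :rating 8.1}])

(defn cameron-above-8
  "Hypothetical helper: keep the whole map for every James Cameron
  movie rated above 8.0, using only clojure.core."
  [movies]
  (filterv (fn [{:keys [director rating]}]
             (and (= "James Cameron" director)
                  (> rating 8.0)))
           movies))

(cameron-above-8 sample-movies)
;; => [{:name "The Terminator", :director "James Cameron", :rating 8.1}]
```

This works fine, but the navigation and the filtering logic end up tangled together in one anonymous function. Specter lets you describe the path and the conditions separately.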

For example, let's find all movies by James Cameron with a rating higher than 8.0:

(s/select 
  [s/ALL 
   (s/cond-path 
     [:director #(= "James Cameron" %)] 
     [:rating #(> % 8.0)])] 
  movies)
=> [8.1 8.5 8.6]

(Note: Nathan Marz let me know via Twitter that there's a better way to do the following; more on that at the end of the post!)

So we get the ratings, and they are all greater than 8.0, but we have lost the original maps. How do we get those back? It turns out that you can put VAL almost anywhere in the selector path. VAL collects whatever value Specter has navigated to at that point in the path, and the collected values are returned together with the selected value. If we, for example, put VAL at the end, we get the rating twice:

(s/select 
  [s/ALL 
   (s/cond-path 
     [:director #(= "James Cameron" %)] 
     [:rating #(> % 8.0) s/VAL])] 
  movies)
=> [[8.1 8.1] [8.5 8.5] [8.6 8.6]]

When working with sequences of maps, we usually want the whole map back, so we should put VAL right after s/ALL, where the current value is still the movie map:

(s/select 
  [s/ALL 
   s/VAL
   (s/cond-path 
     [:director #(= "James Cameron" %)] 
     [:rating #(> % 8.0)])] 
  movies)
=> [[{:name "The Terminator", :director "James Cameron", :rating 8.1} 8.1] ... ]
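To make the shape of these results concrete, here is a plain-Clojure sketch that builds the same kind of pairs without Specter: the collected movie map first, the selected rating last. `cameron-pairs` is a hypothetical name, and this only models the output shape; it is not how Specter works internally:

```clojure
(def sample-movies
  [{:name "Predator", :director "John McTiernan", :rating 7.8}
   {:name "The Terminator", :director "James Cameron", :rating 8.1}])

(defn cameron-pairs
  "Hypothetical model of the [s/ALL s/VAL (s/cond-path ...)] result:
  each element is [collected-movie selected-rating]."
  [movies]
  (vec (for [m movies
             :when (= "James Cameron" (:director m))
             :let [rating (:rating m)]
             :when (> rating 8.0)]
         [m rating])))

(cameron-pairs sample-movies)
;; => [[{:name "The Terminator", :director "James Cameron", :rating 8.1} 8.1]]

;; Taking the first element of each pair recovers the plain maps.
(mapv first (cameron-pairs sample-movies))
;; => [{:name "The Terminator", :director "James Cameron", :rating 8.1}]
```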

Now we get the full map back, but it's wrapped in a vector together with the selected rating. We can introduce a helper function for this use case:

(defn select-maps
  [selector structure]
  (->> (s/select selector structure)
       ;; each result is [collected-map selected-value];
       ;; the map we're after is always the first element
       (mapv first)))

Now working with sequences of maps is more comfortable:

(select-maps
  [s/ALL 
   s/VAL
   (s/cond-path 
     [:director #(= "James Cameron" %)] 
     [:rating #(> % 8.0)])] 
  movies)
=> [{:name "The Terminator", :director "James Cameron", :rating 8.1} ... ]

Specter's path definition functions can be used to get, for example, the James Cameron movies that have either a very good or a very bad rating:

(select-maps
  [s/ALL 
   s/VAL
   (s/cond-path [:director #(= "James Cameron" %)]
     (s/multi-path [:rating #(> % 8.5)] [:rating #(< % 6.0)]))] 
  movies)
=>
[{:name "Terminator 3: Judgment Day", 
  :director "James Cameron", :rating 8.6}
 {:name "Piranhas II", :director "James Cameron", :rating 3.5}]
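As a rough mental model, s/multi-path navigates each of its paths in turn and concatenates what they select. The plain-Clojure sketch below mimics that with `mapcat`, using the three Cameron rows that appear in the post; `cameron-extremes` is a hypothetical name:

```clojure
(def sample-movies
  [{:name "The Terminator", :director "James Cameron", :rating 8.1}
   {:name "Terminator 3: Judgment Day", :director "James Cameron", :rating 8.6}
   {:name "Piranhas II", :director "James Cameron", :rating 3.5}])

(defn cameron-extremes
  "Hypothetical model: for each Cameron movie, run both rating tests
  (like the two branches of s/multi-path) and concatenate the matches."
  [movies]
  (vec (mapcat (fn [m]
                 (when (= "James Cameron" (:director m))
                   (concat (when (> (:rating m) 8.5) [m])   ; first branch
                           (when (< (:rating m) 6.0) [m]))))  ; second branch
               movies)))

(mapv :name (cameron-extremes sample-movies))
;; => ["Terminator 3: Judgment Day" "Piranhas II"]
```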

So this works, but it's clunky and requires a helper function. Is there an easier way?

The right way

As pointed out to me by the library author, there's a much better way of accomplishing the above:

(s/select 
  [s/ALL
   (s/selected? :director #(= "James Cameron" %))
   (s/selected? :rating #(> % 8.0))]
  movies)

The selected? function keeps the current value only if the path passed to it selects anything. Unlike the earlier examples, it leaves the return value alone: you get the movie maps themselves, with no extra wrapping. It can, of course, be combined with conditional paths or multi-paths, as in the previous example:

(s/select 
  [s/ALL
   (s/selected? :director #(= "James Cameron" %))
   (s/selected? (s/multi-path [:rating #(> % 8.0)] [:rating #(< % 6.0)]))]
  movies)
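One way to see why selected? keeps the result clean: as I understand it, navigating a multi-path on its own visits every branch, so a movie matching two branches would be selected twice, whereas selected? only asks whether anything matched. The thresholds above don't overlap, so it makes no difference there. The plain-Clojure sketch below contrasts the two behaviours using two deliberately overlapping, hypothetical rating tests; all names are made up for illustration:

```clojure
(def sample-movies
  [{:name "The Terminator", :director "James Cameron", :rating 8.1}
   {:name "Terminator 3: Judgment Day", :director "James Cameron", :rating 8.6}
   {:name "Piranhas II", :director "James Cameron", :rating 3.5}])

;; Two hypothetical rating tests that overlap for ratings above 8.5.
(def branches [#(> % 8.0) #(> % 8.5)])

(defn select-per-branch
  "Model of navigating every branch: a movie is emitted once per matching branch."
  [movies]
  (vec (for [m movies
             branch? branches
             :when (branch? (:rating m))]
         m)))

(defn keep-if-any-branch
  "Model of s/selected?: a movie is kept once if at least one branch matches."
  [movies]
  (filterv (fn [m] (some #(% (:rating m)) branches)) movies))

(count (select-per-branch sample-movies))   ;; => 3 (Terminator 3 appears twice)
(count (keep-if-any-branch sample-movies))  ;; => 2
```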

Conclusion

I think Specter is one of the best things to come out of the Clojure ecosystem recently. I find Specter a lot easier to grasp than, for example, zippers, another popular way of working with nested data structures. It's still a bit dense to get into, though. Specter would benefit greatly from something like the Learn Datalog Today website!
