Daniel Compton

The personal blog of Daniel Compton - Projects

Speeding up Clojure(Script) sorting by 100x

I’ve recently been working on a new ClojureScript application as part of a contract, and I was digging around for things to polish before launch. The app was mostly fast, but I noticed that when the main list of content got to around 40 items, it was a little bit slow to render. I also noticed that it seemed like it got almost twice as slow when I added another 10 items. At this point, you might already be having alarm bells go off in your head suggesting what the problem was likely to be. I didn’t, so I dived into the code to look at the part of the app rendering the main list.

I looked over the code path that rendered the list, and wrapped time around a few suspect pieces of code. After a few checks, I found that a sort-by function in the view was the slow part, though it wasn’t immediately clear why sorting a list of 40 items would take a second. We were using sort-by to order items by state, then reverse date order (newest items first).

sort-by takes a custom sort key function. Our key function was doing some date parsing to parse a string into a date, then subtracting the current unix time from the parsed unix time to give an integer value. The lowest numbers are the most recent dates. I suspected that the date parsing could be the problem, but I wasn’t really sure. As an experiment, I disabled all of the date parsing, and returned the string directly. My sorting was the wrong way around, but it went from taking 1000 ms to 10 ms, a factor of 100x speedup!

A standard sort of the dates (which were in ISO 8601 order, e.g. 2016-04-02T08:24:31+00:00) sorted the oldest dates as the first ones in the list. After a few minutes thinking, I remembered I had recently read the clojure.org guide on comparators. In it, it discusses a reverse comparator:

(fn [a b] (compare b a))

This comparator is exactly the same as a normal comparator, but it will return the opposite result to what the normal one would. Passing this comparator to sort-by kept the 100x speedup, but sorted in the correct order. The list rendered in 10-15 ms, and was now plenty fast enough.

One question remained though, why was the list getting so much slower to render as I added a few more items to it? Of course reading this now, you Dear Reader are probably thinking to yourself, “aha! ClojureScript’s sorting algorithm will be O(n log(n)), and the slow key function will therefore be called O(n log(n)) times.” It took me a bit more thinking than you (I didn’t have a blog post explaining why my code was slow to work from), but I got to this conclusion in the end too.

I really enjoyed this debugging session, as it is not very often that I can both speed up real world code by 100x, and get exposed directly to algorithmic complexity issues. A++, would debug again.

git check-ignore: Explaining why git has ignored a file

Most of the time I don’t have any issues with git’s .gitignore system but occasionally it doesn’t behave as I expect. Usually, I add and remove lines until I get the result I’m after and leave feeling vaguely unsatisfied with myself. I always had a nagging feeling that there must be a smarter way to do this, but I was never able to find it. Until today!

Enter git check-ignore. You pass it files on the command line, and it tells you if they’re ignored or not, and why. Sounds pretty perfect right? I won’t give you a run down of all the options, as the man page is surprisingly readable (especially for a git man page!). However the 30-second version is git check-ignore --verbose --non-matching <filepath to check>:

$ echo "b" > .gitignore
$ # a is a file that would not be ignored
$ git check-ignore --verbose --non-matching a
::    a
$ # b is a file that would be ignored (matched)
$ git check-ignore --verbose --non-matching b
.gitignore:1:b    b

--verbose prints the explanation of why a file was ignored. --non-matching prints the file name out, even if Git would not ignore it. If a file is going to be ignored, check-ignore prints out the path to the gitignore file and line that matched. Interestingly, the files don’t even need to exist, git just checks what it would do if they did exist.

Using check-ignore I was able to solve my problem faster, and learn a bit more about git’s arcane exclusion rules. Another handy tool to add to my unix utility belt.