As I noted in my previous post, one of the nice features of Clojure is its support for concurrent programming. It provides some fancy features for writing complex parallel programs, and I’m certainly no expert on that topic. However, one thing that I do every day is perform operations on the elements of a list. Traditionally, this is a serial operation, but it would be nice to have my compiler (or environment) perform it in parallel over the elements of the list. Clojure provides a very simple way to do this: pmap.
The map form applies a function to the elements of a list (or to corresponding elements of multiple lists) in order, returning a (lazy) list. Prepending the “p” makes this happen in parallel, making use of as many cores as are present on your system (but see below). Given the ease of this operation, let’s see what we can do with pmap and the CDK.
Based on the previous post, let’s calculate fingerprints in a serial fashion, then in parallel, and see how long each takes. For completeness, I’ll repeat the code from the previous post. First import our packages and set up some basic objects and functions.
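As a small illustration of the two forms on plain numbers, here is a minimal sketch that does not need the CDK. The var names are made up for this example.

```clojure
;; map applies a function to each element, in order
(def squares-serial (doall (map #(* % %) [1 2 3 4 5])))

;; pmap has the same call shape and returns results in the same order
(def squares-parallel (doall (pmap #(* % %) [1 2 3 4 5])))

;; with several collections, the function receives corresponding elements
(def sums (doall (pmap + [1 2 3] [10 20 30])))

(println squares-serial)   ; (1 4 9 16 25)
(println squares-parallel) ; (1 4 9 16 25)
(println sums)             ; (11 22 33)
```

For a function this cheap the thread coordination will likely cost more than it saves, so pmap only pays off when each call does a fair amount of work.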
    ;; CDK classes used for parsing and fingerprinting
    (import '(org.openscience.cdk.smiles SmilesParser))
    (import '(org.openscience.cdk DefaultChemObjectBuilder))
    (import '(org.openscience.cdk.fingerprint MACCSFingerprinter))
    (import '(org.openscience.cdk.fingerprint Fingerprinter))
    (import '(org.openscience.cdk.fingerprint ExtendedFingerprinter))
    (import '(org.openscience.cdk.smiles.smarts SMARTSQueryTool))

    ;; so we can read lines from a file
    (use 'clojure.contrib.duck-streams)

    ;; one shared parser and one shared fingerprinter
    (def sp (new SmilesParser (. DefaultChemObjectBuilder (getInstance))))
    (def fprinter (new ExtendedFingerprinter))

    ;; SMILES string -> molecule
    (defn getmol [smiles] (. sp (parseSmiles smiles)))
    ;; molecule -> fingerprint
    (defn getfp [mol] (. fprinter (getFingerprint mol)))
Next, we load the 4,688 molecules from the data file I described previously. In contrast to before, this code is slightly shorter (thanks to Nik) but also uses the doall form. This forces evaluation of the list, so that the molecules are all loaded into memory. In the timing code we again use it, since if we don’t, we just get the time for the list creation step (which is “instantaneous” due to lazy evaluation), rather than the actual list evaluation.
    ;; trim each line, parse it, and force the whole list into memory
    (def mols (doall (map #(getmol (. % trim))
                          (read-lines "junk.smi"))))
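The effect of lazy evaluation and doall can be seen without the CDK. The following is a minimal sketch in which a counter (the hypothetical `calls` atom) records how often the mapped function has actually run.

```clojure
;; counter recording how many times the mapped function has run
(def calls (atom 0))

(defn noisy-inc [x]
  (swap! calls inc)
  (inc x))

;; building the lazy sequence does no work yet
(def lazy-result (map noisy-inc [1 2 3 4 5]))
(def calls-before @calls)   ; still 0

;; doall walks the whole sequence, forcing every element
(def forced-result (doall lazy-result))
(def calls-after @calls)    ; now 5
```

This is why timing a bare map call measures almost nothing: the work only happens once something consumes the sequence.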
Now we can evaluate the fingerprints and time the operation. Initially I performed these calculations on my MacBook Pro with 2GB RAM and a dual-core CPU.
    (time (def fpserial (doall (map getfp mols))))
This run took 38.8 s (averaged over three runs). Next we consider the parallel version:
    (time (def fpparallel (doall (pmap getfp mols))))
This version has an average run time of 23.4 s, roughly a 1.7x speedup (38.8 / 23.4). That falls short of a two-fold speedup. Part of the reason is the overhead of managing the threads. Also, even in the serial version the garbage collector probably takes up some of the second core, and in the parallel version it will contend with the actual calculation.
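For anyone wanting to repeat the averaging, here is a rough sketch of a timing helper. The names `mean-ms`, `busy-work` and `speedup` are invented for this example, and `busy-work` is only a stand-in for the real fingerprint calculation.

```clojure
;; hypothetical helper: run thunk f n times, return the mean wall-clock time in ms
(defn mean-ms [n f]
  (let [times (doall
                (for [_ (range n)]
                  (let [start (System/nanoTime)]
                    (f)
                    (/ (- (System/nanoTime) start) 1e6))))]
    (/ (reduce + times) n)))

;; stand-in workload; in the post this would be #(doall (map getfp mols))
(defn busy-work []
  (reduce + (map #(* % %) (range 100000))))

(def serial-ms (mean-ms 3 busy-work))

;; speedup is the serial time divided by the parallel time
(defn speedup [serial parallel]
  (/ serial parallel))

(speedup 38.8 23.4) ; the timings reported above, about 1.66
```

Three runs is a small sample, and JIT warm-up can make the first run slower than later ones, so the numbers are best treated as indicative.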
Just to be sure that the calculation works OK, let’s compare (via BitSet.equals) the fingerprints obtained using the two versions. The code below counts the pairs that differ, so we expect the result to be 0:
    ;; count the pairs whose fingerprints differ
    (count (filter false?
                   (map (fn [x y] (. x (equals y)))
                        fpserial fpparallel)))
and that’s exactly what we get.
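To see how such a mismatch count behaves when the two lists do differ, here is a self-contained sketch using plain java.util.BitSet objects in place of CDK fingerprints. The `bitset` and `mismatches` helpers and the sample data are made up for illustration.

```clojure
(import '(java.util BitSet))

;; hypothetical helper: build a BitSet with the given bit positions set
(defn bitset [& positions]
  (let [bs (BitSet.)]
    (doseq [p positions]
      (.set bs (int p)))
    bs))

(def fps-a [(bitset 1 5 9) (bitset 2 4) (bitset 7)])
(def fps-b [(bitset 1 5 9) (bitset 2 4) (bitset 7)])
(def fps-c [(bitset 1 5 9) (bitset 2 3) (bitset 7)])  ; second entry differs

;; count the pairs for which BitSet.equals returns false
(defn mismatches [xs ys]
  (count (filter false?
                 (map (fn [^BitSet x ^BitSet y] (.equals x y)) xs ys))))

(mismatches fps-a fps-b) ; 0
(mismatches fps-a fps-c) ; 1
```

A check that can never return anything but 0 would not tell us much, so it is worth confirming that a deliberately altered list is detected.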
What about using more cores? I have access to some dual-CPU machines with 8GB of RAM, each CPU having four cores. Repeating the above calculations, the serial version takes 28.1 s and the parallel version takes 8.6 s, roughly a 3.3x speedup (28.1 / 8.6). One thing I noticed was that this really only uses the cores on one CPU, rather than all eight cores.
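One quick diagnostic, offered as a sketch rather than an explanation of the observation above, is to ask the JVM how many processors it reports, since that figure (rather than the physical core count) is what code running on the JVM can see. The var name `n-cpus` is made up here.

```clojure
;; number of processors the JVM reports as available
(def n-cpus (.availableProcessors (Runtime/getRuntime)))

(println "available processors:" n-cpus)
```

If this prints 8 on the dual-CPU machine, the limit lies somewhere other than the JVM's view of the hardware.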
One thing that will require more investigation is to what extent we can make use of the CDK in parallel environments, since the library was not designed with thread-safety in mind. For example, parsing the SMILES strings using pmap (after reading in all the lines from the file) gives me an ExecutionException error.
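Two generic ways of using a non-thread-safe object from pmap are sketched below. java.text.SimpleDateFormat, which is documented as unsafe for concurrent use, stands in for a shared stateful parser. Whether either approach cures the SMILES parsing error is an assumption I have not verified against the CDK, and the function names are invented for this example.

```clojure
(import '(java.text SimpleDateFormat))

;; option 1: create a fresh parser per call, so nothing is shared between threads
(defn parse-fresh [^String s]
  (.parse (SimpleDateFormat. "yyyy-MM-dd") s))

;; option 2: keep one shared parser but let only one thread use it at a time
(def ^SimpleDateFormat shared-parser (SimpleDateFormat. "yyyy-MM-dd"))

(defn parse-locked [^String s]
  (locking shared-parser
    (.parse shared-parser s)))

(def dates ["2009-01-01" "2009-02-14" "2009-03-15" "2009-04-30"])

(def serial-dates (doall (map parse-fresh dates)))
(def fresh-dates  (doall (pmap parse-fresh dates)))
(def locked-dates (doall (pmap parse-locked dates)))
```

The first option pays for an object creation on every call, while the second serializes the parsing step and so gives up the parallelism for that part of the work.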
In any case, it’s very cool that I can use multiple cores just by changing map to pmap.
Thanks for the tip. We decided to give parallelization a try with CDK. We will see how it goes and will report our findings.
Were you running Clojure with “java -server”? I read somewhere that the server VM may perform better in some cases.
Yes, I’m using the server flag (though I think on OS X it’s there by default).
I think 4GL language constructs are the best approach to parallelism: less plumbing.
It also means we could have a language that does most things in parallel by default, but we are a long way from that now.