So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘optimization’ tag

New version of fingerprint (3.4.9) – faster Dice similarity matrices

without comments

I’ve just pushed a new version of the fingerprint package that contains an update provided by Abhik Seal that significantly speeds up calculation of pairwise similarity matrices when using the Dice similarity method. A ran a simple comparison using different numbers of random fingerprints (1024 bits, with 512 bits set to one, randomly) and measured the time to evaluate the pairwise similarity matrix. As you can see from the figure alongside, the new code is significantly faster (with speed ups of 450x to 500x). The code to generate the timings is below – it probably should wrapped in a loop to multiple times for each set size.

1
2
3
4
5
fpls <- lapply(seq(10,300,by=10),
               function(i) sapply(1:i,
                                  function(x) random.fingerprint(1024, 512)))
times <- sapply(fpls,
                function(fpl) system.time(fp.sim.matrix(fpl, method='dice'))[3])

Written by Rajarshi Guha

October 30th, 2012 at 11:10 pm