So much to do, so little time

Trying to squeeze sense out of chemical data

fingerprint 3.5.2 released

with 2 comments

Comparison of nested loop performance in R and C for Tanimoto similarity matrix calculation.

Version 3.5.2 of the fingerprint package has been pushed to CRAN. This update includes a contribution from Abhik Seal that significantly speeds up similarity matrix calculations using the Tanimoto metric.

His patch led to a 10-fold improvement in running time. However, his code used nested for loops in R. This is a well-known bottleneck, and most idiomatic R code replaces for loops with a member of the sapply/lapply/tapply family. In this case, however, it was easier to write a small piece of C code to perform the loops, resulting in a 4- to 6-fold improvement over Abhik's observed running times (see the figure, which summarizes Tanimoto similarity matrix calculation for 1024-bit fingerprints with 256 bits randomly set to 1). As always, the latest code is available on GitHub.
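For readers curious what moving the double loop into C can look like, here is a minimal sketch. It is not the code in the package: the function names, the packing of fingerprints into 64-bit words, and the choice to return 0 for two all-zero fingerprints are all assumptions made for illustration. The sketch computes only the upper triangle and mirrors it, since a similarity matrix is symmetric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits in a 64-bit word (Kernighan's method, portable C99). */
int popcount64(uint64_t x) {
    int count = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Tanimoto similarity of two bit fingerprints of n_words 64-bit words:
   (bits set in both) / (bits set in either).
   Assumption: two all-zero fingerprints give 0.0. */
double tanimoto(const uint64_t *a, const uint64_t *b, size_t n_words) {
    int n_and = 0, n_or = 0;
    for (size_t k = 0; k < n_words; k++) {
        n_and += popcount64(a[k] & b[k]);
        n_or += popcount64(a[k] | b[k]);
    }
    return n_or == 0 ? 0.0 : (double)n_and / (double)n_or;
}

/* Fill the n_fp x n_fp row-major matrix `sim`.
   `fps` holds n_fp fingerprints back to back, n_words words each
   (a hypothetical layout chosen for this example). */
void tanimoto_matrix(const uint64_t *fps, size_t n_fp, size_t n_words,
                     double *sim) {
    assert(fps != NULL && sim != NULL);
    for (size_t i = 0; i < n_fp; i++) {
        /* j starts at i: compute the diagonal and upper triangle only */
        for (size_t j = i; j < n_fp; j++) {
            double s = tanimoto(fps + i * n_words, fps + j * n_words, n_words);
            sim[i * n_fp + j] = s;
            sim[j * n_fp + i] = s; /* mirror into the lower triangle */
        }
    }
}
```

With this layout a 1024-bit fingerprint takes 16 words, so `n_words` would be 16. Calling such a routine from R would additionally need a wrapper for R's `.C` or `.Call` interface, which is left out here.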

Written by Rajarshi Guha

October 27th, 2013 at 10:44 pm

Posted in cheminformatics,software

2 Responses to 'fingerprint 3.5.2 released'

  1. Hi Rajarshi,

    If it’s a similarity matrix then it’s symmetric, right? In the GitHub code it looks as though every entry is being computed when you only need half of them.

    J

    John

    28 Oct 13 at 10:00 am

  2. You’re correct. Fixed on GitHub.

    Rajarshi Guha

    28 Oct 13 at 4:05 pm
