fingerprint 3.5.2 released

Comparison of nested loop performance in R and C for Tanimoto similarity matrix calculation.

Comparison of nested loop performance in R and C for Tanimoto similarity matrix calculation.

Version 3.5.2 of the fingerprint package has been pushed to CRAN. This update includes a contribution from Abhik Seal that significantly speeds up similarity matrix calculations using the Tanimoto metric.

His patch led to a 10-fold improvement in running time. However his code involved the use of nested for loops in R. This is a well known bottleneck and most idiomatic R code replaces for loops with a member of the sapply/lapply/tapply family. In this case however, it was easier to write a small piece of C code to perform the loops, resulting in a 4- to 6-fold improvement over Abhiks observed running times (see figure summarizing Tanimoto similarity matrix calculation for 1024 bit fingerprints, with 256 bits randomly selected to be 1). As always, the latest code is available on Github.

2 thoughts on “fingerprint 3.5.2 released

  1. John says:

    Hi Rajarshi,

    If it’s a similarity matrix hen it’s symmetric right? In the github code it looks as though every entry is being computed when you only need to half of them.

    J

Leave a Reply

Your email address will not be published. Required fields are marked *