So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘path’ tag

Faster Fingerprinting

with 3 comments

In my last post I had reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good.

So I spent a little time last night hacking on the code, primarily making the search for unique paths a little faster. Happily, my latest commit (in 1.2.x, should be merged into trunk soon) allows the fingerprinter to process 1000 molecules in approximately 59s – a 4X speed up.

In terms of behavior, the new code gets the exact same paths as the old code, the only difference being that the order of atoms in the path can be reversed. Since the fingerprint is generated by hashing “path strings”, this means that the fingerprints from the new code will differ slightly from the old code. So if you’re working witha bunch of fingerprints calculated with the old code, you should probably regenarate them with the new code.

Written by Rajarshi Guha

September 12th, 2008 at 12:29 pm

Posted in software

Tagged with , , , , , ,