A Comment on Fingerprint Performance

In a comment to my previous post on bit collisions in hashed fingerprints, Asad reported on some interesting points which would be useful to have up here: Very interesting topic. I have faced these challenges while working with fingerprints and here are few observations from my end. By the way I agree that mathematically the […]

Path Fingerprints and Hash Quality

Recently, on an email thread I was involved in, Egon mentioned that the CDK hashed fingerprints were probably being penalized by the poor hashing provided by Java’s hashCode method. Essentially, he suspected that the collision rate was high and so that the many bits were being set multiple times by different paths and that a fraction of bits were not […]

Faster Fingerprinting

In my last post I had reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good. So I spent a little time last night hacking on the code, […]

So much to do, so little time

Trying to squeeze sense out of chemical data

A Comment on Fingerprint Performance

Path Fingerprints and Hash Quality

Faster Fingerprinting