In a comment to my previous post on bit collisions in hashed fingerprints, Asad reported on some interesting points which would be useful to have up here: Very interesting topic. I have faced these challenges while working with fingerprints and here are few observations from my end. By the way I agree that mathematically the […]
Hashed Fingerprints and RNG’s
In my previous post I looked at how many collisions in bit positions were observed when generating hashed fingerprints (using the CDK 1024-bit hashed fingerprint and the Java hashCode method). I summarized the results in the form of “bit collision plots” where I plotted the number of times a bit was set to 1 versus […]
Path Fingerprints and Hash Quality
Recently, on an email thread I was involved in, Egon mentioned that the CDK hashed fingerprints were probably being penalized by the poor hashing provided by Java’s hashCode method. Essentially, he suspected that the collision rate was high, so that many bits were being set multiple times by different paths and that a fraction of bits were not […]
The CDK is 10 Years Old
As Egon has pointed out, the CDK project started 10 years ago tomorrow – congratulations to everybody involved in the project. But also, Egon deserves a huge vote of thanks for keeping the project going – not only in terms of code contributions but also the “grunt” work such as releases, bug fixes, documentation and […]
New Versions of rcdk and rcdklibs
I’ve released an update to rcdk and rcdklibs on CRAN – right now source packages are available, but binary ones should show up soon. Both packages should be updated together. These packages integrate the CDK into the R environment and simplify a number of cheminformatics tasks. These versions used CDK 1.3.6 and JCP 16, […]