The Chemical Descriptors Library (CDL) has been around for a while, but hasn’t seemed to get much publicity. A paper describing the design and performance of the library just came out today. While the name suggests a library of descriptors, it’s actually a general C++ library for cheminformatics. The library appears to use the molecular […]
Faster Substructure Search in the CDK
The CDK uses the UniversalIsomorphismTester to perform graph and subgraph isomorphism. However it’s not very efficient and this shows when performing substructure searches over large collections. A quick test where I compared the CDK code to OpenBabel’s obgrep showed that the CDK is nearly forty times slower than OpenBabel. Improvements in this code will enhance […]
Computational Research and Software in Academia
This paper by Prof. Tim Pederson in the Journal of Computational Linguistics highlights the need for authors of computational linguistics papers to release working software that can be used to reproduce results in their papers. While the paper focuses on the field of computational linguistics (CL), the discussion is perfectly applicable to other fields that […]
Faster Fingerprinting
In my last post I had reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good. So I spent a little time last night hacking on the code, […]
CDK Performance Measurements
As part of a larger project, I’ve been doing some profiling on various aspects of the CDK, focusing on core cheminformatics operations. I’m using the excellent YourKit profiler to do the tests. They tests are run on a Macbook Pro (2.16GHz) with 1GB RAM, using the latest trunk version of the CDK and JDK 1.5. […]