Archive for the ‘performance’ tag
In my last post I reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good.
So I spent a little time last night hacking on the code, primarily making the search for unique paths a little faster. Happily, my latest commit (in 1.2.x, should be merged into trunk soon) allows the fingerprinter to process 1000 molecules in approximately 59s – a 4X speed up.
In terms of behavior, the new code finds exactly the same paths as the old code; the only difference is that the order of atoms in a path can be reversed. Since the fingerprint is generated by hashing “path strings”, the fingerprints from the new code will differ slightly from those of the old code. So if you’re working with a bunch of fingerprints calculated with the old code, you should probably regenerate them with the new code.
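To see why a reversed atom order changes the fingerprint, here is a minimal sketch of hashing path strings into bit positions. It is not the CDK Fingerprinter code: the class name `PathHashSketch`, the 1024-bit length, the use of `String.hashCode()`, and the toy path strings are all assumptions made for illustration.

```java
import java.util.BitSet;

class PathHashSketch {
    // Hypothetical fingerprint length, chosen only for this illustration.
    static final int SIZE = 1024;

    // Map a path string to a bit position (assumed hashing scheme, not the CDK's).
    public static int bitFor(String path) {
        return Math.floorMod(path.hashCode(), SIZE);
    }

    // Set one bit per path string.
    public static BitSet fingerprint(String... paths) {
        BitSet bits = new BitSet(SIZE);
        for (String p : paths) {
            bits.set(bitFor(p));
        }
        return bits;
    }

    // The same path written with its atoms in the opposite order.
    public static String reversed(String path) {
        return new StringBuilder(path).reverse().toString();
    }

    public static void main(String[] args) {
        String forward = "CCO";              // toy path string
        String backward = reversed(forward); // same path, atoms reversed
        System.out.println(forward + " -> bit " + bitFor(forward));
        System.out.println(backward + " -> bit " + bitFor(backward));
        System.out.println("same fingerprint: "
                + fingerprint(forward).equals(fingerprint(backward)));
    }
}
```

The two strings describe the same path, but they hash to different bits, so fingerprints built from them cannot be compared directly. A palindromic path such as "COC" is unaffected, which is one reason the old and new fingerprints differ only slightly.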
As part of a larger project, I’ve been doing some profiling on various aspects of the CDK, focusing on core cheminformatics operations. I’m using the excellent YourKit profiler to do the tests. The tests are run on a MacBook Pro (2.16GHz) with 1GB RAM, using the latest trunk version of the CDK and JDK 1.5.
The test data is a 1000-molecule subset taken from the ZINC collection. The operations I’ve been looking at are:
- Ring perception (AllRingsFinder)
- Aromaticity perception (CDKHueckelAromaticityDetector)
- Atom type perception
- SDF reading (IteratingMDLReader)
- Tanimoto similarity (Tanimoto.calculate())
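Of the operations listed, Tanimoto similarity is the easiest to show in a few lines. The sketch below computes the coefficient (bits set in both fingerprints divided by bits set in either) on plain `java.util.BitSet` objects. It is not the implementation behind `Tanimoto.calculate()`; the class name `TanimotoSketch` and the handling of two empty fingerprints are assumptions.

```java
import java.util.BitSet;

class TanimotoSketch {
    // Tanimoto coefficient = |A AND B| / |A OR B| for two bit fingerprints.
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Convention assumed for this sketch: two empty fingerprints score 0.
        return union == 0 ? 0.0 : (double) common.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(3); a.set(5);
        BitSet b = new BitSet();
        b.set(3); b.set(5); b.set(7);
        // 2 bits in common, 4 bits set in either fingerprint.
        System.out.println(tanimoto(a, b)); // prints 0.5
    }
}
```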
The test harness simply reads the 1000 molecules one by one and performs the operation in question. For certain tasks which are not atomic in nature, the code does a little more but the timing is measured only for the operation under study. In all cases, things like loading molecules from disk are not measured. The whole process is repeated 10 times and the times reported are the average of the 10 runs. A brief overview of the results:
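The timing scheme just described can be sketched roughly as follows. This is not the actual harness: the class name `TimingSketch`, the generic `averageMillis` helper, and the stand-in string operation are hypothetical, and the code uses Java 8 lambdas for brevity rather than the JDK 1.5 used for the measurements. The point is only that the clock runs around the operation alone and the per-run totals are averaged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class TimingSketch {
    // Average, over `runs` passes, of the total time spent inside `op`
    // across all items. Only the call to op.accept() is timed.
    public static <T> double averageMillis(List<T> items, Consumer<T> op, int runs) {
        long totalNanos = 0L;
        for (int r = 0; r < runs; r++) {
            for (T item : items) {
                // Any per-item preparation would go here, outside the timed region.
                long start = System.nanoTime();
                op.accept(item);
                totalNanos += System.nanoTime() - start;
            }
        }
        return totalNanos / 1e6 / runs;
    }

    public static void main(String[] args) {
        // Stand-in data and operation; a real run would use already-loaded molecules.
        List<String> items = Arrays.asList("CCO", "c1ccccc1", "CC(=O)O");
        double avg = averageMillis(items,
                s -> new StringBuilder(s).reverse().toString(), 10);
        System.out.println("average ms per run: " + avg);
    }
}
```

Keeping the items in memory before the loop starts is what keeps disk access out of the reported numbers.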