So much to do, so little time

Trying to squeeze sense out of chemical data

The Speedups Keep on Coming

with 7 comments

A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently Egon and Jonathan Alvarsson (Uppsala) had made even more improvements. Some of them are simple fixes (making a String[] final, using Set rather than List) while others are more significant (efficient caching of paths). In combination, they have improved performance by over 50%, compared to my last update. Egon has put up a nice summary of performance runs here. Excellent work guys!

Written by Rajarshi Guha

December 4th, 2008 at 5:41 pm

7 Responses to 'The Speedups Keep on Coming'

Subscribe to comments with RSS or TrackBack to 'The Speedups Keep on Coming'.

  1. Jonathan did all this work… I only set up a GDocs spreadsheet to show him how to share his electronic lab notebook :)

    One big remaining bottleneck is deep inside the CDK classes: getConnectedAtomsList()… this is because it has to iterate over all bonds to find which atom is connected. That scale as O(n)… Instead, while a more memory consuming, I’d really like to see atoms have pointers to attached bonds… This adds a few pointers to the atom, 4 byte each(?), but converts the getConnectedAtomsList() scalability to O(4).

  2. How do you attach pointers? What about restructuring and adding a hash table to each atom, listing connected atoms?

    Rajarshi Guha

    4 Dec 08 at 7:52 pm

  3. Adding a private, non-getsetable Map<IAtom,List> to AtomContainer would do the job… then we would not even need to change the interfaces…

  4. But, we need first a good benchmark set…. a set of tasks any cheminformatics toolkit could do, by which we could compare performance…

  5. Good idea – benchmark should be easy. Find high level classes that use getConnectedAtoms and benchmark them. Likely, many descriptors will use this function.

    Rajarshi Guha

    5 Dec 08 at 1:23 pm

  6. Egon and Rajarshi, good points about iterating over all bonds to find a connection. In MX, every atom maintains a List of its neighbors. For some reason, MX still iterates over all bonds to find a connection between two atoms with Molecule.getBond(Atom, Atom):

    Easy enough to fix, though.

    BTW, Rajarshi, does your comments system allow markup such as ?

    Rich Apodaca

    13 Dec 08 at 7:58 pm

Leave a Reply