The Speedups Keep on Coming

A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently Egon and Jonathan Alvarsson (Uppsala) had made even more improvements. Some of them are simple fixes (making a String[] final, using Set rather than List) while others are more significant (efficient caching of paths). In combination, they have improved performance by over 50%, compared to my last update. Egon has put up a nice summary of performance runs here. Excellent work guys!

7 thoughts on “The Speedups Keep on Coming

  1. Jonathan did all this work… I only set up a GDocs spreadsheet to show him how to share his electronic lab notebook :)

    One big remaining bottleneck is deep inside the CDK classes: getConnectedAtomsList()… this is because it has to iterate over all bonds to find which atom is connected. That scale as O(n)… Instead, while a more memory consuming, I’d really like to see atoms have pointers to attached bonds… This adds a few pointers to the atom, 4 byte each(?), but converts the getConnectedAtomsList() scalability to O(4).

  2. How do you attach pointers? What about restructuring and adding a hash table to each atom, listing connected atoms?

  3. Adding a private, non-getsetable Map<IAtom,List> to AtomContainer would do the job… then we would not even need to change the interfaces…

  4. But, we need first a good benchmark set…. a set of tasks any cheminformatics toolkit could do, by which we could compare performance…

  5. Good idea – benchmark should be easy. Find high level classes that use getConnectedAtoms and benchmark them. Likely, many descriptors will use this function.

  6. Rich Apodaca says:

    Egon and Rajarshi, good points about iterating over all bonds to find a connection. In MX, every atom maintains a List of its neighbors. For some reason, MX still iterates over all bonds to find a connection between two atoms with Molecule.getBond(Atom, Atom):

    Easy enough to fix, though.

    BTW, Rajarshi, does your comments system allow markup such as ?

Leave a Reply

Your email address will not be published. Required fields are marked *