The Speedups Keep on Coming

A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently Egon and Jonathan Alvarsson (Uppsala) have made even more improvements. Some of them are simple fixes (making a String[] final, using Set rather than List) while others are more significant (efficient caching of paths). In combination, they have improved performance by over 50% compared to my last update. Egon has put up a nice summary of performance runs here. Excellent work guys!

7 thoughts on “The Speedups Keep on Coming”

Egon Willighagen says:

December 4, 2008 at 7:45 pm

Jonathan did all this work… I only set up a GDocs spreadsheet to show him how to share his electronic lab notebook

One big remaining bottleneck is deep inside the CDK classes: getConnectedAtomsList()… this is because it has to iterate over all bonds to find which atoms are connected. That scales as O(n)… Instead, while it consumes more memory, I’d really like to see atoms have pointers to attached bonds… This adds a few pointers to the atom, 4 bytes each(?), but converts the getConnectedAtomsList() scalability to O(4).

Rajarshi Guha says:

December 4, 2008 at 7:52 pm

How do you attach pointers? What about restructuring and adding a hash table to each atom, listing connected atoms?

Egon Willighagen says:

December 5, 2008 at 7:48 am

Adding a private, non-getsetable Map<IAtom,List> to AtomContainer would do the job… then we would not even need to change the interfaces…
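As a rough illustration of the idea discussed above, here is a minimal Java sketch of a container that keeps a private map from each atom to its attached bonds, next to the plain scan over all bonds. The classes SketchAtom, SketchBond and SketchContainer and their methods are hypothetical stand-ins invented for this example; they are not the CDK interfaces, and the real AtomContainer may differ considerably.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the CDK classes.
final class SketchAtom {
    final String symbol;
    SketchAtom(String symbol) { this.symbol = symbol; }
}

final class SketchBond {
    final SketchAtom a;
    final SketchAtom b;
    SketchBond(SketchAtom a, SketchAtom b) { this.a = a; this.b = b; }

    // Returns the atom at the other end of this bond.
    SketchAtom other(SketchAtom atom) { return atom == a ? b : a; }
}

final class SketchContainer {
    private final List<SketchBond> bonds = new ArrayList<>();

    // Private, with no getter or setter: an internal lookup table,
    // so the public interface would not have to change.
    private final Map<SketchAtom, List<SketchBond>> bondsByAtom = new HashMap<>();

    void addBond(SketchAtom a, SketchAtom b) {
        SketchBond bond = new SketchBond(a, b);
        bonds.add(bond);
        bondsByAtom.computeIfAbsent(a, k -> new ArrayList<>()).add(bond);
        bondsByAtom.computeIfAbsent(b, k -> new ArrayList<>()).add(bond);
    }

    // Scan approach: visits every bond in the container, O(number of bonds).
    List<SketchAtom> connectedAtomsByScan(SketchAtom atom) {
        List<SketchAtom> result = new ArrayList<>();
        for (SketchBond bond : bonds) {
            if (bond.a == atom || bond.b == atom) {
                result.add(bond.other(atom));
            }
        }
        return result;
    }

    // Map approach: visits only the bonds attached to this atom.
    List<SketchAtom> connectedAtomsByMap(SketchAtom atom) {
        List<SketchAtom> result = new ArrayList<>();
        for (SketchBond bond : bondsByAtom.getOrDefault(atom, Collections.emptyList())) {
            result.add(bond.other(atom));
        }
        return result;
    }
}
```

The trade-off is the one mentioned in the thread: extra references per atom in exchange for a lookup whose cost depends on the atom's degree rather than on the size of the molecule. A real implementation would also need to keep the map in sync when bonds or atoms are removed, which this sketch leaves out.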

Egon Willighagen says:

December 5, 2008 at 7:55 am

But first we need a good benchmark set… a set of tasks any cheminformatics toolkit could do, by which we could compare performance…

Rajarshi Guha says:

December 5, 2008 at 1:23 pm

Good idea – a benchmark should be easy. Find high-level classes that use getConnectedAtoms and benchmark them. Most likely, many descriptors will use this function.

Rich Apodaca says:

December 13, 2008 at 7:58 pm

Egon and Rajarshi, good points about iterating over all bonds to find a connection. In MX, every atom maintains a List of its neighbors. For some reason, MX still iterates over all bonds to find a connection between two atoms with Molecule.getBond(Atom, Atom):

http://tinyurl.com/63t64v

Easy enough to fix, though.
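To make the fix concrete, here is a small Java sketch of looking up the bond between two atoms by walking only the first atom's own bond list instead of every bond in the molecule. NAtom, NBond and NMolecule are hypothetical names made up for this example; they are not the MX classes, and the only thing taken from the comment above is the getBond(Atom, Atom) idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only; this is NOT the MX API.
final class NAtom {
    // Each atom keeps the bonds it takes part in.
    final List<NBond> bonds = new ArrayList<>();
}

final class NBond {
    final NAtom source;
    final NAtom target;
    NBond(NAtom source, NAtom target) { this.source = source; this.target = target; }

    // True if this bond connects x and y, in either direction.
    boolean joins(NAtom x, NAtom y) {
        return (source == x && target == y) || (source == y && target == x);
    }
}

final class NMolecule {
    NBond connect(NAtom a, NAtom b) {
        NBond bond = new NBond(a, b);
        a.bonds.add(bond);
        b.bonds.add(bond);
        return bond;
    }

    // Checks only the bonds attached to 'a'; returns null if a and b are not bonded.
    NBond getBond(NAtom a, NAtom b) {
        for (NBond bond : a.bonds) {
            if (bond.joins(a, b)) {
                return bond;
            }
        }
        return null;
    }
}
```

Since an atom rarely has more than a handful of bonds, the loop in getBond stays short no matter how large the molecule is.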

BTW, Rajarshi, does your comments system allow markup such as ?
