BAZOO

So much to do, so little time

Trying to squeeze sense out of chemical data

Performance of AtomContainerSet versus ArrayList

with 6 comments

In my previous post, I had asked whether we really need AtomContainerSet and other related specialized container classes as opposed to using parametrized List objects. Gilleain had mentioned some issues that might require these specialized classes. But at this point it’s not clear to me what the intended goal for these classes was.

For now, assuming that they are used purely as storage classes, I decided to do some performance testing, comparing AtomContainerSet to ArrayList<IAtomContainer>. Currently these results are based on System.currentTimeMillis() as my YourKit licence has expired. So the results are not as fine grained as I’d like.

To perform the tests I read in 2000 3D structures taken from Pub3D and loaded them into an IAtomContainer[]. I then considered four operations

  • Given an empty AtomContainerSet, we simply loop over the 2000-element array and add each molecule to the empty container using addAtomContainer. We do the same thing with an empty ArrayList<IAtomContainer>, but use add. The operation is repeated 100 times and the mean time for the addition of 2000 molecules to the empty containers is taken.
  • Randomly access elements of the AtomContainerSet and ArrayList, 50,000 times
  • Access each element serially (using for (IAtomContainer x : container) idiom). Averaged 100 times
  • Remove 500 molecules by reference, randomly from each container structure
  • Remove 500 molecules by index, randomly from each container structure

The summary view of the results is shown below. For each pair of bars, the time taken for the operation on the AtomContainerSet is normalized to 1.0 and the time with the ArrayList is appropriately converted.

The raw timings are shown below:

Operation AtomContainerSet (sec) ArrayList (sec)
Add 0.0036 0.0001
Indexed Random Access 0.006 0.007
Indexed Serially 0.007 0.012
Remove (by reference) 68.119 0.1
Remove (by index) 0.695 0.0

As you can see, the ArrayList is significantly faster in all except random and serial access. In that case, if you consider the raw numbers, the run times are essentially equivalent for random access, though serial access of an array in AtomContainerSet gives it an edge over ArrayList.

So if AtomContainerSet and related classes are purely for storage purposes, they could easily be replaced by ArrayList’s. However the fact that they include support for notifications suggests somebody is using it for more complex scenarios. In either case, a discussion of the design and role of these classes would be useful

Written by Rajarshi Guha

April 15th, 2009 at 5:27 pm

6 Responses to 'Performance of AtomContainerSet versus ArrayList'

Subscribe to comments with RSS or TrackBack to 'Performance of AtomContainerSet versus ArrayList'.

  1. The figure is a bit deceptive, as I actually compared the different blocks too… but I’d say this is a clear one.

    Here comes a patch for trunk :)

    Egon Willighagen

    15 Apr 09 at 5:34 pm

  2. Seriously… this means that keeping a custom IAtomContainer[] are must be dropped… removing the complete IAtomContainerSet interface has the additional downside that we cannot set properties on the set…

    Can you rerun the comparison by making an IAtomContainerSet implementation that uses a ArrayList internally?

    Egon Willighagen

    15 Apr 09 at 5:37 pm

  3. While an internal ArrayList would help, I think we should discuss the need for these specialized containers in the first place, as opposed to using parametrized List objects.

    are notifications on these specialized containers really useful?

    Rajarshi Guha

    15 Apr 09 at 5:44 pm

  4. And what about serial access? It is more common to access from 1..n, and in that specific order, than random…

    I also wonder how this reflects on the use of IAtom[] in AtomContainer….

    Egon Willighagen

    15 Apr 09 at 5:49 pm

  5. Updated graph and table to report serial access timings. AtomContainerSet does better when we’re serially traversing the elements – probably by virtue of it being a simple array. So this bodes well for IAtom[] in AtomContainer

    Rajarshi Guha

    15 Apr 09 at 5:59 pm

  6. Just as a historical side-note, a lot of these containers where created because in these days Java had only synchronized container classes such as the Vector class, which where utterly slow. This is now all history and we should get rid of ballast where needed.

Leave a Reply