Hadoop and Atom Counting

Over the past few months I’ve been hacking together scripts to distribute data parallel jobs. However, it’s always nice when somebody else has done the work. In this case, Hadoop is an implementation of the map/reduce framework from Google. As Yahoo and others have shown, it’s an extremely scalable framework, and when coupled with Amazons […]

PubChem Bioassay Annotation Poster

Sometime back I had described some work on the automated annotation of PubChem bioassays. The lack of annotations on the assays can make it difficult to integrate with other biological resources. Ideally, the bioassays would be manually annotated – however, it’s not a very exciting job. So, collaborating with Patrick Ruch and Julien Gobeill, we […]

Performance of AtomContainerSet versus ArrayList

In my previous post, I had asked whether we really need AtomContainerSet and other related specialized container classes as opposed to using parametrized List objects. Gilleain had mentioned some issues that might require these specialized classes. But at this point it’s not clear to me what the intended goal for these classes was. For now, […]