Over the past few months I’ve been hacking together scripts to distribute data-parallel jobs. However, it’s always nice when somebody else has done the work. In this case, that work is Hadoop, an open-source implementation of Google’s map/reduce framework. As Yahoo and others have shown, it’s an extremely scalable framework, and when coupled with Amazon’s […]
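To make the map/reduce model a little more concrete, here is a minimal in-memory sketch in Java. It is not Hadoop’s API (Hadoop jobs are written against its own Mapper and Reducer classes and run across a cluster); it only shows the three phases in a single process: a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The class name `MiniMapReduce` and the atom-symbol-counting task are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of map/reduce (not the Hadoop API).
public class MiniMapReduce {

    // Map phase: emit (symbol, 1) for every atom symbol on an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String symbol : line.split("\\s+")) {
            if (!symbol.isEmpty()) {
                out.add(Map.entry(symbol, 1));
            }
        }
        return out;
    }

    // Reduce phase: combine all values emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> input) {
        // Shuffle: group intermediate pairs by key (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Apply reduce to each group of values.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each line lists the heavy atoms of one (toy) molecule.
        List<String> lines = List.of("C C O", "C N", "O");
        System.out.println(run(lines)); // prints {C=3, N=1, O=2}
    }
}
```

In a real framework the map and reduce calls run in parallel on many machines and the shuffle moves data between them; that distribution is exactly what this single-process version leaves out.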
PubChem Bioassay Annotation Poster
Some time back I described some work on the automated annotation of PubChem bioassays. The lack of annotations on the assays can make it difficult to integrate them with other biological resources. Ideally, the bioassays would be annotated manually; however, that’s not a very exciting job. So, collaborating with Patrick Ruch and Julien Gobeill, we […]
Performance of AtomContainerSet versus ArrayList
In my previous post, I asked whether we really need AtomContainerSet and related specialized container classes, as opposed to parameterized List objects. Gilleain mentioned some issues that might require these specialized classes, but at this point it’s still not clear to me what these classes were intended to achieve. For now, […]
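As a rough illustration of the kind of comparison in question, the sketch below times a hand-rolled, array-backed container against a parameterized `ArrayList`. Both `Molecule` (a stand-in for `IAtomContainer`) and `MoleculeSet` (a stand-in for a specialized set class) are hypothetical and do not reproduce the CDK’s actual `AtomContainerSet`. The single `System.nanoTime()` pass is also crude, since it ignores JIT warmup and garbage collection; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainerTiming {

    // Hypothetical stand-in for a molecule; not a CDK class.
    static final class Molecule {
        final int atomCount;
        Molecule(int atomCount) { this.atomCount = atomCount; }
    }

    // Hypothetical specialized container with its own add/get/count methods,
    // backed by an array that doubles when full. Not the CDK implementation.
    static final class MoleculeSet {
        private Molecule[] items = new Molecule[10];
        private int count = 0;

        void add(Molecule m) {
            if (count == items.length) {
                items = Arrays.copyOf(items, items.length * 2);
            }
            items[count++] = m;
        }

        Molecule get(int i) { return items[i]; }

        int getCount() { return count; }
    }

    // Fill the specialized container, then sum atom counts by index.
    static long sumWithSet(int n) {
        MoleculeSet set = new MoleculeSet();
        for (int i = 0; i < n; i++) set.add(new Molecule(i % 50));
        long total = 0;
        for (int i = 0; i < set.getCount(); i++) total += set.get(i).atomCount;
        return total;
    }

    // Same work using a parameterized List.
    static long sumWithList(int n) {
        List<Molecule> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(new Molecule(i % 50));
        long total = 0;
        for (Molecule m : list) total += m.atomCount;
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        long a = sumWithSet(n);
        long t1 = System.nanoTime();
        long b = sumWithList(n);
        long t2 = System.nanoTime();
        // Totals must match; the timings are only indicative.
        System.out.printf("set:  %d (%.1f ms)%n", a, (t1 - t0) / 1e6);
        System.out.printf("list: %d (%.1f ms)%n", b, (t2 - t1) / 1e6);
    }
}
```

Since `ArrayList` is itself array-backed, any gap between the two is likely to come from extra abstraction or method-call overhead rather than the storage layout.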
Discussion Topics for the CDK Workshop (from an absentee)
The CDK workshop is coming up at the EBI next week, and it’s very frustrating not to be able to attend because of stupid US visa issues. While the workshop is short and already has an excellent program, I think it’d be useful to have a discussion on broader issues regarding […]
Circular Fingerprints with the CDK and Clojure
One of the things I like about Clojure is that it works well as a quick prototyping language, much like Python. This is quite handy when playing with the CDK, as I can avoid much of the verbosity of Java code. As an example, I put together a very simplistic circular fingerprint […]
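The fingerprint in this post was written in Clojure on top of the CDK. The sketch below is instead plain Java with no CDK dependency, meant only to show the general idea behind a circular (Morgan/ECFP-style) fingerprint. The `Mol` class (atom labels plus an adjacency list) and the hashing scheme are assumptions for illustration. Each atom starts with a hash of its label. Each iteration combines an atom’s identifier with the sorted identifiers of its neighbors. Every identifier is folded into a fixed-length bit set. It omits details that real implementations handle, such as bond orders, charges and removal of duplicate environments.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Objects;

public class CircularFingerprint {

    // Hypothetical minimal molecule (not a CDK type): atom labels plus an adjacency list.
    static final class Mol {
        final String[] atoms;
        final int[][] neighbors;

        Mol(String[] atoms, int[][] neighbors) {
            this.atoms = atoms;
            this.neighbors = neighbors;
        }
    }

    // Simplistic circular fingerprint: returns the folded bit positions set up to `radius`.
    static BitSet fingerprint(Mol mol, int radius, int nbits) {
        int n = mol.atoms.length;
        int[] ids = new int[n];
        BitSet fp = new BitSet(nbits);

        // Radius 0: each atom's identifier is the hash of its label.
        for (int i = 0; i < n; i++) {
            ids[i] = mol.atoms[i].hashCode();
            fp.set(Math.floorMod(ids[i], nbits));
        }

        // Each iteration grows the environment by one bond.
        for (int r = 1; r <= radius; r++) {
            int[] next = new int[n];
            for (int i = 0; i < n; i++) {
                int[] nb = new int[mol.neighbors[i].length];
                for (int k = 0; k < nb.length; k++) {
                    nb[k] = ids[mol.neighbors[i][k]];
                }
                Arrays.sort(nb); // makes the result independent of atom numbering
                next[i] = Objects.hash(r, ids[i], Arrays.hashCode(nb));
                fp.set(Math.floorMod(next[i], nbits));
            }
            ids = next;
        }
        return fp;
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol: C-C-O
        Mol ethanol = new Mol(new String[]{"C", "C", "O"}, new int[][]{{1}, {0, 2}, {1}});
        System.out.println(fingerprint(ethanol, 2, 1024));
    }
}
```

Because neighbor identifiers are sorted before hashing, the result does not depend on how the atoms are numbered: listing ethanol’s heavy atoms as O, C, C instead of C, C, O sets the same bits.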