In my previous post I described my initial attempts at working with Hadoop, an implementation of the map/reduce framework. Most Hadoop examples are based on line-oriented input files. In the cheminformatics domain, SMILES files are line-oriented, so applying Hadoop to a variety of tasks that work with SMILES input is easy. […]
Hadoop and Atom Counting
Over the past few months I’ve been hacking together scripts to distribute data-parallel jobs. However, it’s always nice when somebody else has done the work. In this case, Hadoop is an implementation of Google’s map/reduce framework. As Yahoo and others have shown, it’s an extremely scalable framework, and when coupled with Amazon’s […]
Stack Overflow – Not for Chemistry?
Rich Apodaca recently wrote a post highlighting StackOverflow, a community discussion site for software development, and suggesting that a similar type of site for chemists would not work. He also posted a follow-up listing some factors that make something like StackOverflow unlikely for the chemistry community. I had made a quick comment noting that […]
Programming For Chemical and Life Science Informatics
Today was the final class of the graduate course I taught this semester. What with traveling and job hunting, it wasn’t as thorough as I would’ve liked (I really wanted to cover Hadoop, EC2, etc.). In any case, I’ve put up the slides from the class for posterity.
PubChem Bioassay Annotation Poster
Some time back I described some work on the automated annotation of PubChem bioassays. The lack of annotations on the assays can make it difficult to integrate them with other biological resources. Ideally, the bioassays would be manually annotated – however, it’s not a very exciting job. So, collaborating with Patrick Ruch and Julien Gobeill, we […]