Nice Article on Open Source Cheminformatics

I met Viven Marx at the BioIT World conference held in Boston earlier this month, in which I spoke on the topic of Open Source cheminformatics. The result of conversations between myself (and Peter Murray Rust) and here were incorporated into an interesting article. (Though the CDK was started at Notre Dame, but is now […]

The Move is Complete

The move to Rockville is complete and I’ve started work at the NCGC. This week has been a bit hectic what with setting up house and getting up to speed at work. But I’m really excited with the stuff that’s going on here – lots of interesting projects in various areas of chemical biology and […]

Cheminformatics with Hadoop and EC2

In the last few posts I’ve described how I’ve gotten up to speed on developing Map/Reduce applications using the Hadoop framework. The nice thing is that I can set it all up and test it out on my laptop and then easily migrate the application to a large production cluster. Over the past few days […]

Hadoop, Chunks and Multi-line Records

In a previous post I described how one requires a custom RecordReader class to deal with multi-line records (such as SD files) in a Hadoop program. While it worked fine on a small input file (less than 5MB) I had not addressed the issue of “chunking” and that caused it to fail when dealing with […]

Substructure Searching with Hadoop

My last two posts have described recent attempts at working with Hadoop, a map/reduce framework. As I noted, Hadoop for cheminformatics is quite trivial when working with SMILES files, which is line oriented but requires a bit more work when dealing with multi-line records such as in SD files. But now that we have a […]