BAZOO

So much to do, so little time

Trying to squeeze sense out of chemical data

PubChem Bioassay Annotation Poster


Some time back I described some work on the automated annotation of PubChem bioassays. The lack of annotations on the assays can make it difficult to integrate them with other biological resources. Ideally, the bioassays would be manually annotated; however, it’s not a very exciting job. So, collaborating with Patrick Ruch and Julien Gobeill, we used their tool, GOCat, to automatically annotate the PubChem bioassay collection with GO terms. They recently presented a poster on this work at the 3rd International Biocuration Conference in Berlin.

Obviously, automated annotation will not be as good as expert manual annotation. However, it does a decent job, and I think it’s in line with a recent post by Duncan Hull, where he quotes a paper from Google:

The first lesson of Web-scale learning is to use available large-scale data rather than hoping for annotated data that isn’t available

While we’re not using the PubChem assay data directly for learning, the automated approach to annotation means that we can move on to work that makes use of the annotations, rather than waiting on a full manual curation of the assay collection (which will likely supersede the automated annotations when it becomes available).

Written by Rajarshi Guha

April 21st, 2009 at 1:11 pm

Posted in cheminformatics,research

