I just updated the CDK Nightly build script so that it summarizes the state of unit test coverage. Currently, trunk has a total of 3215 methods (in 378 classes) that are missing unit tests. See the JUnit test summary for a module-wise summary.
I’m in academia and I do cheminformatics. Recent collaborations, papers and funding issues in this field have made me think about the future of this research in this setting. This, and a thread discussing David Leahy’s talk on InkSpot Science at the Soton Open Science Workshop got me started on this post.
There are currently a number of groups and collaborations that are attempting to perform drug discovery without the large centralized infrastructure that is characteristic of this process. Examples include Jean-Claude Bradley, who runs the UsefulChem project, and the Synaptic Leap, as well as various academic labs. Also see Kozikowski et al.
Cheminformatics plays a key role in drug discovery efforts at various stages. Examples include identifying or prioritizing compounds from virtual libraries, predicting ADME profiles and side effects (e.g., hERG activation) and so on. I should stress that such computational methods don’t replace bench work – but they can certainly enhance it. More generally, we’re now faced with a deluge of data – and human eyeballs are not going to be able to handle this. And this is exactly the place where cheminformatics does its stuff.
Houghten, R. et al, “Strategies for the Use of Mixture-Based Synthetic Combinatorial Libraries: Scaffold Ranking, Direct Testing In Vivo, and Enhanced Deconvolution by Computational Methods”, J. Comb. Chem., 2008, 10, 3-19
Recently a collaborator pointed me to the above article by Houghten and co-workers where they describe the use of mixture-based combinatorial libraries for high-throughput screening (HTS) experiments.
Traditionally an HTS experiment will screen thousands to millions of individual molecules. Obviously, it’s all done by robots, so although you have to be careful during setup, you don’t have to do it all by hand. But the fact is, if it’s possible to reduce the actual number of individual screens, life becomes easier and cheaper. Houghten et al describe an elegant approach that does just this.
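To get a feel for the savings, here is a back-of-the-envelope sketch. It assumes a positional-scanning style layout, where each mixture fixes one building block at one position and pools everything else. That layout and the library sizes below are my own assumptions for illustration, not numbers or a protocol taken from the paper.

```python
from math import prod


def screening_counts(blocks_per_position):
    """Compare individual screens with pooled screens.

    blocks_per_position: number of building blocks at each variable
    position of the scaffold (hypothetical sizes, for illustration).
    Returns (individual_compounds, mixtures), assuming one mixture per
    building block per position.
    """
    individual = prod(blocks_per_position)  # every combination screened alone
    mixtures = sum(blocks_per_position)     # one pooled sample per fixed block
    return individual, mixtures


# A made-up library with three positions and 20 building blocks at each.
individual, mixtures = screening_counts([20, 20, 20])
print(f"{individual} individual compounds vs {mixtures} mixtures")
```

The number of individual compounds grows as a product while the number of mixtures grows as a sum, which is where the saving comes from. The price is that the hits then have to be deconvoluted.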
Essentially, Ubiquity presents something akin to a natural language interface using a wide variety of “commands”. The magic is in these commands. As shown in the video on the Mozilla Labs homepage, it allows you to select text on a web page and search various websites, query Google Maps, translate, perform a calculation and so on.
The key to all this functionality is the use of a variety of web-accessible services such as the APIs provided by Google and Amazon. As a result, the interface makes all these distributed services ubiquitous in the sense that they are always available wherever you may be on the web – no need to open another window or visit another site. In a sense one could replicate the functionality of Greasemonkey using this. Indeed, many things that might have required a full-blown plugin can be done at a much lower cost using this interface.
OK, enough background (for more details take a look at the Wiki). How does it help chemists and cheminformatics? At this point I’m not going to go into details about how one implements Ubiquity commands. Rather I’ll describe some that I quickly whipped up. These are available here.
If you’re browsing a web page and come across a SMILES, it’s easy to get a quick depiction using the “depict” command. So, bring up Ubiquity and then type something like “c1ccccc1” (or select a SMILES on a webpage). You’ll see an image in the preview panel. But it goes one step further. If you’re in an online rich text box (such as when you compose a mail in Gmail), the image gets inserted.
Alternatively, let’s say you’re browsing and come across a compound called “phenobarbital”. What’s the SMILES for this? What does the molecule look like in 2D? Bring up Ubiquity and simply type the compound name and look at the results. Alternatively just select the text and then bring up Ubiquity – no need to type anything.
Another useful command converts a SMILES string to an InChI, an InChIKey or an SDF-formatted string. It’s a little restricted because most people don’t write SMILES strings directly, but with a little work it could be useful when writing blog posts: you could very easily include an InChI at the end of a post to allow easy indexing, extraction and so on.
The final example is the “toxic” command – select a piece of text (or type it at the prompt), bring up Ubiquity and get an estimate of whether it’s going to be toxic or not.
Granted, these are very simplistic commands (and the “toxic” command is not too accurate either). But they were very quick to write, since I didn’t have to bother about interfaces, packaging and so on. Another reason why presenting these features was so easy is that they make use of cheminformatics web services hosted at Indiana University. So we get depictions by calling a web service, and perform name-to-structure conversion using a web service interface to a local mirror of PubChem.
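To give a flavour of what “calling a web service” for a depiction involves, here is a minimal sketch in Python rather than as a Ubiquity command. The endpoint URL and the parameter names (smiles, width, height) are placeholders I made up for illustration; they are not the actual Indiana University service interface. The one thing worth showing is that a SMILES has to be URL-encoded before it goes into a query string, since characters like “#”, “+” and “=” have special meanings in URLs.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; substitute the URL
# of whatever depiction service you actually have access to.
DEPICT_URL = "https://example.org/depict"


def depiction_url(smiles, width=200, height=200):
    """Build a request URL for a 2D depiction of a SMILES string.

    The parameter names are assumptions, not a documented API.
    urlencode percent-encodes characters such as '#', '+' and '=' so
    that the SMILES survives the trip through the query string.
    """
    query = urlencode({"smiles": smiles, "width": width, "height": height})
    return f"{DEPICT_URL}?{query}"


print(depiction_url("c1ccccc1"))  # benzene, nothing needs escaping
print(depiction_url("CC#N"))      # acetonitrile, '#' must be escaped
print(depiction_url("[NH4+]"))    # ammonium, '+' and brackets escaped
```

A command along the lines of “depict” then only needs to point an image element at the resulting URL, which is largely why such commands take so little code.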
For me at least, Ubiquity presents a really low barrier to entry for writing mashups.
I finally decided to get back into blogging – I’ve been doing it off and on (more off than on) and over time I’ve realized that updating static web pages is useful, but a blog lets me say things faster and (hopefully) get discussion going faster as well.
Since I spend most of my time doing cheminformatics, most of the posts will be related to that. However, there will probably be posts relating to various topics such as academics, research, coding and cycling. I’m also quite attracted to shiny new things, so I hope to blog about experiences with new tools and techniques.
As to how often I’ll be posting, I hope to be regular but as always, real life (i.e. non-fun stuff) intrudes, so things may be slow.