Extending the REST PCA Service

I recently described a REST-based service for performing PCA-based visualization of chemical spaces. By visiting a URL of the form

http://rguha.ath.cx/~rguha/cicc/rest/chemspace/default/
c1ccccc1,c1ccccc1CC,c1ccccc1CCC,C(=O)C(=O),CC(=O)O

one would get an HTML, plain text or JSON page containing the first two principal components for the molecules specified. With this data one can generate a simple 2D plot of the distribution of molecules in the “default” chemical space.

However, as Andrew Lang pointed out on FriendFeed, one could use SecondLife to look at 3D versions of the PCA results. So I updated the service to allow one to specify the number of components in the URL. The above form of the service will still work – you get the first two components by default.

To specify more components, use a URL of the form

http://rguha.ath.cx/~rguha/cicc/rest/chemspace/default/3/mol1,mol2,mol3

where mol1, mol2, mol3 etc. should be valid SMILES strings. The above URL will return the first three PCs. To get just the first PC, replace the 3 with 1, and so on. If more components are requested than are available, all components are returned.

Currently, the only available space is the “default” space which is 4-dimensional, so you can get a maximum of four components. In general, visit the URL

http://rguha.ath.cx/~rguha/cicc/rest/chemspace/

to obtain a list of currently available chemical spaces, their names and dimensionality.

Caveat

While it’s easy to get all the components and visualize them, it doesn’t always make sense to do so. In general, one should consider those initial principal components that explain a significant portion of the variance (see Kaiser’s criterion). The service currently doesn’t provide the eigenvalues, so it’s not really possible to decide whether to go to 3, 4 or more components. For most cases, just looking at the first two principal components will be sufficient – especially given the currently available chemical space.
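As a rough illustration of what such a decision would look like if eigenvalues were available, here is a minimal Python sketch. It assumes the PCA was done on standardized descriptors (so that the average eigenvalue is 1, which is the setting Kaiser’s criterion is meant for). The function names and the eigenvalues are made up for illustration; the service does not return them.

```python
def kaiser_count(eigenvalues):
    """Count the components whose eigenvalue exceeds 1 (Kaiser's criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def explained_variance(eigenvalues):
    """Return the fraction of the total variance explained by each component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical eigenvalues for a 4-dimensional space (not from the service)
example_eigenvalues = [2.2, 1.1, 0.5, 0.2]

if __name__ == "__main__":
    print("components to keep:", kaiser_count(example_eigenvalues))
    for i, frac in enumerate(explained_variance(example_eigenvalues), start=1):
        print("PC%d explains %.1f%% of the variance" % (i, 100 * frac))
```

With these made-up numbers only the first two components pass the cutoff, which is the kind of evidence one would want before bothering with a 3D view.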

Update (Jan 13, 2009)

Since the descriptor service now requires Base64 encoded SMILES, the example usage URL is now invalid. Instead, the SMILES should be replaced by their encoded versions. In other words, the first URL above becomes

http://rguha.ath.cx/~rguha/cicc/rest/chemspace/default/
YzFjY2NjYzE=,YzFjY2NjYzFDQw==,YzFjY2NjYzFDQ0M=,
Qyg9TylDKD1PKQ==,Q0MoPU8pTw==
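Building such a URL by hand is tedious, so here is a small Python sketch of one way to do it. The helper names (encode_smiles, chemspace_url) are my own for illustration and not part of the service. The sketch assumes the standard Base64 alphabet, which reproduces the encoded strings shown above, and it also assumes that the component-count form of the URL accepts encoded SMILES in the same way.

```python
import base64

# Base URL as given in the post
BASE_URL = "http://rguha.ath.cx/~rguha/cicc/rest/chemspace"


def encode_smiles(smiles):
    """Return the standard Base64 encoding of a SMILES string as text."""
    return base64.b64encode(smiles.encode("ascii")).decode("ascii")


def chemspace_url(smiles_list, space="default", ncomp=None):
    """Build a query URL for the given SMILES (hypothetical helper).

    If ncomp is None the component count is left out of the URL, in which
    case the service returns the first two components by default.
    """
    parts = [BASE_URL, space]
    if ncomp is not None:
        parts.append(str(ncomp))
    parts.append(",".join(encode_smiles(s) for s in smiles_list))
    return "/".join(parts)


if __name__ == "__main__":
    mols = ["c1ccccc1", "c1ccccc1CC", "c1ccccc1CCC", "C(=O)C(=O)", "CC(=O)O"]
    print(chemspace_url(mols))            # first two components
    print(chemspace_url(mols, ncomp=3))   # first three components
```

The resulting string can then be fetched with any HTTP client. Note that standard Base64 can produce “+” and “/” characters for some inputs; how the service treats those in a URL is not something I have checked here.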

The ONS Challenge & Visualizing Chemical Space

The ONSChallenge has been running for some time now and the simple web query form that tied in the data from Google Docs along with web services from IU has turned out to be pretty handy. With more and more data becoming available, I had done some initial exploratory analysis of the measured solubilities. One thing that is useful to the experimentalists is a suggestion of which compound to test next. This could be made on the basis of many factors – availability, ease of synthesis and so on. But one way to look at it is to examine what types of compounds have been tested previously, and suggest that the subsequent compounds be very different from those that have been tested.


The Wonders of a Cast Iron Skillet

Being fond of cooking, I’ve tended to collect recipes, utensils and gadgets. One thing that had been missing was a cast iron skillet. I’d been hearing about the wonders of these (naturally non-stick over time, holds heat, evenly distributes heat) for a long time and have been disillusioned with the non-stick stuff (though a small non-stick pan for eggs is handy). So we finally decided to pick up a Lodge cast iron skillet. Though it’s sold as pre-seasoned, we seasoned it once before use.

Our first attempt at using it was to make pan seared steak for Christmas lunch, using directions (1, 2) from Alton Brown. We started with a juicy 12 oz ribeye, seasoned with kosher salt and coarse ground pepper. Searing it for 90 s on the oven top and then putting it into a 500F oven for 3 minutes each side resulted in a beautiful medium steak. While the steak was resting, we put together a simple sauce with red wine, shallots and the brown bits from the pan.

The result was heavenly! Looks like cooking will be fun with the new skillet.


Spreading the Word on Open Source Cheminformatics

Over the last few years there has been a lot of activity in the area of Open Source cheminformatics software. Being a contributor to the CDK as well as a supporter of Open Source and Open Data efforts in general, I was delighted to be given the chance to talk about these topics at the BioIT World Conference & Expo. I’ll be talking about the state of the art in Open Source cheminformatics, highlighting the advantages and pitfalls of using this type of software, using examples from toolkits, workbenches, pipelining tools and so on. In addition, I’ll be talking a little bit about Open Data and its importance, and the possibilities that arise from combining Open Source software and Open Data.

Here’s the announcement of the actual meeting:

Join the life sciences community in Boston, MA next April 27-29, 2009 for the 7th Annual Bio-IT World Conference & Expo (www.bio-itworldexpo.com). Since its debut in 2002, Bio-IT has established itself as a premier event showcasing the myriad applications of IT and informatics to biomedical research and the drug discovery enterprise. The 2009 program will feature best practice case studies and joint partner presentations relevant to the technologies, research, and regulatory issues of life science, pharmaceutical, clinical, health, and IT professionals.

The ChemSpider Journal of Chemistry

News of the ChemSpider Journal of Chemistry has been posted in various places. This effort is interesting as it is a combination of features that are currently available in different forms. Like other Open Access journals, the CJC will follow the BOAI and hence be Open Access. In addition, it will exhibit markup of the text, as is done by the RSC journals (which are not OA). I’m especially interested in this latter feature for automated processing of articles. While it is good to see the combination of these features, it is also interesting to see that the journal will use a just-in-time (JIT) approach, and allow online peer review and commentaries. In this sense, it can be expected to be an especially good venue for ONS style projects.

I think this effort will be an interesting experiment, especially given that many “traditional” chemists may not have blogs and wikis to support a JIT approach, and that a journal might be more acceptable. I recently joined the editorial board. I’m eager to see how the journal evolves and am pleased to be able to contribute to this effort, and I encourage others to do so as well.