So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for January, 2009

Improved CDK Depiction Service

without comments

The folks at the EBI have been doing some great work on the CDK. A major effort is underway to revamp JChemPaint and part of this involves improving the rendering of 2D depictions. While not complete I rebuilt a version of the CDK 1.2.x branch with the latest rendering code from the jchempaint-primary branch and updated the CDK web service. The results are much nicer, though there’s scope for improvements. See for example

http://rguha.ath.cx/~rguha/cicc/rest/depict/c1ccccc1
http://rguha.ath.cx/~rguha/cicc/rest/depict/C1CCCCC12CCCCC2
http://rguha.ath.cx/~rguha/cicc/rest/depict/CC(=O)OC1=CC=CC=C1C(=O)O
http://rguha.ath.cx/~rguha/cicc/rest/depict/c1ccccc1CC=CC%23N

Thanks to Gilleain and Egon for pointing me in the right direction. Anybody using this service should see the new depictions automatically

Written by Rajarshi Guha

January 14th, 2009 at 6:04 pm

Posted in software,Uncategorized

Tagged with ,

Update to the REST Descriptor Services

with 2 comments

The current version of the REST interface to the CDK descriptors allowed one to access descriptor values for a SMILES string by simply appending it to an URL, resulting in something like

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/
org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor/c1ccccc1COCC

This type of URL is pretty handy to construct by hand. However, as Pat Walters pointed out in the comments to that post, SMILES containing ‘#’ will cause problems since that character is a URL fragment identifier. Furthermore, the presence of a ‘/’ in a SMILES string necessitates some processing in the service to recognize it as part of the SMILES, rather than a URL path separator. While the service could handle these (at the expense of messy code) it turned out that there were subtle bugs.

Based on Pats’ suggestion I converted the service to use base64 encoded SMILES, which let me simplify the code and remove the bugs. As a result, one cannot append the SMILES directly to the URL’s. Instead the above URL would be rewritten in the form

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/
org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor/YzFjY2NjYzFDT0ND

All the example URL’s described in my previous post that involve SMILES strings, should be rewritten using base64 encoded SMILES. So to get a document listing all descriptors for “c1ccccc1COCC” one would write

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/YzFjY2NjYzFDT0ND

and then follow the links therein.

While this makes it a little harder to directly write out these URL’s by hand, I expect that most uses of this service would be programmatic – in which case getting base64 encoded SMILES is trivial.

Written by Rajarshi Guha

January 11th, 2009 at 5:52 pm

Playing with REST Descriptor Services

with 4 comments

As part of my work at IU I have been implementing a number of cheminformatics web services. Initially these were SOAP, but I realized that REST interfaces make life much easier. (also see here) As a result, a number of these services have simple REST interfaces. One such service provides molecular descriptor calculations, using the CDK as the backend. Thus by visitingĀ  (i.e., making a HTTP GET request) a URL of the form

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/CC(=O)

you get a simple XML document containing a list of URL’s. Each URL represents a specific “resource”. In this context, the resource is the descriptor values for the given molecule. Thus by visiting

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/
org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor/CC(=O)C

one gets another simple XML document that lists the names and values of the AlogP descriptor. In this case, the CDK implementation evaluates AlogP, AlogP2 and molar refractivity – so there are actually three descriptor values. On the other hand something like theĀ  molecular weight descriptor gives a single value. To just see the list of available descriptors visit

http://www.chembiogrid.org/cheminfo/rest/desc/descriptors

which gives an XML document containing a series of links. Visiting one of these links gives the “descriptor specification” – information on the vendor, version, reference to a descriptor ontology and so on.

(I should point out that the descriptors available in this service are from a pretty old version of the CDK. I really should update the descriptors to the 1.2.x versions)

Applications

This type of interface makes it easy to whip up various applications. One example is the PCA analysis of compound collections. Another one I put together today based on a conversation with Jean-Claude was a simple application to plot pairs of descriptor values for a collection of SMILES.

dppss1

The app is pretty simple (and quite slow, since it uses synchronous GET’s to the descriptor service for each SMILES and has to make two calls for each SMILES – hey, it was a quick hack!). Currently, it’s a bit restrictive – if a descriptor calculates multiple values, it will only use the first value. To see how many values a molecular descriptor calculates, see the list here.

With a little more effort one could easily have a pretty nice online descriptor calculation application rivaling a standalone application such as the the CDK descriptor GUI

Also,if you struggle with nice CSS layouts, the CSS Layout Collection is a fantastic resource. And jQuery rocks.

Written by Rajarshi Guha

January 7th, 2009 at 7:06 am

First Steps with Git

with 3 comments

With all the stuff I’ve been hearing about Git I’ve been looking to play around with it. While I have been hosting my own Subversion repo on my office machine, the use of GitHub seemed like a good way to play with Git and also have a stable external repo.

So right now the CDKDescUI project has been shifted into Git and is located here. I’ve also shifted my REST web services here

Written by Rajarshi Guha

January 5th, 2009 at 5:50 pm

Posted in software

Tagged with ,

Quick Comments on an Analysis of Antithrombotics

without comments

Joerg has made a nice blog post on the use of Open Source software and data to analyse the occurence of antithrombotics. More specifically he was trying to answer the question

Which XRay ligands are closest to the Fontaine et al. structure-activity relationship data for allowing structure-based drug design?

Using Blue Obelisk tools and ChemSpider and where Fontaine et al. refers to the Fontaine Factor Xa dataset. You should read his post for a nice analysis of the problem. I just wanted to consider two points he had raised.

Read the rest of this entry »

Written by Rajarshi Guha

January 5th, 2009 at 1:36 am