# So much to do, so little time

Trying to squeeze sense out of chemical data

## Playing with REST Descriptor Services

As part of my work at IU I have been implementing a number of cheminformatics web services. Initially these were SOAP, but I realized that REST interfaces make life much easier. (also see here) As a result, a number of these services have simple REST interfaces. One such service provides molecular descriptor calculations, using the CDK as the backend. Thus by visiting  (i.e., making a HTTP GET request) a URL of the form

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/CC(=O)

you get a simple XML document containing a list of URL’s. Each URL represents a specific “resource”. In this context, the resource is the descriptor values for the given molecule. Thus by visiting

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/
org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor/CC(=O)C

one gets another simple XML document that lists the names and values of the AlogP descriptor. In this case, the CDK implementation evaluates AlogP, AlogP2 and molar refractivity – so there are actually three descriptor values. On the other hand something like the  molecular weight descriptor gives a single value. To just see the list of available descriptors visit

http://www.chembiogrid.org/cheminfo/rest/desc/descriptors

which gives an XML document containing a series of links. Visiting one of these links gives the “descriptor specification” – information on the vendor, version, reference to a descriptor ontology and so on.

(I should point out that the descriptors available in this service are from a pretty old version of the CDK. I really should update the descriptors to the 1.2.x versions)

### Applications

This type of interface makes it easy to whip up various applications. One example is the PCA analysis of compound collections. Another one I put together today based on a conversation with Jean-Claude was a simple application to plot pairs of descriptor values for a collection of SMILES.

The app is pretty simple (and quite slow, since it uses synchronous GET’s to the descriptor service for each SMILES and has to make two calls for each SMILES – hey, it was a quick hack!). Currently, it’s a bit restrictive – if a descriptor calculates multiple values, it will only use the first value. To see how many values a molecular descriptor calculates, see the list here.

With a little more effort one could easily have a pretty nice online descriptor calculation application rivaling a standalone application such as the the CDK descriptor GUI

Also,if you struggle with nice CSS layouts, the CSS Layout Collection is a fantastic resource. And jQuery rocks.

Written by Rajarshi Guha

January 7th, 2009 at 7:06 am

I met with Jean-Claude Bradley yesterday and we had a pretty useful hack session, allowing him to easily incorporate chemical and cheminformatics functionality into a GoogleDocs spreadsheet.

A common task that Jean-Claude wanted to automate was the calculation of milligrams (or milliliters) of a chemical required for a certain molarity.  So what we need for this calculation is the compound name, desired molarity, molecular weight and the density. Importantly, the people who’d like to use this will provide compound names and not a directly parseable SMILES.  So we’d also like to (optionally) get the SMILES. Finally, he wanted to be able to do this in a Google spreadsheet – rather than a specific web page or stand alone program.

It turns out that with a liberal helping of Python, a dash of ChemSpider and pinch of PubChem, all of this can be done in a half hour hack session.

Written by Rajarshi Guha

December 10th, 2008 at 4:23 pm

## Live ONS Solubility Queries

In a previous post, I described a simple web form to query and visualize the solubility data being generated as part of the ONS Challenge. The previous approach required me to manually download the data and load it into a Postgres database. While trivial from a coding point of view, it’s a pain since I have to keep my local DB in sync with the Google Docs spreadsheet.

However, Google comes to the rescue with their Query API, which allows us to view the spreadsheet as a table which can be queried using an SQL like language. As a result, I can ditch the whole local database, and simply have an HTML page constructed using Javascript, which performs queries directly on the solubility spreadsheet.

This is very nice since I now no longer have to maintain a local DB and ensure that it’s in sync with Jean-Claudes results. Of course, there are some drawbacks to this method. First, the query page will assume that the data in the spreadsheet is clean. So if there are two entries called “Ethanol” and “ethanol”, they will be considered seperate solvents. Secondly, this approach cannot be used to include cheminformatics in the queries, since Google doesn’t support that functionality. Finally, it’s not going to be very good for large spreadsheets.

However, this is a very nice API, that allows one to elegantly integrate web applications with live data. I heart Google!

Written by Rajarshi Guha

November 6th, 2008 at 8:01 pm

## Solubility Queries and the Google Visualization API

with one comment

There was a FriendFeed dicussion on the use of RDF triples for representing the solubilty data generated by Jean-Claude and others as part of the ONS Solubility Challenge. Part of the discussion revolved around letting RDF novices easily perform queries of the data being collected.  Not knowing much about RDF, I took the raw data from the Google Docs and loaded it into a Postgres database and whipped up a simple query form.

The DB and form are nothing remarkable. But what is cool is that the Google Visualization API makes it really easy for me include charts and other visualizations very easily. For example, if you select “any” as the solvent and then select a solute, the form creates a table of solubilities of that solute in all the solvents it was measured in. A natural view of the data is to look at a bar chart of the solubilities across the various solvents.

Since my form is built using mod_python, it’s a simple matter to write out the Javascript to call the Google API. After some boilerplate code, all that needs to be done is to create a DataTable object, set the column types and names and then populate it. See here for example code, which I modified.

 12345678 var data = new google.visualization.DataTable(); data.addColumn(’string’, ‘Solvent’); data.addColumn(’number’, ‘Conc (M)’); data.addRows(5); data.setValue(0, 0, ‘thf’); data.setValue(0, 1, 1.23); data.setValue(1, 0, ‘acetonitrile’); data.setValue(1, 1, 2.34);

Once you have the data all stored, some more boilerplate code allows us to easily insert the chart into the final web page. Very neat!

(Of course, since these queries do not involve chemistry / cheminformatics, I could skip Python and Postgres and simply do the whole thing in Javascript, querying the Google Docs spreadsheet directly. This means that the results from the form would always be in sync with the Google Doc, but that’s for another evening)

Written by Rajarshi Guha

November 6th, 2008 at 3:39 am

Posted in software,visualization

Tagged with , ,