So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for November, 2008

Live ONS Solubility Queries

without comments

In a previous post, I described a simple web form to query and visualize the solubility data being generated as part of the ONS Challenge. The previous approach required me to manually download the data and load it into a Postgres database. While trivial from a coding point of view, it’s a pain since I have to keep my local DB in sync with the Google Docs spreadsheet.

However, Google comes to the rescue with its Query API, which lets us treat the spreadsheet as a table that can be queried using an SQL-like language. As a result, I can ditch the local database entirely and simply have an HTML page, constructed using JavaScript, that performs queries directly on the solubility spreadsheet.
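As a rough sketch of what such a query might look like (not the actual code behind the page): the Query API accepts a query string via the `tq` URL parameter. The helpers below build one. The data source URL, the column letters (B for solvent, C for solute, D for concentration) and the solute name in the usage comment are all assumptions made for illustration.

```javascript
// Hypothetical data source URL; the real spreadsheet key is not shown here.
const DATA_SOURCE_URL = "https://spreadsheets.google.com/tq?key=SPREADSHEET_KEY";

// Wrap a value in whichever quote character it does not contain.
// Values containing both quote types are rejected rather than guessed at.
function quoteLiteral(value) {
  if (!value.includes("'")) return "'" + value + "'";
  if (!value.includes('"')) return '"' + value + '"';
  throw new Error("Cannot quote value containing both quote types: " + value);
}

// Assumed layout: column B = solvent, C = solute, D = concentration (M).
// Passing "any" (or nothing) as the solvent returns all solvents for the solute.
function buildSolubilityQuery(solute, solvent) {
  let query = "select B, D where C = " + quoteLiteral(solute);
  if (solvent && solvent !== "any") {
    query += " and B = " + quoteLiteral(solvent);
  }
  return query;
}

// Append the query to a data source URL as the tq parameter.
function buildQueryUrl(baseUrl, query) {
  const separator = baseUrl.includes("?") ? "&" : "?";
  return baseUrl + separator + "tq=" + encodeURIComponent(query);
}

// In a browser with the Visualization API loaded, the same query string
// could be sent along these lines:
//   const q = new google.visualization.Query(DATA_SOURCE_URL);
//   q.setQuery(buildSolubilityQuery("benzoic acid", "any"));
//   q.send(function (response) {
//     if (!response.isError()) {
//       const table = response.getDataTable();
//     }
//   });
```

Keeping the query construction separate from the network call makes it easy to check the generated query text without touching the live spreadsheet.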

This is very nice since I no longer have to maintain a local DB and ensure that it’s in sync with Jean-Claude’s results. Of course, there are some drawbacks to this method. First, the query page assumes that the data in the spreadsheet is clean. So if there are two entries called “Ethanol” and “ethanol”, they will be considered separate solvents. Second, this approach cannot be used to include cheminformatics in the queries, since Google doesn’t support that functionality. Finally, it’s not going to be very good for large spreadsheets.
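The first drawback could be softened on the client side by normalizing names before grouping results. A minimal sketch, assuming the rows arrive as plain objects with `solvent` and `conc` fields (a hypothetical shape, not the spreadsheet's actual layout):

```javascript
// Collapse case and surrounding whitespace so "Ethanol" and " ethanol" match.
function normalizeName(name) {
  return name.trim().toLowerCase();
}

// Group concentrations by normalized solvent name.
// rows: array of { solvent: string, conc: number } (assumed shape).
function groupBySolvent(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = normalizeName(row.solvent);
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(row.conc);
  }
  return groups;
}
```

This only catches differences in case and whitespace. Synonyms such as “THF” and “tetrahydrofuran” would still be treated as separate solvents, which is where real cheminformatics would be needed.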

However, this is a very nice API that allows one to elegantly integrate web applications with live data. I heart Google!

Written by Rajarshi Guha

November 6th, 2008 at 8:01 pm

Solubility Queries and the Google Visualization API

with one comment

There was a FriendFeed discussion on the use of RDF triples for representing the solubility data generated by Jean-Claude and others as part of the ONS Solubility Challenge. Part of the discussion revolved around letting RDF novices easily perform queries of the data being collected. Not knowing much about RDF, I took the raw data from the Google Docs spreadsheet, loaded it into a Postgres database and whipped up a simple query form.

The DB and form are nothing remarkable. But what is cool is that the Google Visualization API makes it really easy for me to include charts and other visualizations. For example, if you select “any” as the solvent and then select a solute, the form creates a table of solubilities of that solute in all the solvents it was measured in. A natural view of the data is a bar chart of the solubilities across the various solvents.

Since my form is built using mod_python, it’s a simple matter to write out the JavaScript to call the Google API. After some boilerplate code, all that needs to be done is to create a DataTable object, set the column types and names, and then populate it. See here for example code, which I modified.

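// One string column for the solvent name, one numeric column for the concentration
var data = new google.visualization.DataTable();
data.addColumn('string', 'Solvent');
data.addColumn('number', 'Conc (M)');
// Allocate one empty row per solvent (only two are filled in here)
data.addRows(2);
// setValue(rowIndex, columnIndex, value)
data.setValue(0, 0, 'thf');
data.setValue(0, 1, 1.23);
data.setValue(1, 0, 'acetonitrile');
data.setValue(1, 1, 2.34);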

Once you have the data all stored, some more boilerplate code allows us to easily insert the chart into the final web page. Very neat!
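For reference, that remaining boilerplate might look roughly like the sketch below. It takes the `google.visualization` namespace as a parameter, which is my own choice here so that the function can be exercised outside a browser. The `rows` shape, the element id in the usage comment and the option values are illustrative assumptions.

```javascript
// viz: the google.visualization namespace, passed in rather than read from a global.
// container: the DOM element that will hold the chart.
// rows: array of [solventName, concentration] pairs (illustrative shape).
function drawSolubilityChart(viz, container, rows) {
  const data = new viz.DataTable();
  data.addColumn("string", "Solvent");
  data.addColumn("number", "Conc (M)");
  // addRows also accepts an array of row arrays, which avoids per-cell setValue calls.
  data.addRows(rows);

  const chart = new viz.BarChart(container);
  chart.draw(data, { width: 500, height: 300, title: "Solubility by solvent" });
  return chart;
}

// In a page, once the Visualization API has finished loading:
//   drawSolubilityChart(google.visualization,
//                       document.getElementById("chart"),
//                       [["thf", 1.23], ["acetonitrile", 2.34]]);
```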

(Of course, since these queries do not involve chemistry or cheminformatics, I could skip Python and Postgres and simply do the whole thing in JavaScript, querying the Google Docs spreadsheet directly. This means that the results from the form would always be in sync with the Google Doc, but that’s for another evening.)

Written by Rajarshi Guha

November 6th, 2008 at 3:39 am

Posted in software,visualization


Kinase Inhibitors and Polypharmacology

without comments

The Curious Wavefunction has a nice post on the issue of selective and non-selective kinase inhibitors. It’s an interesting commentary, especially in light of the recent paper on network polypharmacology. While there have been a number of papers on polypharmacology and the idea itself is very attractive, it has seemed to me that for this approach to succeed we need very detailed information on the targets and systems involved in these networks. Indeed, a current project of mine is hitting this problem. As Ashutosh notes,

… in the first place we don’t even know what specific subset of kinases to hit for treating a particular disease. First comes target validation, then modulation.

Written by Rajarshi Guha

November 5th, 2008 at 11:18 pm