Archive for the ‘visualization’ Category
I’ve been working for some time with the PubChem Bioassay collection – a set of 1293 assays that cover a range of techniques (enzymatic, phenotypic etc.), targets and sizes (from 20 molecules to 200,000 molecules). In addition, some assays are primary, high-throughput assays whereas a number of them are smaller, confirmatory assays. While an extremely valuable collection, one of the drawbacks is the lack of curation. This has led to some people saying that the data is too noisy to be useful. Yes, the noise is a problem, but I think there’s still useful data to extract and model.
One of the problems I have faced is that while one can perform a full text search for assays on PubChem, the assays themselves carry no annotations. One effect of this is that it is difficult to link an assay to other biological resources (though for enzymatic assays, one can determine a Pubmed protein identifier). While working on my bioassay network project, I needed annotations and I didn’t want to create them manually.
The ONSChallenge has been running for some time now and the simple web query form that tied in the data from Google Docs along with web services from IU has turned out to be pretty handy. With more and more data becoming available, I had done some initial exploratory analysis of the measured solubilities. One thing that is useful to the experimentalists is a suggestion of which compound to test next. This could be made on the basis of many factors – availability, ease of synthesis and so on. But one way to look at it is to examine what types of compounds have been tested previously, and suggest that the subsequent compounds be very different from those that have been tested.
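One way to make "very different from those that have been tested" concrete is a dissimilarity-based pick. The following is only a minimal sketch under assumptions that are not in the post: compounds are represented by fingerprints stored as sets of "on" bit positions, similarity is the Tanimoto coefficient, and the names `tanimoto` and `suggestNext` are hypothetical.

```javascript
// Tanimoto similarity between two fingerprints, each given as a Set of
// "on" bit positions (an assumed representation, not from the post).
function tanimoto(a, b) {
  let common = 0;
  for (const bit of a) {
    if (b.has(bit)) common++;
  }
  const union = a.size + b.size - common;
  return union === 0 ? 1 : common / union;
}

// Suggest the candidate whose most similar tested compound is least similar,
// i.e. the candidate farthest from everything measured so far.
function suggestNext(tested, candidates) {
  let best = null;
  let bestScore = Infinity;
  for (const cand of candidates) {
    let maxSim = 0;
    for (const t of tested) {
      maxSim = Math.max(maxSim, tanimoto(cand.fp, t.fp));
    }
    if (maxSim < bestScore) {
      bestScore = maxSim;
      best = cand;
    }
  }
  return best;
}

// Toy fingerprints, purely illustrative.
const tested = [
  { name: 'A', fp: new Set([1, 2, 3]) },
  { name: 'B', fp: new Set([2, 3, 4]) },
];
const candidates = [
  { name: 'C', fp: new Set([1, 2, 3, 4]) },
  { name: 'D', fp: new Set([7, 8, 9]) },
];
console.log(suggestNext(tested, candidates).name);
```

In practice such a score would be only one input alongside availability and ease of synthesis.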
Has there been work on creating visualizations of “conformer envelopes”, graphical representations of the conformational space occupied by (or available to) molecules? I am particularly interested in cases where such visualizations are used to (quickly/visually) compare whether 2 molecules can adopt the same shape – or whether there are shapes of one that can’t be adopted by the other.
A while back I was investigating the use of the Ballester & Graham-Richards shape descriptors for 3D similarity searching. It turns out they perform quite poorly in enrichment benchmarks (which I’ll describe in a future post). At that time I was thinking of how Pub3D could scale to a multi-conformer version and I realized that the shape descriptors would allow me to easily visualize the “shape space” of a set of compounds. When these compounds are conformers of a single molecule, one effectively gets a conformational envelope.
In a previous post, I described a simple web form to query and visualize the solubility data being generated as part of the ONS Challenge. The previous approach required me to manually download the data and load it into a Postgres database. While trivial from a coding point of view, it’s a pain since I have to keep my local DB in sync with the Google Docs spreadsheet.
This is very nice since I no longer have to maintain a local DB and ensure that it’s in sync with Jean-Claude’s results. Of course, there are some drawbacks to this method. First, the query page will assume that the data in the spreadsheet is clean. So if there are two entries called “Ethanol” and “ethanol”, they will be considered separate solvents. Secondly, this approach cannot be used to include cheminformatics in the queries, since Google doesn’t support that functionality. Finally, it’s not going to be very good for large spreadsheets.
However, this is a very nice API that allows one to elegantly integrate web applications with live data. I heart Google!
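The first drawback (inconsistent capitalization of solvent names) could be softened on the client side once the rows have been fetched. This is a minimal sketch, not what the query page does: the row shape `{ solvent, conc }` and the function names are hypothetical, and it only handles case and surrounding whitespace, not synonyms.

```javascript
// Normalize a solvent name so that "Ethanol" and " ethanol" compare equal.
function normalizeSolvent(name) {
  return name.trim().toLowerCase();
}

// Group measured concentrations by normalized solvent name.
// Each row is assumed to look like { solvent: string, conc: number }.
function groupBySolvent(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = normalizeSolvent(row.solvent);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row.conc);
  }
  return groups;
}

// Illustrative values only.
const groups = groupBySolvent([
  { solvent: 'Ethanol', conc: 1.1 },
  { solvent: 'ethanol ', conc: 1.3 },
  { solvent: 'THF', conc: 0.4 },
]);
console.log(groups);
```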
There was a FriendFeed discussion on the use of RDF triples for representing the solubility data generated by Jean-Claude and others as part of the ONS Solubility Challenge. Part of the discussion revolved around letting RDF novices easily perform queries of the data being collected. Not knowing much about RDF, I took the raw data from the Google Docs and loaded it into a Postgres database and whipped up a simple query form.
The DB and form are nothing remarkable. But what is cool is that the Google Visualization API makes it really easy for me to include charts and other visualizations. For example, if you select “any” as the solvent and then select a solute, the form creates a table of solubilities of that solute in all the solvents it was measured in. A natural view of the data is to look at a bar chart of the solubilities across the various solvents.
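var data = new google.visualization.DataTable();
// Column 0 holds the solvent name; the 'Solvent' label is assumed here
data.addColumn('string', 'Solvent');
data.addColumn('number', 'Conc (M)');
// Rows must exist before setValue can fill them
data.addRows(2);
data.setValue(0, 0, 'thf');
data.setValue(0, 1, 1.23);
data.setValue(1, 0, 'acetonitrile');
data.setValue(1, 1, 2.34);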
Once you have the data all stored, some more boilerplate code allows us to easily insert the chart into the final web page. Very neat!
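As a rough illustration of that boilerplate, the sketch below first builds the header-plus-rows array that `google.visualization.arrayToDataTable` accepts, then shows the drawing calls as comments, since they only run in a browser with the Visualization library loaded. The helper `toChartRows`, the record shape, the 'Solvent' label and the element id `chart_div` are assumptions for illustration, not taken from the actual form.

```javascript
// Convert solubility records into an array of arrays with a header row first.
// Each record is assumed to look like { solvent: string, conc: number }.
function toChartRows(records) {
  const rows = [['Solvent', 'Conc (M)']];
  for (const r of records) {
    rows.push([r.solvent, r.conc]);
  }
  return rows;
}

const rows = toChartRows([
  { solvent: 'thf', conc: 1.23 },
  { solvent: 'acetonitrile', conc: 2.34 },
]);
console.log(rows);

// In the browser, after the Visualization library has loaded:
//   var data = google.visualization.arrayToDataTable(rows);
//   var chart = new google.visualization.BarChart(
//       document.getElementById('chart_div'));  // hypothetical element id
//   chart.draw(data, { title: 'Solubility by solvent' });
```

Keeping the data preparation separate from the drawing calls makes it possible to check the rows outside the browser.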