So much to do, so little time

Trying to squeeze sense out of chemical data


Plate Well Series Plots in R


Plate well series plots are a common way to summarize well-level data across multiple plates in a high-throughput screen. An example can be seen in Zhang et al. As I’ve been working with RNAi screens, this visualization has been a useful way to summarize screening data and the various transformations applied to that data. It’s fundamentally a simple scatter plot with some extra annotations. Although the x-axis is labeled with plate numbers, each point is actually placed at a well position: the wells of plate 1 come first, followed by the wells of plate 2, and so on. The y-axis is usually the signal from that well.
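To make the x-axis concrete: with 96-well plates, points 1 to 96 belong to plate 1, points 97 to 192 to plate 2, and so on. The sketch below is a hypothetical helper (not part of the plotting code) that maps an x position back to its plate and well. It assumes every plate has the same layout and that wells are flattened column by column, which is what as.numeric() does to an R matrix.

```r
## Hypothetical helper: map an x position on a plate well series plot back
## to its plate, row and column. Assumes every plate is nr x nc and that
## wells are flattened column-major, as as.numeric() does for a matrix.
well.of.index <- function(idx, nr = 8, nc = 12) {
  nwell <- nr * nc
  offset <- (idx - 1) %% nwell          # 0-based position inside its plate
  data.frame(index = idx,
             plate = (idx - 1) %/% nwell + 1,
             row = offset %% nr + 1,    # rows vary fastest (column-major)
             col = offset %/% nr + 1)
}

## For 96-well plates, point 97 is the first well (row 1, column 1) of plate 2
well.of.index(c(1, 8, 9, 96, 97))
```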

Since I use it often, here’s some code that will generate such a plot. The input is a list of matrices or data.frames, where each matrix or data.frame represents a plate. In addition, you need to specify a “plate map”: a character matrix, with the same dimensions as a plate, indicating whether a well is a sample (“c”), positive control (“p”), negative control (“n”) or ignored (“x”). The code looks like this:

plate.well.series <- function(plate.list, plate.map, draw.sep = TRUE, color = TRUE, ...) {
  ## flatten each plate column-wise; as.matrix() lets data.frames through too
  signals <- unlist(lapply(plate.list, function(p) as.numeric(as.matrix(p))))
  nwell <- prod(dim(plate.list[[1]]))
  nplate <- length(plate.list)

  cols <- 'black'
  if (color) {
    pcolor <- 'red'
    ncolor <- 'green'
    ## ignored wells ("x") stay NA, so plot() does not draw them
    colormat <- matrix(NA_character_, nrow = nrow(plate.list[[1]]), ncol = ncol(plate.list[[1]]))
    colormat[which(plate.map == 'n')] <- ncolor
    colormat[which(plate.map == 'p')] <- pcolor
    colormat[which(plate.map == 'c')] <- 'black'
    ## the same plate map applies to every plate, so repeat it once per plate
    cols <- rep(as.vector(colormat), nplate)
  }
  plot(signals, xaxt = 'n', ylab = 'Signal', xlab = 'Plate Number', col = cols, ...)
  if (color) legend('topleft', bty = 'n', fill = c(ncolor, pcolor, 'black'),
                    legend = c('Negative', 'Positive', 'Sample'),
                    y.intersp = 1.2)
  ## grey separators at the boundaries between consecutive plates
  if (draw.sep) abline(v = seq(0.5, length(signals) + 0.5, by = nwell), col = 'grey')
  ## one axis label per plate, centred on that plate's wells
  axis(side = 1, at = (seq_len(nplate) - 0.5) * nwell + 0.5, labels = seq_len(nplate))
}
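To try the function without real screening data, the sketch below builds simulated input: three 96-well (8 x 12) plates with random signals, and a plate map with negative controls in column 1, positive controls in column 12 and one ignored well. The layout and the signal distributions are made up purely for illustration.

```r
## Simulated input for plate.well.series(). The plate layout and the signal
## distributions below are invented for illustration only.
set.seed(42)
nr <- 8
nc <- 12

plate.map <- matrix('c', nrow = nr, ncol = nc)  # default: sample wells
plate.map[, 1] <- 'n'                           # negative controls
plate.map[, nc] <- 'p'                          # positive controls
plate.map[1, 2] <- 'x'                          # one well to ignore

## one simulated plate: samples and negatives near 100, positives near 20
make.plate <- function() {
  m <- matrix(rnorm(nr * nc, mean = 100, sd = 10), nrow = nr, ncol = nc)
  m[plate.map == 'n'] <- rnorm(sum(plate.map == 'n'), mean = 100, sd = 5)
  m[plate.map == 'p'] <- rnorm(sum(plate.map == 'p'), mean = 20, sd = 5)
  m
}
plate.list <- lapply(1:3, function(i) make.plate())

## With plate.well.series() loaded, extra arguments are passed on to plot():
## plate.well.series(plate.list, plate.map, pch = 19, cex = 0.5)
```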

An example of such a plot is shown below:


Figure: Plate well series plot


Another example, comparing normalized data from three runs of an RNAi screen investigating drug sensitization (and also highlighting the fact that plate 7 in the 5 nM run was messed up):
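The normalization used for these runs isn't specified here. As one common per-plate choice (an assumption for illustration, not necessarily what was applied to this screen), each well can be expressed as a percentage of the mean negative-control signal on its own plate, using the same plate map convention as plate.well.series():

```r
## Percent-of-control normalization, one common per-plate scheme. Assumes
## the plate map marks negative controls with 'n', as in plate.well.series().
normalize.poc <- function(plate, plate.map) {
  plate <- as.matrix(plate)
  neg.mean <- mean(plate[plate.map == 'n'], na.rm = TRUE)
  100 * plate / neg.mean
}

## normalize every plate in one run (a list of plates)
normalize.run <- function(plate.list, plate.map) {
  lapply(plate.list, normalize.poc, plate.map = plate.map)
}
```

The normalized list can be passed straight to plate.well.series(), so runs at different drug concentrations end up on a comparable scale.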


Figure: Comparing runs with plate well series plots


Written by Rajarshi Guha

July 14th, 2009 at 2:01 am

Annotating Bioassays


I’ve been working for some time with the PubChem Bioassay collection, a set of 1293 assays that cover a range of techniques (enzymatic, phenotypic, etc.), targets and sizes (from 20 molecules to 200,000 molecules). In addition, some assays are primary, high-throughput assays, whereas a number of them are smaller, confirmatory assays. While it is an extremely valuable collection, one of its drawbacks is the lack of curation. This has led some people to say that the data is too noisy to be useful. Yes, the noise is a problem, but I think there’s still useful data to extract and model.

One of the problems that I have faced is that while one can perform a full-text search for assays on PubChem, the assays themselves carry no annotations. As a result, it is difficult to link an assay to other biological resources (though for enzymatic assays, one can determine a PubMed protein identifier). While working on my bioassay network project, I needed annotations, and I didn’t want to create them manually.


Written by Rajarshi Guha

January 25th, 2009 at 5:03 pm

The ONS Challenge & Visualizing Chemical Space


The ONS Challenge has been running for some time now, and the simple web query form, which ties together the data in Google Docs with web services from IU, has turned out to be pretty handy. With more and more data becoming available, I have done some initial exploratory analysis of the measured solubilities. One thing that is useful to the experimentalists is a suggestion of which compound to test next. Such a suggestion could be based on many factors, such as availability or ease of synthesis. But one way to approach it is to examine the types of compounds that have already been tested, and suggest that subsequent compounds be very different from them.
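One generic way to pick a “very different” compound is dissimilarity-based selection: for each untested candidate, find its highest similarity to any tested compound, then suggest the candidate whose nearest tested neighbour is least similar. The sketch below assumes compounds are represented as binary fingerprints (rows of a 0/1 matrix) compared with the Tanimoto coefficient; it illustrates the idea and is not necessarily the analysis in the full post.

```r
## Tanimoto similarity between two binary fingerprints (0/1 vectors)
tanimoto <- function(a, b) {
  both <- sum(a & b)
  either <- sum(a | b)
  if (either == 0) return(1)   # two empty fingerprints: treat as identical
  both / either
}

## candidates, tested: 0/1 matrices with one fingerprint per row.
## Returns the row index of the candidate least similar to everything tested.
suggest.next <- function(candidates, tested) {
  nearest <- apply(candidates, 1, function(cand) {
    max(apply(tested, 1, function(t) tanimoto(cand, t)))
  })
  which.min(nearest)
}
```

Repeating the selection, adding each pick to the tested set, gives a MaxMin-style diverse list rather than a single suggestion.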


Written by Rajarshi Guha

December 30th, 2008 at 6:04 pm

Conformational Envelopes


Joe Leonard posted a question on the CCL mailing list today regarding “conformation envelopes”. More specifically, he asked

Has there been work on creating visualizations of “conformer envelopes”, graphical representations of the conformational space occupied (or available) to molecules. Particularly when such visualizations are used to (quickly/visually) compare whether 2 molecules can adopt the same shape – or if there are shapes of one that can’t be adopted by another.

A while back I was investigating the use of the Ballester & Graham-Richards shape descriptors for 3D similarity searching. It turns out that they perform quite poorly in enrichment benchmarks (which I’ll describe in a future post). At the time I was thinking about how Pub3D could scale to a multi-conformer version, and I realized that the shape descriptors would allow me to easily visualize the “shape space” of a set of compounds. When these compounds are conformers of a single molecule, one effectively gets a conformational envelope.
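For readers unfamiliar with these descriptors, here is a minimal sketch of the Ballester & Graham-Richards (USR) scheme as I understand it: distances from every atom to four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), summarized by three moments each, giving 12 numbers. Implementations differ in how they normalize the higher moments; here I use mean, variance and skewness. Projecting the conformer descriptors onto two principal components is my own assumption about how to draw the envelope, not necessarily what the full post does.

```r
## USR-style shape descriptors from an n x 3 matrix of atom coordinates
usr.descriptors <- function(xyz) {
  ## Euclidean distance from every atom to point p
  dists <- function(p) sqrt(rowSums(sweep(xyz, 2, p)^2))
  ## mean, variance and skewness of a distance distribution
  moments <- function(d) {
    m <- mean(d)
    v <- mean((d - m)^2)
    s <- if (v > 0) mean((d - m)^3) / v^1.5 else 0
    c(m, v, s)
  }
  ctd <- colMeans(xyz)                 # molecular centroid
  d.ctd <- dists(ctd)
  cst <- xyz[which.min(d.ctd), ]       # atom closest to the centroid
  fct <- xyz[which.max(d.ctd), ]       # atom farthest from the centroid
  d.fct <- dists(fct)
  ftf <- xyz[which.max(d.fct), ]       # atom farthest from fct
  c(moments(d.ctd), moments(dists(cst)), moments(d.fct), moments(dists(ftf)))
}

## USR similarity: 1 / (1 + mean absolute difference of the 12 descriptors)
usr.similarity <- function(a, b) 1 / (1 + mean(abs(a - b)))

## Envelope sketch: one descriptor row per conformer (a list of coordinate
## matrices), projected onto the first two principal components
conformer.space <- function(conformers) {
  desc <- t(sapply(conformers, usr.descriptors))
  prcomp(desc)$x[, 1:2]
}
```

Because the descriptors depend only on interatomic distances, they are invariant to translation and rotation, so no alignment of conformers is needed before plotting the projected points.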


Written by Rajarshi Guha

November 8th, 2008 at 10:49 pm

Live ONS Solubility Queries


In a previous post, I described a simple web form to query and visualize the solubility data being generated as part of the ONS Challenge. The previous approach required me to manually download the data and load it into a Postgres database. While this is trivial from a coding point of view, it’s a pain, since I have to keep my local DB in sync with the Google Docs spreadsheet.

However, Google comes to the rescue with their Query API, which allows us to view the spreadsheet as a table that can be queried using an SQL-like language. As a result, I can ditch the local database entirely and simply have an HTML page, constructed using JavaScript, that performs queries directly on the solubility spreadsheet.
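As a rough illustration of the idea (in R rather than the JavaScript used for the page), a query is just a URL carrying the query string. The sketch assumes the current Google Sheets `gviz/tq` endpoint, which may differ from the URL scheme available when this was written; the spreadsheet key and the column letters are hypothetical placeholders.

```r
## Build a Visualization API query URL for a Google Sheets spreadsheet.
## The key and columns are hypothetical; tqx=out:csv asks for CSV output.
gviz.query.url <- function(key, query) {
  paste0("https://docs.google.com/spreadsheets/d/", key,
         "/gviz/tq?tqx=out:csv&tq=", URLencode(query, reserved = TRUE))
}

## e.g. columns A and C for rows whose (hypothetical) solvent column B is Ethanol
url <- gviz.query.url("SPREADSHEET_KEY", "select A, C where B = 'Ethanol'")
## read.csv(url) would then fetch the matching rows (requires network access)
```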

This is very nice, since I no longer have to maintain a local DB and ensure that it’s in sync with Jean-Claude’s results. Of course, this method has some drawbacks. First, the query page assumes that the data in the spreadsheet is clean, so if there are two entries called “Ethanol” and “ethanol”, they will be considered separate solvents. Second, this approach cannot include cheminformatics operations (such as substructure searches) in the queries, since Google doesn’t support that functionality. Finally, it is not going to work very well for large spreadsheets.
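When the data is pulled down for local analysis instead, the “Ethanol” versus “ethanol” problem is easy to patch. The minimal sketch below assumes that lower-casing and trimming whitespace is enough cleaning; it will not merge genuine synonyms such as “EtOH”.

```r
## Normalize solvent names before grouping: trim whitespace, lower-case
normalize.solvent <- function(x) tolower(trimws(x))

solvents <- c("Ethanol", "ethanol", " ethanol ", "Methanol")
unique(normalize.solvent(solvents))
```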

Even so, this is a very nice API that allows one to elegantly integrate web applications with live data. I heart Google!

Written by Rajarshi Guha

November 6th, 2008 at 8:01 pm