So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘visualization’ Category

When is a Bad Plate Bad?

without comments

When running a high-throughput screen, one usually deals with hundreds or even thousands of plates. Due to the vagaries of experiments, some plates will not be very good. That is, the data will be of poor quality for a variety of reasons. Usually we can evaluate various statistical quality metrics to assess which plates are good and which ones need to be redone. A common metric is the Z-factor, which uses the positive and negative control wells. The problem is that if one or two wells have a problem (say, no signal in the negative control) then the Z-factor will be very poor. Yet, the plate could be used if we just mask those bad wells.
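For reference, the Z-factor is computed from the means and standard deviations of the positive and negative control signals. A minimal sketch in R (the control vectors pos and neg here are hypothetical):

# Z-factor from positive and negative control well signals
# pos, neg: numeric vectors of raw signals from the control wells
zfactor <- function(pos, neg) {
  1 - 3 * (sd(pos) + sd(neg)) / abs(mean(pos) - mean(neg))
}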

Now, for our current screens (100 plates) manual inspection is boring but doable. As we move to genome-wide screens we need a better way to distinguish truly bad plates from plates that could still be used. One approach is to move to other metrics – SSMD (defined here and applications to quality control discussed here) is regarded as more effective than the Z-factor – and in fact it’s advisable to look at multiple metrics rather than depend on any single one.
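For comparison, SSMD can be computed from the same control wells. A minimal sketch, assuming the two control groups are independent (pos and neg are again hypothetical signal vectors):

# SSMD (strictly standardized mean difference) between the control groups
ssmd <- function(pos, neg) {
  (mean(pos) - mean(neg)) / sqrt(var(pos) + var(neg))
}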

An alternative trick is to compare the Z-factor for a given plate to the trimmed Z-factor, which is evaluated using the trimmed mean and standard deviation. In our setup we trim 10% of the positive and negative control wells. For a plate that appears to be poor due to one or two bad control wells, the trimmed Z-factor should be significantly higher than the original Z-factor. But for a plate in which, say, the negative control wells all show poor signal, there should not be much of a difference between the two values. The analysis can be performed rapidly using a plot of the two values, as shown below. Given such a plot, we’d probably consider redoing plates whose trimmed Z-factor is less than 0.5 and which lie close to the diagonal. (Though for RNAi screens, Z’ = 0.5 might be too stringent.)
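A rough sketch of the calculation, assuming pos and neg are the control signals for a plate and using a 10% trim (the trim fraction is our own choice, not a fixed standard):

# Standard deviation after trimming a fraction of values from each tail
trimmed.sd <- function(x, trim = 0.1) {
  n <- length(x)
  k <- floor(n * trim)
  sd(sort(x)[(k + 1):(n - k)])
}

# Z-factor evaluated with trimmed means and trimmed standard deviations
trimmed.zfactor <- function(pos, neg, trim = 0.1) {
  1 - 3 * (trimmed.sd(pos, trim) + trimmed.sd(neg, trim)) /
    abs(mean(pos, trim = trim) - mean(neg, trim = trim))
}

# Given per-plate vectors zf and tzf (hypothetical), plates with low values
# that lie close to the diagonal are the truly bad ones
# plot(zf, tzf, xlab = 'Z-factor', ylab = 'Trimmed Z-factor'); abline(0, 1)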

From the figure below, just looking at the Z-factor would have suggested 4 or 5 plates to redo. But when compared to the trimmed Z-factor, this comes down to a single plate. Of course, we’d look at other statistics as well, but this is a quick way to identify plates with truly poor Z-factors.

A plot of Z-factor versus trimmed Z-factor for a set of 100 plates

Written by Rajarshi Guha

January 29th, 2010 at 5:47 pm

Plate Well Series Plots in R

with 2 comments

Plate well series plots are a common way to summarize well level data across multiple plates in a high throughput screen. An example can be seen in Zhang et al. As I’ve been working with RNAi screens, this visualization has been a useful way to summarize screening data and the various transformations on that data. It’s fundamentally a simple scatter plot, with some extra annotations. Though the x-axis is labeled with plate number, the values on the x-axis are actually well locations. The y-axis is usually the signal from that well.

Since I use it often, here’s some code that will generate such a plot. The input is a list of matrices or data.frames, where each matrix or data.frame represents a plate. In addition you need to specify a “plate map” – a character matrix indicating whether a well is a sample (“c”), positive control (“p”), negative control (“n”) or ignored (“x”). The code looks like this:

plate.well.series <- function(plate.list, plate.map, draw.sep = TRUE, color = TRUE, ...) {
  # Flatten all plates into one signal vector; as.numeric() unrolls each
  # plate matrix column-wise, so the plate map must use the same ordering
  signals <- unlist(lapply(plate.list, as.numeric))
  nwell <- prod(dim(plate.list[[1]]))
  nplate <- length(signals) / nwell

  cols <- 'black'
  if (color) {
    pcolor <- 'red'
    ncolor <- 'green'
    # Per-well colors from the plate map; ignored wells ('x') stay black
    colormat <- matrix('black', nrow = nrow(plate.list[[1]]), ncol = ncol(plate.list[[1]]))
    colormat[which(plate.map == 'n')] <- ncolor
    colormat[which(plate.map == 'p')] <- pcolor
    colormat[which(plate.map == 'c')] <- 'black'
    # Repeat the per-well colors once per plate so they line up with the signals
    cols <- rep(as.character(colormat), nplate)
  }
  plot(signals, xaxt = 'n', ylab = 'Signal', xlab = 'Plate Number', col = cols, ...)
  if (color) legend('topleft', bty = 'n', fill = c(ncolor, pcolor, 'black'),
                    legend = c('Negative', 'Positive', 'Sample'),
                    y.intersp = 1.2)
  if (draw.sep) {
    # Grey vertical lines mark the plate boundaries
    for (i in seq(1, length(signals) + nwell, by = nwell)) abline(v = i, col = 'grey')
  }
  axis(side = 1, at = seq(1, length(signals), by = nwell) + (nwell / 2), labels = 1:nplate)
}
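
A minimal usage sketch with simulated data (the 8 x 12 plate layout, control placement and signal levels below are made up purely for illustration):

# Plate map: column 1 holds negative controls, column 12 positive controls
pmap <- matrix('c', nrow = 8, ncol = 12)
pmap[, 1] <- 'n'
pmap[, 12] <- 'p'

# Ten simulated plates with well-separated control and sample signal levels
set.seed(1)
plates <- lapply(1:10, function(i) {
  m <- matrix(rnorm(96, mean = 100, sd = 10), nrow = 8, ncol = 12)
  m[, 1] <- rnorm(8, mean = 20, sd = 5)
  m[, 12] <- rnorm(8, mean = 200, sd = 10)
  m
})

plate.well.series(plates, pmap, pch = 20, cex = 0.5)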

An example of such a plot is below


Plate well series plot


Another example comparing normalized data from three runs of an RNAi screen investigating drug sensitization (also highlighting the fact that plate 7 in the 5 nM run was messed up):


Comparing runs with plate well series plots


Written by Rajarshi Guha

July 14th, 2009 at 2:01 am

Annotating Bioassays

with 2 comments

I’ve been working for some time with the PubChem Bioassay collection – a set of 1293 assays that cover a range of techniques (enzymatic, phenotypic, etc.), targets and sizes (from 20 molecules to 200,000 molecules). In addition, some assays are primary, high-throughput assays whereas a number of them are smaller, confirmatory assays. While it is an extremely valuable collection, one of its drawbacks is the lack of curation. This has led some people to say that the data is too noisy to be useful. Yes, the noise is a problem, but I think there’s still useful data to extract and model.

One of the problems I have faced is that while one can perform a full text search for assays on PubChem, there are no annotations on the assays themselves. One effect of this is that it is difficult to link an assay to other biological resources (though for enzymatic assays, one can determine a Pubmed protein identifier). While working on my bioassay network project, I needed annotations and I didn’t want to generate them manually.

Read the rest of this entry »

Written by Rajarshi Guha

January 25th, 2009 at 5:03 pm

The ONS Challenge & Visualizing Chemical Space

with 5 comments

The ONS Challenge has been running for some time now, and the simple web query form that ties together the data from Google Docs with web services from IU has turned out to be pretty handy. With more and more data becoming available, I have done some initial exploratory analysis of the measured solubilities. One thing that is useful to the experimentalists is a suggestion of which compound to test next. This could be made on the basis of many factors – availability, ease of synthesis and so on. But one way to look at it is to examine what types of compounds have been tested previously, and suggest that subsequent compounds be very different from those already tested.
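One simple way to formalize “very different from what has been tested” is a maximum-dissimilarity pick: compute fingerprints for the tested compounds and the candidates, then choose the candidate whose nearest tested neighbor is furthest away. A minimal sketch in base R over binary fingerprint matrices (the matrices below are random stand-ins; in practice they would come from a fingerprinting tool):

# Tanimoto distances between one candidate fingerprint and a matrix of
# tested-compound fingerprints (rows are compounds, columns are bits)
tanimoto.dist <- function(fp, fpmat) {
  shared <- as.numeric(fpmat %*% fp)
  1 - shared / (rowSums(fpmat) + sum(fp) - shared)
}

# Pick the candidate whose closest tested compound is furthest away
pick.next <- function(candidates, tested) {
  min.dist <- apply(candidates, 1, function(fp) min(tanimoto.dist(fp, tested)))
  which.max(min.dist)
}

# Hypothetical 1024-bit fingerprints: 50 tested compounds, 200 candidates
tested <- matrix(rbinom(50 * 1024, 1, 0.1), nrow = 50)
candidates <- matrix(rbinom(200 * 1024, 1, 0.1), nrow = 200)
pick.next(candidates, tested)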

Read the rest of this entry »

Written by Rajarshi Guha

December 30th, 2008 at 6:04 pm

Conformational Envelopes

without comments

Joe Leonard posted a question on the CCL mailing list today regarding “conformation envelopes”. More specifically, he asked

Has there been work on creating visualizations of “conformer envelopes”, graphical representations of the conformational space occupied (or available) to molecules. Particularly when such visualizations are used to (quickly/visually) compare whether 2 molecules can adopt the same shape – or if there are shapes of one that can’t be adopted by another.

A while back I was investigating the use of the Ballester & Graham-Richards shape descriptors for 3D similarity searching; it turns out they perform quite poorly in enrichment benchmarks (which I’ll describe in a future post). At that time I was thinking about how Pub3D could scale to a multi-conformer version, and I realized that the shape descriptors would allow me to easily visualize the “shape space” of a set of compounds. When those compounds are conformers of a single molecule, one effectively gets a conformational envelope.
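To make the idea concrete, here is a rough sketch, assuming each conformer is simply an N x 3 coordinate matrix: compute a few USR-style shape descriptors (moments of atomic distances to reference points) per conformer and look at the first two principal components to visualize the envelope. The descriptor set below is a simplified stand-in for illustration, not the exact published definition:

# Simplified USR-style shape descriptors: mean, sd and skewness of atomic
# distances to the centroid, the atom closest to it and the atom furthest
# from it (9 numbers per conformer)
shape.descriptors <- function(xyz) {
  dists <- function(ref) sqrt(rowSums(sweep(xyz, 2, ref)^2))
  ctd <- colMeans(xyz)
  d.ctd <- dists(ctd)
  refs <- list(ctd, xyz[which.min(d.ctd), ], xyz[which.max(d.ctd), ])
  unlist(lapply(refs, function(r) {
    d <- dists(r)
    c(mean(d), sd(d), mean(((d - mean(d)) / sd(d))^3))
  }))
}

# Hypothetical conformers: 50 perturbed copies of a random 20-atom geometry
base <- matrix(rnorm(60), ncol = 3)
confs <- lapply(1:50, function(i) base + matrix(rnorm(60, sd = 0.3), ncol = 3))

# The spread of points in PC space gives a picture of the shape envelope
desc <- t(sapply(confs, shape.descriptors))
pc <- prcomp(desc, scale. = TRUE)
plot(pc$x[, 1:2], xlab = 'PC1', ylab = 'PC2')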

Read the rest of this entry »

Written by Rajarshi Guha

November 8th, 2008 at 10:49 pm