So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘plot’ tag

Visual pairwise comparison of distributions

without comments

While analysing some data from a dose respons screen, run across multiple cell lines, I need to visualize summarize curve data in a pairwise fashion. Specifically, I wanted to compaure area under the curve (AUC) values for the curve fits for the same compound between every pair of cell line. Given that an AUC needs a proper curve fit, this means that the number of non-NA AUCs is different for each cell line. As a result making  a scatter plot matrix (via plotmatrix) won’t do.

A more useful approach is to generate a matrix of density plots, such that each plot contains the distributions of AUCs from each pair of cell lines over laid on each other. It turns out that some data.frame wrangling and facet_grid makes this extremely easy.

Lets start with some random data, for 5 imaginary cell lines

1
2
3
4
5
6
7
8
library(ggplot2)
library(reshape)

tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean=sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  return(r)
})))

Next, we need to expand this into a form that lets us facet by pairs of variables

1
2
3
4
5
6
7
8
tmp2 <- do.call(rbind, lapply(1:5, function(i) {
  do.call(rbind, lapply(1:5, function(j) {
    r <- rbind(data.frame(var='D1', val=tmp1[,i]),
               data.frame(var='D2', val=tmp1[,j]))
    r <- data.frame(xx=names(tmp1)[i], yy=names(tmp1)[j], r)
    return(r)
  }))
}))

Finally, we can make the plot

1
2
3
4
ggplot(tmp2, aes(x=val, fill=var))+
  geom_density(alpha=0.2, position="identity")+
  theme(legend.position = "none")+
  facet_grid(xx ~ yy, scales='fixed')

Giving us the plot below.

I had initially asked this on StackOverflow where Arun provided a more elegant approach to composing the data.frame

Written by Rajarshi Guha

February 10th, 2013 at 3:03 pm