# So much to do, so little time

Trying to squeeze sense out of chemical data

## Visual pairwise comparison of distributions

While analysing some data from a dose response screen run across multiple cell lines, I needed to visualize summarized curve data in a pairwise fashion. Specifically, I wanted to compare area under the curve (AUC) values from the curve fits for the same compound between every pair of cell lines. Since an AUC requires a proper curve fit, the number of non-NA AUCs differs from one cell line to another, and a compound with an AUC in one cell line may have none in another. As a result, making a scatter plot matrix (via plotmatrix) won't do.
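To see how much a scatter plot matrix would throw away, one can count, for each pair of cell lines, how many compounds have an AUC in both. The sketch below assumes the AUCs sit in a numeric matrix with one column per cell line and one row per compound; the helper `pairwise_complete` and the toy values are hypothetical, used only for illustration.

```r
# Hypothetical helper (not from the original post): for a matrix with one
# column per cell line and one row per compound, count how many compounds
# have a non-NA AUC in both members of each pair of columns.
pairwise_complete <- function(m) {
  ok <- ifelse(is.na(m), 0, 1)  # 1 where an AUC exists, 0 where it is NA
  crossprod(ok)                 # entry [a, b] = sum over rows of ok[, a] * ok[, b]
}

# Toy AUC matrix with made-up values: 4 compounds x 3 cell lines
m <- cbind(A = c(1.2, NA, 3.1, 4.0),
           B = c(NA, 2.5, 3.3, 4.4),
           C = c(1.0, 2.2, NA, NA))
counts <- pairwise_complete(m)
print(counts)
```

The diagonal gives each cell line's own number of fitted curves, while an off-diagonal entry is the number of points a scatter plot of that pair could show. For A versus C that is a single compound, even though A has three AUCs and C has two.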

A more useful approach is to generate a matrix of density plots, in which each panel overlays the distributions of AUCs from one pair of cell lines. It turns out that some data.frame wrangling and facet_grid make this extremely easy. First, some simulated data: five columns of 100 values each, with 20 randomly placed NAs per column.

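```r
library(ggplot2)

# Simulated AUCs: 5 "cell lines" (columns X1..X5) x 100 compounds,
# each column with its own mean and 20 randomly placed NAs (failed fits)
tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean = sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  return(r)
})))
```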

Next, we need to expand this into a long form that lets us facet by pairs of variables:

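```r
# For every ordered pair of columns (i, j), stack column i (labelled D1)
# on top of column j (labelled D2) and tag the rows with the pair's names,
# so that facet_grid() can lay out one panel per pair
tmp2 <- do.call(rbind, lapply(1:5, function(i) {
  do.call(rbind, lapply(1:5, function(j) {
    r <- rbind(data.frame(var = 'D1', val = tmp1[, i]),
               data.frame(var = 'D2', val = tmp1[, j]))
    r <- data.frame(xx = names(tmp1)[i], yy = names(tmp1)[j], r)
    return(r)
  }))
}))
```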

Finally, we can make the plot:

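```r
# Overlay the two densities in each panel; panel rows are xx, columns are yy.
# na.rm = TRUE drops the missing AUCs quietly instead of warning about them.
ggplot(tmp2, aes(x = val, fill = var)) +
  geom_density(alpha = 0.2, position = "identity", na.rm = TRUE) +
  theme(legend.position = "none") +
  facet_grid(xx ~ yy, scales = 'fixed')
```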

This gives us the plot below.

I had initially asked this on StackOverflow, where Arun provided a more elegant approach to composing the data.frame.
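For comparison, here is one more compact way to build the same long-format data.frame in base R, pairing columns with `expand.grid()` and `Map()`. It is only a sketch and not necessarily the approach Arun posted. It assumes `tmp1` is built as in the first code block (regenerated here with a fixed seed so the snippet runs on its own), and the names `pairs` and `tmp2b` are hypothetical.

```r
set.seed(1)  # fixed seed so the simulated data are reproducible

# Same simulated data as tmp1 above: 5 columns of 100 values, 20 NAs each
tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean = sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  r
})))

# All ordered pairs of column indices; j varies fastest, so the rows come
# out in the same (i, j) order as the nested lapply() loops
pairs <- expand.grid(j = seq_along(tmp1), i = seq_along(tmp1))

# One block per pair: column i labelled D1 stacked on column j labelled D2
tmp2b <- do.call(rbind, Map(function(i, j) {
  data.frame(xx = names(tmp1)[i], yy = names(tmp1)[j],
             var = rep(c("D1", "D2"), each = nrow(tmp1)),
             val = c(tmp1[[i]], tmp1[[j]]))
}, pairs$i, pairs$j))
```

Either version gives 25 panels of 200 rows each (100 D1 values and 100 D2 values, NAs included), 5,000 rows in total. The diagonal panels compare a cell line with itself, so their two densities coincide exactly.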

Written by Rajarshi Guha

February 10th, 2013 at 3:03 pm