## Visual pairwise comparison of distributions

While analysing some data from a dose response screen, run across multiple cell lines, I needed to visualize summarized curve data in a pairwise fashion. Specifically, I wanted to compare area under the curve (AUC) values for the curve fits of the same compound between every pair of cell lines. Given that an AUC needs a proper curve fit, the number of non-NA AUCs is different for each cell line. As a result, making a scatter plot matrix (via plotmatrix) won't do.

A more useful approach is to generate a matrix of density plots, such that each plot contains the distributions of AUCs from one pair of cell lines, overlaid on each other. It turns out that some **data.frame** wrangling and facet_grid make this extremely easy.

Let's start with some random data for 5 imaginary cell lines:

```r
library(ggplot2)
library(reshape)

# One column per imaginary cell line: 100 random values, 20 of them set to NA
tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean=sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  return(r)
})))
```
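As a quick check of why a scatter plot would lose data here, the sketch below counts the usable values per column and per pair of columns. It repeats the simulation in base R so it runs on its own. The seed and the names `n_obs`, `present` and `n_pair` are my own additions for illustration. In this simulation every column has the same number of missing values, but they fall in different rows, so a pair of columns generally shares fewer complete rows than either column has on its own.

```r
set.seed(1)  # arbitrary seed (an assumption), only to make the sketch reproducible

# Same simulation as above: 5 columns of 100 values, 20 set to NA in each
tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean = sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  r
})))

# Non-missing values per column: a density plot can use all of these
n_obs <- colSums(!is.na(tmp1))

# 0/1 indicator matrix of non-missing entries
present <- (!is.na(as.matrix(tmp1))) * 1

# Entry [i, j] counts rows where columns i and j are both non-missing,
# i.e. the only points a scatter plot of that pair could show
n_pair <- crossprod(present)

print(n_obs)
print(n_pair)
```

The diagonal of `n_pair` matches `n_obs`, while the off-diagonal entries can only be equal or smaller, which is the data a scatter plot matrix would silently drop.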

Next, we need to expand this into a long form that lets us facet by pairs of variables:

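```r
# For every (i, j) pair of columns, stack column i (as 'D1') on column j (as 'D2')
# and label the rows with the pair's column names for faceting
tmp2 <- do.call(rbind, lapply(1:5, function(i) {
  do.call(rbind, lapply(1:5, function(j) {
    r <- rbind(data.frame(var='D1', val=tmp1[,i]),
               data.frame(var='D2', val=tmp1[,j]))
    r <- data.frame(xx=names(tmp1)[i], yy=names(tmp1)[j], r)
    return(r)
  }))
}))
```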

Finally, we can make the plot:

```r
ggplot(tmp2, aes(x=val, fill=var))+
  geom_density(alpha=0.2, position="identity")+
  theme(legend.position = "none")+
  facet_grid(xx ~ yy, scales='fixed')
```

This gives us a 5 × 5 grid of panels, each overlaying the two density estimates for one pair of imaginary cell lines.

I had initially asked about this on StackOverflow, where Arun provided a more elegant approach to composing the data.frame.
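For reference, here is one more compact way to compose the long data.frame. It is a minimal sketch of my own using expand.grid and is not Arun's StackOverflow answer, which isn't reproduced here. The seed and the name `pairs` are assumptions for illustration. It enumerates all column pairs up front and builds one block of rows per pair, so the nested lapply calls collapse into a single loop.

```r
set.seed(1)  # arbitrary seed (an assumption), only to make the sketch reproducible

# Same simulation as above
tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean = sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  r
})))

# All ordered pairs of column names, one row per facet panel
pairs <- expand.grid(xx = names(tmp1), yy = names(tmp1),
                     stringsAsFactors = FALSE)

# One block per pair: column xx labelled 'D1' stacked on column yy labelled 'D2'
tmp2 <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(k) {
  data.frame(xx  = pairs$xx[k],
             yy  = pairs$yy[k],
             var = rep(c("D1", "D2"), each = nrow(tmp1)),
             val = c(tmp1[[pairs$xx[k]]], tmp1[[pairs$yy[k]]]))
}))

str(tmp2)
```

The resulting `tmp2` has the same columns (xx, yy, var, val) as before, so the ggplot call above should work on it unchanged. The rows come out in a different order, which does not affect the facets.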