My colleagues and I recently published a paper where we explored a few methods to identify differential behavior in dose response screens. While there is an extensive literature about analyzing differential effects in genomic data (e.g. mciroarrays, RNAseq), these methods are based on distributional assumptions that holds for genomic data. This is not necessarily the case for small molecule, dose response data. A separate post will explore this aspect.
So we couldn’t directly apply the methods devised for genomic data. Another issue that we wanted to address was the lack of replicates. As a result certain methods are excluded from consideration (e.g., t-test based methods). The simplest case (or what we refer to as obviously differential) is when a compound is active in one treatment and completely inactive in the other. This is trivial to characterize. The next method we considered was to look at fold changes for individual curve fit parameters and then choose an arbitrary threshold. This is not a particularly robust approach, and has no real statistical basis. However, such thresholding is still used in a number of scenarios (e.g., cherry picking in single point screens). In addition, in this approach you have to choose one of many parameters. So finally, we considered a data fusion approach, that ranked compounds using the rank product method. This method employed potency, response at the highest concentration and the AUC. The nice thing about this method is that it doesn’t require choosing a threshold, provides an empirical p-value and is flexible enough to include other relevant parameters (say, physicochemical properties).
Finally, we examined how single point data (modeled using the response at the highest concentration) compared to dose response data at identifying differential actives. As one might expect, the obviously differential compounds were easily identified. However for compounds active in both treatments, the single point approach led to more false positives. Thus, even those dose response is more resource-intensive, the improved accuracy makes it worth it.
In the next post I’ll look at some of the issues that didn’t make in to this paper – in particular hypothesis based tests that focus on testing differences between model fits. One key observation (also suggested by Gelman) is that strict p-value cutoffs lead one to focus on obvious or well-known effects. For small-scale exploratory analyses such as described in this paper, a more relaxed threshold of 0.1 might be more suitable, allowing marginal effects that may, however, be biologically interesting to be considered.