I came across a paper from Keith Shockley that describes the use of weighted entropy to rank order dose response curves. As this data type is the bread and butter of my day job, a simple ranking method is always of interest to me. However, closer inspection of the paper reveals some fundamental problems.
The paper correctly notes that there is no definitive protocol to rank compounds using their dose response curves. Such rankings are invariably problem dependent – in some cases, simple potency based ranking of good quality curves is sufficient. In other cases structural clustering combined with a measure of potency enrichment is more suitable. In addition, it is also true that all compounds in a screen do not necessarily fit well to a 4-parameter Hill model. This may simply be due to noise but could also be due to some process that is better fit by some other model (bell or U shaped curves). The point being that rankings based on a pre-defined model may not be useful or accurate.
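For reference, the 4-parameter Hill model mentioned above takes only a few lines to write down. This is a generic sketch (parameter names are mine, not the paper's):

```python
import numpy as np

def hill4(conc, bottom, top, ac50, slope):
    """4-parameter Hill model: sigmoidal response vs. concentration.

    bottom/top are the lower/upper asymptotes, ac50 is the concentration
    giving a half-maximal response, and slope is the Hill coefficient.
    """
    conc = np.asarray(conc, dtype=float)
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)

doses = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # e.g. concentrations in uM
resp = hill4(doses, bottom=0.0, top=100.0, ac50=1.0, slope=1.0)
```

By construction the response at the AC50 sits exactly halfway between the asymptotes; bell or U shaped curves simply cannot be expressed in this form, which is the point being made above.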
The paper proposes the use of entropy as a way to rank dose response curves in a model-free manner. While a natural approach is to use Shannon entropy, the author suggests that the equal weighting implicit in the calculation is unsuitable. Instead, the use of weighted entropy (WES) is proposed as a more robust approach that takes into account unreliable data points. The author defines the weights based on the level of detection of the assay (though I'd argue that since the intended goal is to capture the reliability of individual response points, a more appropriate weight would be derived from some form of variance, either from replicate data or else pooled across the collection). The author then suggests that curves should be ranked by their WES value, with higher values indicating a better rank. However, I believe that entropy is not suitable as a ranking procedure and in fact, as my experiments below show, it doesn't appear to work.
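To make the idea concrete, here is one plausible reading of an entropy-based score. The exact WES weighting in the paper is derived from the assay's level of detection; the `weights` argument below is deliberately generic, and with equal weights the score reduces to plain Shannon entropy:

```python
import numpy as np

def weighted_entropy(responses, weights):
    """Weighted Shannon entropy of a response vector.

    The absolute responses are normalized to a probability distribution
    and each term of the Shannon sum is scaled by its weight. How the
    weights are chosen (level of detection, variance, ...) is exactly
    the design decision discussed in the text.
    """
    r = np.abs(np.asarray(responses, dtype=float))
    p = r / r.sum()
    w = np.asarray(weights, dtype=float)
    # treat 0 * log(0) as 0
    terms = np.where(p > 0, w * p * np.log(p), 0.0)
    return float(-terms.sum())
```

With equal weights, a flat (inactive) response vector gives the maximum entropy log(n), which already hints at the ranking problem described below.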
For any proposed ranking scheme, one must first define the goal. When ranking dose response curves, are we looking for compounds
- that exhibit a well defined dose response (top and bottom asymptotes, > 80% efficacy, etc.)?
- that show good potency, even if the curve is not that well fit?
- that belong to a specific chemotype?
One of the key omissions of the paper is that it does not explain what the end goal of the ranking is. Entropy of curve data is not equivalent to potency, goodness of fit or other curve characteristics. I suppose one could say that entropy ranking lets one differentiate noise from actual curves (whatever functional form is required to fit them). However, this is not necessarily the case, as shown in the overlaid density plots alongside. The pink region represents WES values calculated from a set of 466 curves, whereas the green region represents WES values from normally distributed random data (μ = 50, σ = 10). In this case, the WES values from real data (i.e., measured curves) are completely overlapped by those derived from random data. Thus the WES would not differentiate between the two sets of data.
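The random-data side of this comparison can be reproduced in spirit with a few lines. This sketch uses equal weights (i.e., plain Shannon entropy over the normalized responses) rather than the paper's exact weighting; the point is only that pure-noise "curves" produce perfectly ordinary-looking entropy values:

```python
import numpy as np

rng = np.random.default_rng(0)

def shannon_entropy(responses):
    """Entropy of a response vector after normalizing |responses| to sum to 1."""
    r = np.abs(np.asarray(responses, dtype=float))
    p = r / r.sum()
    return float(-np.sum(np.where(p > 0, p * np.log(p), 0.0)))

# 8-point response vectors that are pure noise, drawn from N(50, 10)
# to match the random data described above
noise_H = [shannon_entropy(rng.normal(50, 10, size=8)) for _ in range(1000)]
```

Because the noise values all hover around 50, the normalized distribution is nearly uniform and the entropies cluster just below the maximum of log(8), right in the range occupied by real curves.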
Even ignoring random data, the use of entropy does not reliably differentiate well defined curves, inactive curves and toxic curves. For example, in the figure alongside, the inactive compound exhibits a higher WES than the well defined active curve. The paper does explicitly note that the method was tested on activating curves only, but that should not preclude applying it to inhibitory curves, as in this example.
But more fundamentally, if one assumes that the goal of a ranking scheme for dose response curves is to place good quality actives at the top, then the proposed WES (or even Shannon entropy, H) does not do the job. One way to test this is to take a collection of curves, rank them by a measure and identify how many actives appear in the top N% of the collection, for varying N. Ideally, a good ranking would identify nearly all the actives for small N. If the ranking were random, one would identify N% of the actives in the top N% of the collection. Here an active is defined in terms of curve class, a heuristic we use to initially weed out poor quality curves and focus on good quality ones. I defined actives as curve classes -1.1, -1.2, -2.2 and -2.1. On the four different data sets I looked at, WES and H do significantly worse than random, as shown in the four enrichment curves below (the dashed diagonal corresponds to random ranking).
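The enrichment test described above is straightforward to implement. A minimal sketch (function and variable names are mine), assuming higher scores mean better ranks:

```python
import numpy as np

def enrichment_at(scores, is_active, fraction):
    """Fraction of all actives recovered in the top `fraction` of the
    collection when ranked by `scores` in descending order.

    A random ranking recovers `fraction` of the actives on average
    (the dashed diagonal in the figures); a good ranking recovers
    far more at small fractions.
    """
    scores = np.asarray(scores, dtype=float)
    active = np.asarray(is_active, dtype=bool)
    order = np.argsort(-scores)  # best scores first
    k = max(1, int(round(fraction * len(scores))))
    return float(active[order][:k].sum() / active.sum())
```

Sweeping `fraction` from 0 to 1 and plotting the result against it gives the enrichment curves shown in the figures; a perfect ranking reaches 1.0 as soon as the fraction matches the active rate.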
Instead the ranking scheme that seems to perform consistently better is the AUC (area under the dose response curve). I certainly don’t claim that AUC is a completely robust way to rank dose response curves (in fact for some cases such as invalid curve fits, it is nonsensical). But one would hope that WES does better than random! I also include LAC50, the logarithm of the AC50, as a ranking method simply because the paper considers it a poor way to rank curves (which I agree with, particularly if one does not first filter for good quality, efficacious curves).
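For completeness, the AUC score used here is just the trapezoidal area under the responses taken over log dose. This is a minimal sketch, not the exact implementation behind the figures:

```python
import numpy as np

def auc_score(log_doses, responses):
    """Trapezoidal area under the dose response curve over log dose.

    Model-free: no curve fit is required, so the score is defined even
    when a Hill fit fails -- though, as noted in the text, it can be
    nonsensical for invalid data.
    """
    x = np.asarray(log_doses, dtype=float)
    y = np.asarray(responses, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))
```

An inactive (flat, near-zero) curve gets an AUC near zero while an efficacious active accumulates area, which is plausibly why it ranks actives better than an entropy score that cannot tell the two apart.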
Theoretically, I see no reason that entropy should correlate with curve quality (as identified by curve class), so I wouldn't be surprised by a low quality ranking. However, as defined by the paper, the WES is significantly and consistently poorer than random, which is quite surprising.
There are other issues – Table 3 does not seem to be correct. Surely β-testosterone is not an AR agonist with an AC50 of 9.57 × 10⁻²² μM. In addition, I'm not convinced that a single dataset represents sufficient validation (given that Tox21 has about 80 published bioassays in PubChem). But in my opinion, this is more a sign of sloppy reviewing & editing than anything else.
UPDATE (2/25) – Regenerated the enrichment curves so that data was ranked in the correct order when LAC50 was being used.