vSDC, Rank Products and DUD-E

This post is a follow-up to my previous discussion on a paper by Chaput et al. The gist of that paper was that in a virtual screening scenario where a small number of hits are to be selected for followup, one could use an ensemble of docking methods, identify compounds whose scores were beyond 2SD of the mean for each method and take the intersection. My post suggested that a non-parametric approach (rank products, RP) performed similarly to the parametric approach of Chaput et al on the two targets they screened.
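The rank product idea can be sketched in a few lines. Below is a minimal, hypothetical example (the function name and random scores are mine, standing in for real docking outputs; lower scores are assumed better):

```python
import numpy as np

def rank_product(scores):
    """Consensus ranking via rank products.

    scores: (n_compounds x n_methods) array of docking scores,
    where lower is assumed better (flip the sign otherwise).
    Returns the geometric mean of per-method ranks; lower = better.
    """
    # Rank compounds within each method (rank 1 = best score)
    ranks = np.argsort(np.argsort(scores, axis=0), axis=0) + 1
    return ranks.prod(axis=1) ** (1.0 / scores.shape[1])

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 4))   # 100 compounds, 4 docking methods
rp = rank_product(scores)
top10 = np.argsort(rp)[:10]          # indices of the 10 best consensus compounds
```

Because RP works on ranks rather than raw scores, no per-method threshold (like the 2SD cutoff) is needed.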

The authors also performed a benchmark comparison of their consensus method (vSDC) versus the individual docking methods for 102 DUD-E targets. I was able to obtain the individual docking scores (Glide, Surflex, FlexX and GOLD) for each of the targets, with the aim of applying the rank product method described previously.

In short, I reproduced Figure 6A (excluding the curve for vSDC). In this figure, \(n_{test}\) is the number of compounds selected (from the ranked list, either by individual docking scores or by the rank product) and \(T_{h>0}\) is the percentage of targets for which the \(n_{test}\) selected compounds included one or more actives. Code is available here, but you’ll need to get in touch with the authors for the DUD-E docking scores.
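As an illustration of how \(T_{h>0}\) is computed, here is a hypothetical sketch (the helper name and toy data are mine, not the authors’):

```python
def t_h_gt0(ranked_actives_per_target, n_test):
    """Percentage of targets whose top-n_test compounds include an active.

    ranked_actives_per_target: for each target, a boolean sequence where
    entry i is True if the compound at rank i is an active.
    """
    hits = sum(any(flags[:n_test]) for flags in ranked_actives_per_target)
    return 100.0 * hits / len(ranked_actives_per_target)

# Toy example with three targets
targets = [
    [False, True, False],   # active found at rank 2
    [False, False, False],  # no active in the top 3
    [True, False, False],   # active at rank 1
]
print(t_h_gt0(targets, 1))  # one of three targets hits with n_test = 1
```

Sweeping \(n_{test}\) over a range and plotting \(T_{h>0}\) for each ranking method reproduces curves of the kind shown in Figure 6A.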

As shown alongside, the RP method (as expected) outperforms the individual docking methods. And visual comparison with the original figure suggests that it also outperforms vSDC, especially at lower values of \(n_{test}\). While I wouldn’t regard the better performance of RP compared to vSDC as a huge jump, the absence of a threshold certainly works in its favor.

One could certainly explore ranking approaches in more depth. As suggested by Abhik Seal, Borda or Condorcet methods could be examined (though the small number of docking methods, i.e., voters, could be problematic).

UPDATE: After a clarification from Liliane Mouawad, it turns out there was a mistake in the ranking of the Surflex docking scores. Correcting that bug fixes my reproduction of Figure 6A, so that the curves for the individual docking methods match the original. More interestingly, the performance of RP is now clearly better than every individual method, and the vSDC method as well, at all values of \(n_{test}\).

4 thoughts on “vSDC, Rank Products and DUD-E”

  1. Hmmm, data fusion for virtual screening was discussed in detail quite some time ago; am I missing something?
    http://pubs.acs.org/doi/abs/10.1021/ci049615w
    http://pubs.acs.org/doi/abs/10.1021/ci300463g
    Willett has published a lot around this:
    http://pubs.acs.org/action/doSearch?text1=Willett+Peter&field1=Contrib

  2. No, this is not anything new. The original paper used a parametric approach (2SD cutoff) and I wanted to see whether the simpler ensemble rank method reproduces their results or does better. It was easy to do :)

  3. Lu says:

    Alright, data fusion methods look fabulous. Unfortunately, when data fusion meets docking-based virtual screening, it fails, based on my personal experience.

    Take kinases as an example. Back in 2013, I spent about a month tweaking the best-fit interaction constraints, parameters, protein conformers, and docking programs to reproduce the crystal structures of kinase-inhibitor complexes. Finally, I realized that without expert knowledge and manual intervention, docking behaves like gambling, not to mention using docking for drug prioritization. I can hardly find methodology papers that explicitly claim that special tweaks were applied to (at least) make sure the predicted protein-ligand binding mode looks reasonable. It’s sad: we have spent decades developing methods to find ‘smart guys’ based on ‘weight’.

    Another dark side of virtual screening is that we are distracted by ‘enrichment’. To me, 100% enrichment of PAINS moieties showing nonspecific signals is worse than hitting ONE drug-like scaffold that shows tractable SAR in the follow-up. Therefore, more attention should be focused on 1. eliminating biases inside the scoring function (e.g., GOLD unreasonably enriches sulfonamide moieties) and 2. introducing ligand efficiency into scoring (to obtain a SARable core scaffold as a good starting point).

    –scientist: “we found we can enrich billionaires using body mass index and blood dopamine levels!”
    –stupid guy: “good job! I can only enrich billionaires by spending $10 on Forbes magazine”.

  4. Lu – good points. Indeed, docking is still a crude filter. http://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.5b02008 essentially says that docking is good for separating (likely) non-binders from (likely) binders. Given the variation in scoring function performance, ensemble (a.k.a. data fusion) methods are attractive. But you’re right – expert, manual inspection really is the key.
