# So much to do, so little time

Trying to squeeze sense out of chemical data

## vSDC, Rank Products and DUD-E

This post is a follow-up to my previous discussion on a paper by Chaput et al. The gist of that paper was that in a virtual screening scenario where a small number of hits are to be selected for followup, one could use an ensemble of docking methods, identify compounds whose scores were beyond 2SD of the mean for each method and take the intersection. My post suggested that a non-parametric approach (rank products, RP) performed similarly to the parametric approach of Chaput et al on the two targets they screened.

The authors also performed a benchmark comparison of their consensus method (vSDC) versus the individual docking methods for 102 DUD-E targets. I was able to obtain the individual docking scores (Glide, Surflex, FlexX and GOLD) for each of the targets, with the aim of applying the rank product method described previously.

In short, I reproduced Figure 6A (excluding the curve for vSDC). In
this figure, $$n_{test}$$ is the number of compounds selected (from the ranked list, either by individual docking scores or by the rank product) and $$T_{h>0}$$ is the percentage of targets for which the $$n_{test}$$ selected compounds included one or more actives. Code is available here, but you’ll need to get in touch with the authors for the DUD-E docking scores.

As shown alongside, the RP method (as expected) outperforms the individual docking methods. And visual comparison with the original figure suggests that it also outperforms vSDC, especially at lower values of $$n_{test}$$. While I wouldn’t regard the better performance of RP compared to vSDC as a huge jump, the absence of a threshold certainly works in its favor.

One could certainly explore ranking approaches in more depth. As suggested by Abhik Seal, Borda or Condorcet methods could be examined (though the small number of docking methods, a.k.a., voter, could be problematic).

UPDATE: After a clarification from Liliane Mouawad it turns out there was a mistake in the ranking of the Surflex docking scores. Correcting that bug fixes my reproduction of Figure 6A so that the curves for individual docking methods match the original. But more interestingly, the performance of RP is now clearly better than every individual method and the vSDC method as well, at all values of $$n_{test}$$

Written by Rajarshi Guha

February 13th, 2016 at 7:25 pm

## What Has Cheminformatics Done for You Lately?

Recently there have been two papers asking whether cheminformatics or virtual screening in general, have really helped drug discovery, in terms of lead discovery.

The first paper from Muchmore et al focuses on the utility of various cheminformatics tools in drug discovery.  Their report is retrospective in nature where they note that while much research has been done in developing descriptors and predictors of various molecular properties (solubility, bioavilability etc), it does not seem that this has contributed to increased productivity. They suggest three possible reasons for this

• not enough time to judge the contributions of cheminformatics methods
• methods not being used properly
• methods themselves not being sufficiently accurate.

They then go on consider how these reasons may apply to various cheminformatics methods and tools that are accessible to medicinal chemists. Examples range from molecular weight and ligand efficiency to solubility, similarity and bioisosteres. They use a 3-class scheme – known knowns, unknown knowns and unknown unknowns corresponding to methods whose underlying principles are whose results can be robustly interpreted, methods for properties that we don’t know how to realistically evaluate (but which we may still do so – such as solubility) and methods for which we can get a numerical answer but whose meaning or validity is doubtful. Thus for example, ligand binding energy calculations are placed in the “unknown unknown” category and similarity searches are placed in the “known unknown” category.

It’s definitely an interesting read, summarizing the utility of various cheminformatics techniques. It raises a number of interesting questions and issues. For example, a recurring issue is that many cheminformatics methods are ultimately subjective, even though the underlying implementation may be quantitative – “what is a good Tanimoto cutoff?” in similarity calculations would be a classic example.  The downside of the article is that it does appear at times to be specific to practices at Abbott.

The second paper is by Schneider and is more prospective and general in nature and discusses some reasons as to why virtual screening has not played a more direct role in drug discovery projects. One of the key points that Schneider makes is that

appropriate “description of objects to suit the problem” might be the key to future success

In other words, it may be that molecular descriptors, while useful surrogates of physical reality, are probably not sufficient to get us to the next level. Schneider even states that “… the development of advanced virtual screening methods … is currently stagnated“. This statement is true in many ways, especially if one considers the statistical modeling side of virtual screening (i.e., QSAR). Many recent papers discuss slight modifications to well known algorithms that invariably lead to an incremental improvement in accuracy. Schneider suggests that improvements in our understanding of the physics of the drug discovery problem – protein folding, allosteric effects, dynamics of complex formation, etc – rather than continuing to focus on static properties (logP etc) will lead to advances. Another very valid point is that future developments will need to move away from the prediction or modeling of “… one to one interactions between a ligand and a single target …”  and instead will need to consider “… many to many relationships …“. In other words, advances in virtual screen will address (or need to address) the ligand non-specificity or promiscuity. Thus activity profiles, network models and polyparmacology will all be vital aspects of successful virtual screening.

I really like Schneiders views on the future of virtual screening, even though they are rather general. I agree with his views on the stagnation of machine learning (QSAR) methods but at the same time I’m reminded of a paper by Halevy et al, which highlights the fact that

simple models and a lot of data trump more elaborate models based on less data

Now, they are talking about natural language processing using trillion-word corpora. Not exactly the situation we face in drug discovery! But, it does look like we’re slowly going in the direction of generating biological datasets of large size and of multiple types. A recent NIH RFP proposes this type of development. Coupled with well established machine learning methods, this could be lead to some very interesting developments. (Of course even ‘simple’ properties such as solubility could benefit from a ‘large data’ scenario as noted by Muchmore et al).

Overall, two interesting papers looking at the state of the field from different views.

Written by Rajarshi Guha

April 5th, 2010 at 4:33 am

## Datasets for Virtual Screening Benchmarks

with one comment

Virtual screening (VS) is a common task in the drug discovery process and is a computational method to identify  promising compounds from a collection of hundreds to millions of possible compounds. What “promising” exactly means, depends on the context – it might be compounds that will likely exhibit certain pharmacological effects. Or compounds that are expected to non-toxic. Or combinations of these and other properties. Many methods are available for virtual screening including similarity, docking and predictive models.

So, given the plethora of methods which one do we use? There are many factors affecting choice of VS method including availability, price, computational cost and so on. But in the end, deciding which one is better than another depends on the use of benchmarks. There are two features of VS benchmarks: the metric employed to decide whether one method is better than another and the data used for benchmarking. This post focuses on the latter aspect.

Written by Rajarshi Guha

October 9th, 2008 at 1:49 pm