Summarizing Collections of Curves


I was browsing live notes from the recent IEEE conference on visualization and came across a paper about functional boxplots. The idea is an extension of the boxplot visualization (shown alongside), to a set of functions. Intuitively, one can think of a functional box plot as specific envelopes for a set of functions. The construction of this plot is based on the notion of band depth (see the more general concept of data depth) which is a measure of how far a given function is from the collection of functions. As described in Sun & Genton the band depth for a given function can be computed by randomly selecting \(J\) functions and identifying wether the given function is contained within the minimum and maximum of the \(J \) functions. Repeating this multiple times, the fraction of times that the given function is fully contained within the \(J\) random functions gives the band depth, \(BD_j\). This is then used to order the functions, allowing one to compute a 50% band, analogous to the IQR in a traditional boxplot. There are more details (choice of \(J\), partial bounding, etc.) described in the papers and links above.

drc-bpMy interest in this approach was piqued since one way of summarizing a dose response screen, or comparing dose response data across multiple conditions is to generate a box plot of a single curve fit parameter – say, \(\log IC_{50} \). But if we wanted to consider the curves themselves, we have a few options. We could simply plot all of them, using translucency to avoid a blob. But this doesn’t scale visually. Another option, on the left, is to draw a series of box plots, one for each dose, and then optionally join the median of each boxplot giving a “median curve”. While these vary in their degree of utility, the idea of summarizing the distribution of a set of curves, and being able to compare these distributions is attractive. Functional box plots look like a way to do this. (A cool thing about functional boxplots is that they can be extended to multivariate functions such as surfaces and so on. See Mirzargar et al for examples)

Computing \(BD_j\) can be time consuming if the number of curves is large or \(J\) is large. Lopez-Pintado & Jornsten suggest a simple optimization to speed up this step, and for the special case of \(J = 2\), Sun et al proposed a ranking based procedure that scales to thousands of curves. The latter is implemented in the fda package for R which also generates the final functional box plots.

As an example I considered 6 cell proliferation assays run in dose response, each one running the same set of compounds, but under different growth conditions. For each assay I only considered good quality curves (giving from 349 to 602 curves). The first plot compares the actives identified in the different growth conditions using the \(\log IC_{50}\), and indicates a statistically significant increase in potency in the last three conditions compared to the first three.


In contrast, the functional box plots for the 6 assays, suggest a somewhat different picture (% Response = 100 corresponds to no cell kill and 0 corresponds to full cell kill).


The red dashed curves correspond to outliers and the blue lines correspond to the ‘maximum’ and ‘minimum’ curves (analogous to the whiskers of the traditional boxplot). Importantly, these are not measured curves, but instead correspond to the dose-wise maximum (and minimum) of the real curves. The pink region represents 50% of the curves and the black line represents the (virtual) median curve. In each case the X-axis corresponds to dose (unlabeled to save space). Personally, I think this visualization is a little cleaner than the dose-wise box plot shown above.

The mess of red lines in the plot 1 suggest an issue with the assay itself. While the other plots do show differences, it’s not clear what one can conclude from this. For example, in the plot for 4, the dip on the left hand side (i.e., low dose) could suggest that there is a degree of cytotoxicity, which is comparatively less in 3, 5 and 6. Interestingly none of the median curves are really sigmoidal, suggesting that the distribution of dose responses has substantial variance.

Thoughts on the DREAM Synergy Prediction Challenge

The DREAM consortium has run a number of predictive modeling challenges and the latest one on predicting small molecule synergies has just been published. The dataset that was provided included baseline gene expression of the cell line (OCI-LY3), expression in presence of compound (2 concentrations, 2 time points), dose response data for 14 compounds and the excess over Bliss for the 91 pairs formed from the 14 compounds. Based on this data (and available literature data) participants had to predict a ranking for the 91 combinations.

The paper reports the results of 31 approaches (plus one method that was not compared to the others) and does a good job of summarizing their performance and identifying whether certain data type or certain approaches work better than others. They also investigated the performance of an ensemble of approaches, which, as one might expect, worked better than the single methods. While the importance of gene expression in predictive performance was not as great as I would’ve thought, it was certainly more useful than chemical structure alone. Interestingly, they also noted that “compounds with more targeted mechanisms, such as rapamycin and blebbistatin, were least synergistic“. I suspect that this is somewhat dataset specific, but it will be interesting to see whether this holds in large collections of combination experiment such as those run at NCATS.

Overall, it’s an important contribution with the key take home message being

… synergy and antagonism are highly context specific and are thus not universal properties of the compounds’ chemical, structural or substrate information. As a result, predictive methods that account for the genetics and regulatory architecture of the context will become increasingly relevant to generalize results across multiple contexts

Given the relative dearth of predictive models of compound synergy, this paper is a nice compilation of methods. But there are some issues that weaken the paper.

  • One key issue are the conclusions on model performance. The organizers defined a score, termed probabilistic c-score (PC score). If I understand correctly, a random ranking should give PC = 0.5. It turns out that the best performing method exhibited a PC score = 0.61 with a number of methods hovering around 0.5. Undoubtably, this is a tough problem, but when the authors states that “… this challenge shows that current methodologies can perform significantly better than chance …” I raise an eyebrow. I can only assume that what they meant was that the results were “statistically significantly better than chance“, because in terms of effect size the results are not impressive. After reading this excellent article on p-values and significance testing I’m particularly sensitized to claims of significance.
  • The dataset could have been strengthened by the inclusion of self-crosses. This would’ve allowed the authors to assess actual excess over Bliss values corresponding to additivity (which will not be exactly 0 due to experimental noise), and avoid the use of cutoffs in determining what is synergistic or antagonistic.
  • Similarly, a key piece of data that would really strengthen these approaches is the expression data in presence of combinations. While it’s unreasonable to have this data available for all combinations, it could be used as a first step in developing models to predict the expression profile in presence of combination treatment. Certainly, such data could be used to validate some assumptions made by some of the models described (e.g., concordance of DEG’s induced by single agents implies synergistic response).
  • Kudos for including source code for the top methods, but would’ve been nicer if data files were included so we could actually reproduce the results.
  • The authors conclude that when designing new synergy experiments, one should identify mechanistically diverse molecules to make up for the “small number of potentially synergistic pathways“. While mechanistic diversity is a good idea, it’s not clear how they conclude there are a small number of pathways that play a role in synergy.
  • It’s a pity that the SynGen method was not compared to the other methods. While the authors provide a justification, it seems rather weak. The method only applied to the synergistic combinations (performance was not a whole lot better than random – true positive rate of 56%) – but the text indicates that it predicted synergistic compound pairs. It’s not clear whether this means it made a call on synergy or a predicted ranking. If the latter it would’ve been interesting to see how it compared to the rankings of the synergistic subset of 91 compounds from other methods.

Metabolite Similarity & Dirty Compounds

Edit 10/9/14 – Updated statistics for the 1024 bit fingerprints

There’s been some discussion about a paper by O’Hagan et al that have proposed a Rule of 0.5 that states that 90% of approved drugs exhibit a Tanimoto similarity > 0.5 to one or more human metabolites. Their analysis is based on metabolites listed in Recon2, a reconstruction of the human metabolic network. The idea makes sense and there’s an in depth discussion at In the Pipeline.

Given the authors’ claim that

a successful drug is likely to lie within a Tanimoto distance of 0.5 of a known human metabolite. While this does not mean, of course, that a molecule obeying the rule is likely to become a marketed drug for humans, it does mean that a molecule that fails to obey the rule is statistically most unlikely to do so

I was interested in seeing how this rule of thumb holds up when faced with compounds that are not supposed to make it through the drug development pipeline. Since PAINS appear to be the structural filter du jour, I decided to look at compounds that failed the PAINS filter. I worked with the 10,000 compounds included in Saubern et al. Simon Saubern provided me the set of 861 compounds that failed the PAINS filters, allowing me to extract the set of compounds that passed (9139)

Chris Swain was kind enough to extract the compound entries from the Matlab dump provided by O’Hagan et al. This file contained InChI representations for a subset of the entries. I extracted the 2980 valid InChI strings and converted them to SMILES using ChemAxon molconvert 6.0.5. The processed data (metabolite name, InChI and SMILES) are available here. However, after deduplication, there were 1335 unique metabolites

Now, O’Hagan et al for some reason, used the 166 bit MACCS keys, but hashed them to 1024 bits. Usually, when using a keyed fingerprint, the goal is to retain the correspondence between bit position and substructure. The hashing step results in a loss of such correspondence. So it’s a bit surprising that they didn’t use some sort of path (Daylight) or environment (ECFPn) based fingerprint. Since I didn’t know how they hashed the MACCS keys, I calculated 166 bit MACCS keys and 1024 bt ECFP6 and extended path fingerprints using the CDK (via rcdk). Then for each compound in the PAINS pass or fail set, I computed the similarity to each of the 1335 metabolites and identified the maximum similarity (termed NMTS in the paper) and then plotted the distribution of these NMTS values between the PAINS pass and fail sets.


First, the similarity cutoff proposed by the authors is obiously dependent on the fingerprint. So while the bulk of the 166 bit MACCS similarities are > 0.5, this is not really meaningful. A more relevant comparison is to 1024 bit fingerprints – both are hashed, so should be somewhat comparable to the authors choice of hashed MACCS keys.

The path fingerprints lead to an NMTS of ~ 0.25 for both PAINS pass and fail sets and the ECFP6 leads to an NMTS of ~ 0.18 for both sets. Though the difference in medians between the pass and fail sets for the path fingerprint is statistically significant (p = 1.498e-05, Wilcoxon test), the difference itself is very small: 0.005. (For the circular fingerprint there is no statistically significant difference). However, the PAINS pass set does contain more outliers with values > 0.5. In that sense the proposed rule does separate the two groups. Of the top of my head I don’t know whether the WEHI screening deck that was the source of the 10,000 compounds was designed to be drug-like. At the same time all this might be saying is there is no relationship between metabolite-likenes and PAINS-likeness.

It’d be interesting to see how this type of analysis holds up with other well known filter rules (REOS, Lilly etc). A related thing to look at would be to see how druglikeness scores compare with NMTS values.

Code and data are available in this repository

rinchi – An R package to generate InChI’s and InChI Keys

While trying to update rcdk on CRAN it was pointed out to me that usage of the library resulted in modifications to the users home directory. Specifically, this occurred when generating InChI‘s. The CDK makes use of jni-inchi, which in turn depends on JNATI which enables Java code to work with native libraries in a platform independent fashion. As part of this, it creates $HOME/.jnati – which is a no-no for CRAN packages. To resolve this, the latest version of rcdklibs excludes the InChI module and its dependencies. Hopefully rcdk and rcdklibs will now pass CRAN QC.

To access InChI functionality in R you can use the rinchi package which is hosted on Github. Since it will modify the users home directory, it cannot be hosted on CRAN. However, it’s easy enough to install

install_github("cdkr", "rajarshi", subdir="rinchi")

Importantly, if all you need is to go from SMILES to InChI, there is no need to install rcdk as well. So the following works

inchi <- get.inchi('CCC')
inchik <- get.inchi.key('CCC')

But if you do have a molecule object obtained via rcdk, you can also pass that in to get an InChI or InChI key representation.

Predicting Synergy from Lipophilicity?

I recently came across a paper by Yilancioglu et al that described a method to predict drug synergies using only lipophilicity. In effect, it claimed to predict synergy based purely on a physicochemical property and independent of target or pathway information. Their results suggest that

combinations of two lipophilic drugs had a greater tendency to show drug synergy

I must admit that I’m skeptical of this claim. While lipophilicity certainly plays a role in a drugs effect (and thereby a drug combinations’ effect), I’m not sure that lipophilicty is a primary driver of a synergistic interaction. Rather, lipophilicity might be a prerequisite; that is, if two molecules cannot enter the cell to access their target(s), they’re unlikely to exhibit synergy!

The paper considered a set of 175 (anti-fungal) drug pairs tested in yeastand evaluated molecular weight, logP, H-bond donor and acceptor counts and also computed a synergicity, that is a measure of how frequently a drug exhibits synergy with other drugs. So the work isn’t really directly capturing synergy (which was measured using the Loewe model). They they compute Spearman correlation between the synergicity and the various physicochemical properties – identifying logP as the one with a statistically significant, though moderate correlation (though one of their examples presents a significant correlation of 0.2 – not a whole lot you could do with that!). They then go on to build a decision tree model that predicts synergicity surprisingly well, though given that the model is based on a synergy netowrk (nodes are drugs, edges are weighted by the synergy between a pair of drugs), it’s not clear how they evaluated the lipophilicity of a drug pair. The terminology was a bit confusing – sometimes using synergicity and sometimes synergy. It’s definitely a surprising result – but is it really meaningful? As I note above, I find it difficult to accept lipophilicty as a proximal driver of synergy. The fact that one of their analyses employs binned logP could raise an issue (see a presentation or this paper on the dangers of binned data).

241Given that NCATS has developed a high throughput compound combination screening platform, I was interested in seeing if any of this held up on some of our public datasets. I considered a dataset of 466 drugs tested in combination (6×6 matrix) with Ibrutinib. Thus, in contrast to the Yilancioglu et al paper, one member of the combinations is constant. As a result, it makes sense to correlate the logP of the other (i.e., non Ibrutinib) component of the combination to the synergy value of that combination. I evaluated logP of the compounds using ChemAxons cxcalc tool and compared the values to the various synergy metrics we calculate (see here for definitions). 241-binned

The figure above pretty much shows no correlation for any of the synergy metrics (and Spearmans ρ was ≅ 0 with p > 0.05). I also repeated the calculation of a set of 1912 combinations (i.e., 1912 compounds combined with Ibrutinib) and got essentially the same result. Granted that this was on a single lymphoma cell line (TMD8) which is significantly different from the environment considered by the authors and that our synergy metrics are different from those described in the paper. So, it might just be a feature of anti-fungal drugs?

But interestingly, when we considered binned logP values and look at the median value of a synergy metric in each logP bin, we do see a trend – at least for two out of four metrics. But given the scatter plots, where the variability is not hidden, is this really meaningful?

So overall, the paper presents some surprising observations but is a little unsatisfying from an explanatory point of view. And the conclusions don’t seem to translate to other datasets.