## Search Result for rest — 126 articles

## Surveying the Opinion of Chemists

As part of a project I was wondering about reports of surveys that collected chemists assessments of differnt things. More specifically, I wasn’t looking for crowd-sourcing efforts for data curation (such as the in the Spectral Game) or data collection. Rather, I was interested in reports where somebody asked a group of chemists what they thought of some particular molecular “feature”. Here, “feature” is pretty broadly defined and could range from the quality of a probe molecule to whether a molecule is complex or not.

Surveying the literature (and with pointers from @dgelemi, @baoilleach, Jun Li, @georgeisyourman and @DrBostrom) here’s the following papers:

- Brown et al, J. Med. Chem., 2015, “
*Understanding Our Love Affair with p-Chlorophenyl: Present Day Implications from Historical Biases of Reagent Selection*“ - Li et al, Org. Biomol. Chem, 2015, “
*Current complexity: a tool for assessing the complexity of organic molecules*“ - Franco et al, J. Cheminf., 2014, “
*The use of 2D fingerprint methods to support the assessment of structural similarity in orphan drug legislation*“ - Sheridan et al, JCIM, 2014, “
*Modeling a Crowdsourced Definition of Molecular Complexity*“ - Litterman et al, JCIM, 2014, “
*Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes*“ - Kutchukian et al, PLoS ONE, “Inside the Mind of a Medicinal Chemist: The Role of Human Bias in Compound Prioritization during Drug Discovery”
- Hack et al, JCIM, 2011, “
*Library Enhancement through the Wisdom of Crowds*“ - Oprea et al, Nat. Chem. Biol., 2009, “
*A crowdsourcing evaluation of the NIH chemical probes*“ - Ertl et al, J. Cheminf., 2009, “
*Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions*“ - Boda et al, JCAMD, 2007, “Structure and reaction based evaluation of synthetic accessibility”
- Lajiness et al, J. Med. Chem, 2004, “
*Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds*“ - Takoaka et al, JCICS, 2003, “Development of a Method for Evaluating Drug-Likeness and Ease of Synthesis Using a Data Set in Which Compounds Are Assigned Scores Based on Chemists’ Intuition”

The number of people surveyed across these studies ranges from less than 10 to more than 300. Recently there appears to be a trend towards developing predictive models based on the results of such surveys. Also, molecular complexity seems pretty popular. Modeling opinion is always a tricky thing, though in my mind some aspects (e.g., complexity, diversity) lend themselves to more robust models than others (e.g., quality of a probe).

If there are other examples of such surveys in chemistry, I’d appreciate any pointers

## Post-doc (Molecular Informatics) Opening at NCATS

I have a post-doc opening in the Informatics group at NCATS, to work on computational aspects of high throughput combination screening – topics will include predicting drug combination response, visualizing large combination screens (> 5000 combinations) and so on. The NCATS combination screening platform thas tested more than 65,000 compound combinations (in checkerboard style which means more than 4.5M individual dose combinations) along with single agent dose responses. You can view publicly released data at https://tripod.nih.gov/matrix-client.

The NCATS Informatics group is a collection of very smart people, with wide ranging interests in molecular informatics. We work closely with colleagues in biology and chemistry. As a result, we eat a lot of our own dog food. In addition, we’re committed to implementing our ideas in publicly available software tools as well as publishing in journals.

Lots of data, great people and tough problems. If this piques your interest visit the job posting for more details

## Ridit Analysis

While preparing material for an R workshop I ran at NCATS I was pointed to the topic of ridit analysis. Since I hadn’t heard of this technique before I decided to look into it and investigate how R could be used for such an analysis (and yes, there is a package for it).

## Why ridit analysis?

First, lets consider why one might consider ridit analysis. In many scenarios one might have data that is categorical, but the categories are ordered. This type of data is termed ordinal (sometimes also called “nominal with order”). An example might be a trial of an analgesics ability to reduce pain, whose outcome could be *no pain*, *some pain*, *extreme pain*. While there are three categories, it’s clear that there is an ordering to them. Analysis of such data usually makes use of methods devised for categorical data – but such methods will not make use of the information contained within the ordering of the categories. Alternatively, one might numerically code the groups using *1*, *2* and *3* and then apply methods devised for continuous or discrete variables. This is not appropriate since one can change the results by simply changing the category coding.

Ridit analysis essentially transforms ordinal data to a probability scale (one could call it a *virtual continuous scale*). The term actually stands for * relative to an identified distribution integral transformation* and is analoguous to probit or logit. (Importantly, ridit analysis is closely related to the Wilcoxon rank sum test. As shown by Selvin, the Wilcoxon test statistic and the mean ridit are directly related).

## Definitions

Essentially, one must have at least two groups, one of which is selected as the reference group. Then for the non-reference group, the mean ridit is an

estimate of the probability that a random individual from that group will have a value on the underlying (virtual) continuous scale greater than or equal to the value for a random individual from the reference group.

So if larger values of the underlying scale imply a worse condition, then the mean ridit is the probability estimate that the random individual from the group is worse of than a random individual from the reference group (based on the interpretation from Bross). Based on the definition of a ridit (see here or here), one can compute confidence intervals (CI) or test the hypothesis that different groups have equal mean ridits. Lets see how we can do that using R

## Mechanics of ridit analysis

Consider a dataset taken from Donaldson, (Eur. J. Pain, 1988) which looked at the effect of high and low levels of radiation treatment on trials participants’s sleep. The numbers are counts of patients:

1 2 3 4 5 6 7 8 9 10 11 12 | sleep <- data.frame(pain.level=factor(c('Slept all night with no pain', 'Slept all night with some pain', 'Woke with pain - medication provided relief', 'Woke with pain - medication provided no relief', 'Awake most or all of night with pain'), levels=c('Slept all night with no pain', 'Slept all night with some pain', 'Woke with pain - medication provided relief', 'Woke with pain - medication provided no relief', 'Awake most or all of night with pain')), low.dose=c(3, 10, 6, 2, 1), high.dose=c(6,10,2,0,0)) |

Here the groups are in the columns (*low.dose* and *high.dose*) and the categories are ordered such tat *Awake most or all of night with pain* is the “maximum” category. To compute the mean ridits for each dose group we first reorder the table and then convert the counts to proportions and then compute ridits for each category (i.e., row).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | ## reorder table sleep <- sleep[ length(levels(sleep$pain.level)):1, ] ## compute proportions sleep$low.dose.prop <- sleep$low.dose / sum(sleep$low.dose) sleep$high.dose.prop <- sleep$high.dose / sum(sleep$high.dose) ## compute riddit ridit <- function(props) { ## props should be in order of levels (highest to lowest) r <- rep(-1, length(props)) for (i in 1:length(props)) { if (i == length(props)) vals <- 0 else vals <- props[(i+1):length(props)] r[i] <- sum(vals) + 0.5*props[i] } return(r) } sleep$low.dose.ridit <- ridit(sleep$low.dose.prop) sleep$high.dose.ridit <- ridit(sleep$high.dose.prop) |

The resultant table is below

1 2 3 4 5 6 | pain.level low.dose high.dose low.dose.prop high.dose.prop low.dose.ridit high.dose.ridit 5 Awake most or all of night with pain 1 0 0.04545455 0.0000000 0.97727273 1.0000000 4 Woke with pain - medication provided no relief 2 0 0.09090909 0.0000000 0.90909091 1.0000000 3 Woke with pain - medication provided relief 6 2 0.27272727 0.1111111 0.72727273 0.9444444 2 Slept all night with some pain 10 10 0.45454545 0.5555556 0.36363636 0.6111111 1 Slept all night with no pain 3 6 0.13636364 0.3333333 0.06818182 0.1666667 |

The last two columns represent the ridit values for each category and can be interpreted as

a probability estimate that an individuals value on the underlying continuous scale is less than or equal to the midpoint of the corresponding interval

The next step (and main point of the analysis) is to compute the mean ridit for a group (essentially the sum of the category proportions for that group weighted by the category ridits in the reference group) , based on a reference. In this case, lets assume the low dose group is the reference.

1 | mean.r.high <- sum(with(sleep, high.dose.prop * low.dose.ridit)) |

which is 0.305, and can be interpreted as the probability that a patient receiving the high dose of radiation will experience more sleep interference than a patient in the low dose group. Importantly, since ridits are estimates of probabilities, the complementary ridit (i.e., using the high dose group as reference) comes out to 0.694 and is the probability that a patient in the low radiation dose group will experience more sleep interferance than a patient in the high dose group.

## Statistics on ridits

There are a number of ways to compute CI’s on mean ridits or else test the hypothesis that the mean ridits differ between \(k\) groups. Donaldsons method for CI calculation appears to be restricted to two groups. In contrast, Fleiss et al suggest an alternate method based on. Considering the latter, the CI for a group vs the reference group is given by

\(\overline{r}_i \pm B \frac{\sqrt{n_s +n_i}}{2\sqrt{3 n_s n_i}}\)

where \(\overline{r}_i\) is the mean ridit for the \(i\)’th group, \(n_s\) and \(n_i\) are the sizes of the reference and query groups, respectively and \(B\) is the multiple testing corrected standard error. If one uses the Bonferroni correction, it would be \(1.96 \times 1\) since there is only two groups being compared (and so 1 comparison). Thus the CI for the mean ridit for the low dose group, using the high dose as reference is given by

\(0.694 \pm 1.96 \frac{\sqrt{18 + 22}}{2\sqrt{3 \times 18 \times 22}}\)

which is 0.515 to 0.873. Given that the interval does not include 0.5, we can conclude that there is a statistically significant difference (\(\alpha = 0.05\)) in the mean ridits between the two groups. For the case of multiple groups, the CI for any group vs any other group (i.e., not considering the reference group) is given by

\( (\overline{r}_i – \overline{r}_j + 0.5) \pm B \frac{\sqrt{n_i + n_j}}{2\sqrt{3 n_i n_j}}\)

Fleiss et al also describes how one can test the hypothesis that the mean ridits across all groups (including the reference) are equal using a \(\chi^2\) statistic. In addition, they also describe how one can perform the same test between any group and the reference group.

## R Implementation

I’ve implemented a function that computes mean ridits and their 95% confidence interval (which can be changed). It expects that the data is provided as counts for each category and that the input data.frame is ordered in descending order of the categories. You need to specify the variable representing the categories and the reference variable. As an example of its usage, we use the dataset from Fleiss et al which measured the degree of pain relief provided by different drugs after oral surgery. We perfrom a ridit analysis using aspirin as the reference group:

1 2 3 4 5 6 7 8 | dental <- data.frame(pain.relief = factor(c('Very good', 'Good', 'Fair', 'Poor', 'None'), levels=c('Very good', 'Good', 'Fair', 'Poor', 'None')), ibuprofen.low = c(61, 17, 10, 6, 0), ibuprofen.high = c(52, 25, 5, 3, 1), Placebo = c(32, 37, 10, 18, 0), Aspirin = c(47, 25, 11, 4, 1) ) ridit(dental, 'pain.relief', 'Aspirin') |

which gives us

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | $category.ridit pain.relief ibuprofen.low ibuprofen.high Placebo 1 Very good 0.67553191 0.697674419 0.83505155 2 Good 0.26063830 0.250000000 0.47938144 3 Fair 0.11702128 0.075581395 0.23711340 4 Poor 0.03191489 0.029069767 0.09278351 5 None 0.00000000 0.005813953 0.00000000 $mean.ridit ibuprofen.low ibuprofen.high Placebo 0.5490812 0.5455206 0.3839620 $ci group low high 1 ibuprofen.low 0.4361128 0.6620496 2 ibuprofen.high 0.4300396 0.6610016 3 Placebo 0.2718415 0.4960826 |

The results suggest that patients receiving either dose of ibuprofen will get better pain relief compared to aspirin. However, if you consider the CI’s it’s clear that they both contain 0.5 and thus there is no statistical difference in the mean ridits for these two doses, compared to aspirin. On the other hand, placebo definitely leads to less pain relief compared to aspirin.

## Summarizing Collections of Curves

I was browsing live notes from the recent IEEE conference on visualization and came across a paper about functional boxplots. The idea is an extension of the boxplot visualization (shown alongside), to a set of functions. Intuitively, one can think of a functional box plot as specific envelopes for a set of functions. The construction of this plot is based on the notion of *band depth* (see the more general concept of data depth) which is a measure of how far a given function is from the collection of functions. As described in Sun & Genton the band depth for a given function can be computed by randomly selecting \(J\) functions and identifying wether the given function is contained within the minimum and maximum of the \(J \) functions. Repeating this multiple times, the fraction of times that the given function is fully contained within the \(J\) random functions gives the band depth, \(BD_j\). This is then used to order the functions, allowing one to compute a 50% band, analogous to the IQR in a traditional boxplot. There are more details (choice of \(J\), partial bounding, etc.) described in the papers and links above.

My interest in this approach was piqued since one way of summarizing a dose response screen, or comparing dose response data across multiple conditions is to generate a box plot of a single curve fit parameter – say, \(\log IC_{50} \). But if we wanted to consider the curves themselves, we have a few options. We could simply plot all of them, using translucency to avoid a blob. But this doesn’t scale visually. Another option, on the left, is to draw a series of box plots, one for each dose, and then optionally join the median of each boxplot giving a “median curve”. While these vary in their degree of utility, the idea of summarizing the *distribution* of a set of curves, and being able to compare these distributions is attractive. Functional box plots look like a way to do this. (A cool thing about functional boxplots is that they can be extended to multivariate functions such as surfaces and so on. See Mirzargar et al for examples)

Computing \(BD_j\) can be time consuming if the number of curves is large or \(J\) is large. Lopez-Pintado & Jornsten suggest a simple optimization to speed up this step, and for the special case of \(J = 2\), Sun et al proposed a ranking based procedure that scales to thousands of curves. The latter is implemented in the fda package for R which also generates the final functional box plots.

As an example I considered 6 cell proliferation assays run in dose response, each one running the same set of compounds, but under different growth conditions. For each assay I only considered good quality curves (giving from 349 to 602 curves). The first plot compares the actives identified in the different growth conditions using the \(\log IC_{50}\), and indicates a statistically significant increase in potency in the last three conditions compared to the first three.

In contrast, the functional box plots for the 6 assays, suggest a somewhat different picture (% Response = 100 corresponds to no cell kill and 0 corresponds to full cell kill).

The red dashed curves correspond to outliers and the blue lines correspond to the ‘maximum’ and ‘minimum’ curves (analogous to the whiskers of the traditional boxplot). Importantly, these are not measured curves, but instead correspond to the dose-wise maximum (and minimum) of the real curves. The pink region represents 50% of the curves and the black line represents the (virtual) median curve. In each case the X-axis corresponds to dose (unlabeled to save space). Personally, I think this visualization is a little cleaner than the dose-wise box plot shown above.

The mess of red lines in the plot **1** suggest an issue with the assay itself. While the other plots do show differences, it’s not clear what one can conclude from this. For example, in the plot for **4**, the dip on the left hand side (i.e., low dose) could suggest that there is a degree of cytotoxicity, which is comparatively less in **3**, **5** and **6**. Interestingly none of the median curves are really sigmoidal, suggesting that the distribution of dose responses has substantial variance.

## Thoughts on the DREAM Synergy Prediction Challenge

The DREAM consortium has run a number of predictive modeling challenges and the latest one on predicting small molecule synergies has just been published. The dataset that was provided included baseline gene expression of the cell line (OCI-LY3), expression in presence of compound (2 concentrations, 2 time points), dose response data for 14 compounds and the excess over Bliss for the 91 pairs formed from the 14 compounds. Based on this data (and available literature data) participants had to predict a ranking for the 91 combinations.

The paper reports the results of 31 approaches (plus one method that was not compared to the others) and does a good job of summarizing their performance and identifying whether certain data type or certain approaches work better than others. They also investigated the performance of an ensemble of approaches, which, as one might expect, worked better than the single methods. While the importance of gene expression in predictive performance was not as great as I would’ve thought, it was certainly more useful than chemical structure alone. Interestingly, they also noted that “*compounds with more targeted mechanisms, such as rapamycin and blebbistatin, were least synergistic*“. I suspect that this is somewhat dataset specific, but it will be interesting to see whether this holds in large collections of combination experiment such as those run at NCATS.

Overall, it’s an important contribution with the key take home message being

… synergy and antagonism are highly context specific and are thus not universal properties of the compounds’ chemical, structural or substrate information. As a result, predictive methods that account for the genetics and regulatory architecture of the context will become increasingly relevant to generalize results across multiple contexts

Given the relative dearth of predictive models of compound synergy, this paper is a nice compilation of methods. But there are some issues that weaken the paper.

- One key issue are the conclusions on model performance. The organizers defined a score, termed probabilistic c-score (PC score). If I understand correctly, a random ranking should give PC = 0.5. It turns out that the best performing method exhibited a PC score = 0.61 with a number of methods hovering around 0.5. Undoubtably, this is a tough problem, but when the authors states that “
*… this challenge shows that current methodologies can perform significantly better than chance …*” I raise an eyebrow. I can only assume that what they meant was that the results were “*statistically significantly better than chance*“, because in terms of effect size the results are not impressive. After reading this excellent article on p-values and significance testing I’m particularly sensitized to claims of significance. - The dataset could have been strengthened by the inclusion of self-crosses. This would’ve allowed the authors to assess actual excess over Bliss values corresponding to additivity (which will not be exactly 0 due to experimental noise), and avoid the use of cutoffs in determining what is synergistic or antagonistic.
- Similarly, a key piece of data that would really strengthen these approaches is the expression data in presence of combinations. While it’s unreasonable to have this data available for all combinations, it could be used as a first step in developing models to predict the expression profile in presence of combination treatment. Certainly, such data could be used to validate some assumptions made by some of the models described (e.g., concordance of DEG’s induced by single agents implies synergistic response).
- Kudos for including source code for the top methods, but would’ve been nicer if data files were included so we could actually reproduce the results.
- The authors conclude that when designing new synergy experiments, one should identify mechanistically diverse molecules to make up for the “
*small number of potentially synergistic pathways*“. While mechanistic diversity is a good idea, it’s not clear how they conclude there are a small number of pathways that play a role in synergy. - It’s a pity that the SynGen method was not compared to the other methods. While the authors provide a justification, it seems rather weak. The method only applied to the synergistic combinations (performance was not a whole lot better than random – true positive rate of 56%) – but the text indicates that it predicted synergistic compound pairs. It’s not clear whether this means it made a call on synergy or a predicted ranking. If the latter it would’ve been interesting to see how it compared to the rankings of the synergistic subset of 91 compounds from other methods.