# So much to do, so little time

Trying to squeeze sense out of chemical data

## Maximally Bridging Rings (or, Doing What the Authors Should’ve Done)


Recently I came across a paper from Marth et al. that described a method based on network analysis to support retrosynthetic planning, particularly for complex natural products. I’m no synthetic chemist, so I can’t comment on the relevance or importance of the targets, or on the significance of the proposed approach to planning a synthetic route. What caught my eye was the claim that

This work validates the utility of network analysis as a starting point for identifying strategies for the syntheses of architecturally complex secondary metabolites.

I was a little disappointed (hey, a Nature publication sets certain expectations!) that the network analysis was fundamentally walking the molecular graph to identify a certain type of ring, termed the maximally bridging ring. The algorithm is described in the SI and the authors make it available as an online tool. Unfortunately they didn’t provide any source code for their algorithm, which was a bit irritating, given that the algorithm is a key component of the paper.

I put together an implementation using the CDK (1.5.12), available in a GitHub repo. It’s a quick hack, using the parameters specified in the paper, and hasn’t been extensively tested. However, it seems to give the correct result for the first few test cases in the SI.
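To give a rough feel for what "walking the molecular graph" involves, here is a minimal sketch in Python. It is not the algorithm from the SI and not the code in the repo; it assumes a simplified rule in which two rings count as bridged when they share more than two atoms (fused rings share exactly one bond, i.e. two atoms), and it picks the ring(s) bridged to the largest number of other rings. The ring sets are hypothetical hand-written inputs standing in for, say, an SSSR computed by a toolkit.

```python
from itertools import combinations


def bridging_counts(rings):
    """For each ring, count how many other rings it is bridged to.

    Simplifying assumption: two rings are bridged if they share more than
    two atoms (fused rings share one bond, i.e. exactly two atoms).
    """
    counts = [0] * len(rings)
    for i, j in combinations(range(len(rings)), 2):
        if len(rings[i] & rings[j]) > 2:
            counts[i] += 1
            counts[j] += 1
    return counts


def maximally_bridging(rings):
    """Return indices of the ring(s) bridged to the most other rings."""
    counts = bridging_counts(rings)
    best = max(counts, default=0)
    if best == 0:
        return []  # no bridged ring pairs at all (e.g. purely fused systems)
    return [i for i, c in enumerate(counts) if c == best]


# Hypothetical inputs: rings as sets of atom indices.
norbornane = [{0, 1, 2, 3, 6}, {0, 3, 4, 5, 6}]     # two 5-rings sharing C1, C4, C7
decalin = [{0, 1, 2, 3, 4, 5}, {4, 5, 6, 7, 8, 9}]  # two 6-rings sharing one bond
print(maximally_bridging(norbornane))  # [0, 1]
print(maximally_bridging(decalin))     # []
```

In norbornane both rings tie, so a real implementation would need some tie-breaking rule; the paper's procedure may also apply additional criteria that this sketch ignores.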

The tool will print out the hash codes of the rings recognized as maximally bridging and also generate an SVG depiction with the first such ring highlighted in red, like the one shown alongside. You can build a self-contained version of the tool with:

```
git clone git@github.com:rajarshi/maxbridgerings.git
cd maxbridgerings
mvn clean package
```

The tool can then be run as follows (with the depiction output to Copaene.svg):

```
java -jar target/MaximallyBridgingRings-1.0-jar-with-dependencies.jar \
  "CC(C)C1CCC2(C3C1C2CC=C3C)C" Copaene
```

Written by Rajarshi Guha

December 24th, 2015 at 4:10 am

## Surveying the Opinion of Chemists

As part of a project I was wondering about reports of surveys that collected chemists’ assessments of different things. More specifically, I wasn’t looking for crowd-sourcing efforts for data curation (such as the Spectral Game) or data collection. Rather, I was interested in reports where somebody asked a group of chemists what they thought of some particular molecular “feature”. Here, “feature” is pretty broadly defined and could range from the quality of a probe molecule to whether a molecule is complex or not.

Surveying the literature (and with pointers from @dgelemi, @baoilleach, Jun Li, @georgeisyourman and @DrBostrom), here are the papers I found:

The number of people surveyed across these studies ranges from fewer than 10 to more than 300. Recently there appears to be a trend towards developing predictive models based on the results of such surveys. Molecular complexity also seems to be a popular topic. Modeling opinion is always a tricky thing, though in my mind some aspects (e.g., complexity, diversity) lend themselves to more robust models than others (e.g., quality of a probe).

If there are other examples of such surveys in chemistry, I’d appreciate any pointers.

Written by Rajarshi Guha

November 7th, 2015 at 1:21 pm

## Post-doc (Molecular Informatics) Opening at NCATS

I have a post-doc opening in the Informatics group at NCATS, to work on computational aspects of high throughput combination screening. Topics will include predicting drug combination response, visualizing large combination screens (> 5000 combinations) and so on. The NCATS combination screening platform has tested more than 65,000 compound combinations (in checkerboard style, which means more than 4.5M individual dose combinations) along with single agent dose responses. You can view publicly released data at https://tripod.nih.gov/matrix-client.

The NCATS Informatics group is a collection of very smart people, with wide ranging interests in molecular informatics. We work closely with colleagues in biology and chemistry. As a result, we eat a lot of our own dog food. In addition, we’re committed to implementing our ideas in publicly available software tools as well as publishing in journals.

Lots of data, great people and tough problems. If this piques your interest, visit the job posting for more details.

Written by Rajarshi Guha

February 6th, 2015 at 8:52 pm

## Applications Invited for CSA Trust Grant for 2015

The Chemical Structure Association (CSA) Trust is an internationally recognized organization established to promote the critical importance of chemical information to advances in chemical research. In support of its charter, the Trust has created a unique Grant Program and is now inviting the submission of grant applications for 2015.

Purpose of the Grants
The Grant Program has been created to provide funding for the career development of young researchers who have demonstrated excellence in their education, research or development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and compounds. One or more Grants will be awarded annually up to a total combined maximum of ten thousand U.S. dollars (\$10,000). Grants are awarded for specific purposes, and within one year each grantee is required to submit a brief written report detailing how the grant funds were allocated. Grantees are also requested to recognize the support of the Trust in any paper or presentation that is given as a result of that support.

Who is Eligible?
Applicant(s), age 35 or younger, who have demonstrated excellence in their chemical information related research and who are developing careers that have the potential to have a positive impact on the utility of chemical information relevant to chemical structures, reactions and compounds, are invited to submit applications. While the primary focus of the Grant Program is the career development of young researchers, additional bursaries may be made available at the discretion of the Trust. All requests must follow the application procedures noted below and will be weighed against the same criteria.

Which Activities are Eligible?
Grants may be awarded to acquire the experience and education necessary to support research activities; e.g. for travel to collaborate with research groups, to attend a conference relevant to one’s area of research, to gain access to special computational facilities, or to acquire unique research techniques in support of one’s research.

Application Requirements:
Applications must include the following documentation:

1. A letter that details the work upon which the Grant application is to be evaluated as well as details on research recently completed by the applicant;
2. The amount of Grant funds being requested and the details regarding the purpose for which the Grant will be used (e.g. cost of equipment, travel expenses if the request is for financial support of meeting attendance, etc.). The relevance of the above-stated purpose to the Trust’s objectives and the clarity of this statement are essential in the evaluation of the application;
3. A brief biographical sketch, including a statement of academic qualifications;
4. Two reference letters in support of the application. Additional materials may be supplied at the discretion of the applicant only if relevant to the application and if such materials provide information not already included in items 1-4. Three copies of the complete application document must be supplied for distribution to the Grants Committee.

The deadline for applications for the 2015 Grant is March 13, 2015. Successful applicants will be notified no later than May 2nd of the relevant year.

The application documentation should be forwarded to: Bonnie Lawlor, CSA Trust Grant Committee Chair, 276 Upper Gulph Road, Radnor, PA 19087, USA. If you wish to enter your application by e-mail, please contact Bonnie Lawlor at chescot@aol.com prior to submission so that she can contact you if the e-mail does not arrive.

Written by Rajarshi Guha

February 2nd, 2015 at 5:20 pm

Posted in cheminformatics


## Thoughts on the DREAM Synergy Prediction Challenge

The DREAM consortium has run a number of predictive modeling challenges, and the latest one, on predicting small molecule synergies, has just been published. The dataset that was provided included baseline gene expression of the cell line (OCI-LY3), expression in the presence of each compound (2 concentrations, 2 time points), dose response data for 14 compounds and the excess over Bliss for the 91 pairs formed from the 14 compounds. Based on this data (and available literature data), participants had to predict a ranking for the 91 combinations.
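For readers less familiar with the setup, the sketch below shows the standard Bliss independence calculation and where the 91 comes from (all unordered pairs of 14 compounds). It uses fractional inhibition in [0, 1] for each single agent and for the combination; the numbers are made up for illustration, and the challenge's exact choice of doses and noise handling is not reproduced here.

```python
from math import comb


def bliss_expected(fa, fb):
    """Expected fractional inhibition of A+B if A and B act independently.

    fa, fb: fractional inhibition (0..1) of each single agent at the doses used.
    """
    return fa + fb - fa * fb


def excess_over_bliss(fa, fb, fab):
    """Observed combination inhibition minus the Bliss expectation.

    Positive values suggest synergy, negative values antagonism
    (before accounting for experimental noise).
    """
    return fab - bliss_expected(fa, fb)


print(comb(14, 2))                                 # 91 unordered pairs from 14 compounds
print(round(bliss_expected(0.3, 0.4), 2))          # 0.58
print(round(excess_over_bliss(0.3, 0.4, 0.7), 2))  # 0.12
```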

The paper reports the results of 31 approaches (plus one method that was not compared to the others) and does a good job of summarizing their performance and identifying whether certain data types or approaches work better than others. They also investigated the performance of an ensemble of approaches, which, as one might expect, worked better than the single methods. While gene expression was not as important for predictive performance as I would’ve thought, it was certainly more useful than chemical structure alone. Interestingly, they also noted that “compounds with more targeted mechanisms, such as rapamycin and blebbistatin, were least synergistic”. I suspect that this is somewhat dataset specific, but it will be interesting to see whether this holds in large collections of combination experiments such as those run at NCATS.

Overall, it’s an important contribution with the key take home message being

… synergy and antagonism are highly context specific and are thus not universal properties of the compounds’ chemical, structural or substrate information. As a result, predictive methods that account for the genetics and regulatory architecture of the context will become increasingly relevant to generalize results across multiple contexts

Given the relative dearth of predictive models of compound synergy, this paper is a nice compilation of methods. But there are some issues that weaken the paper.

• One key issue concerns the conclusions on model performance. The organizers defined a score, termed probabilistic c-score (PC score). If I understand correctly, a random ranking should give PC = 0.5. It turns out that the best performing method exhibited a PC score of 0.61, with a number of methods hovering around 0.5. Undoubtedly, this is a tough problem, but when the authors state that “… this challenge shows that current methodologies can perform significantly better than chance …” I raise an eyebrow. I can only assume that they meant the results were “statistically significantly better than chance”, because in terms of effect size the results are not impressive. After reading this excellent article on p-values and significance testing, I’m particularly sensitized to claims of significance.
• The dataset could have been strengthened by the inclusion of self-crosses. This would’ve allowed the authors to assess actual excess over Bliss values corresponding to additivity (which will not be exactly 0 due to experimental noise), and avoid the use of cutoffs in determining what is synergistic or antagonistic.
• Similarly, a key piece of data that would really strengthen these approaches is the expression data in the presence of combinations. While it’s unreasonable to have this data available for all combinations, it could be used as a first step in developing models to predict the expression profile in the presence of combination treatment. Certainly, such data could be used to validate some assumptions made by some of the models described (e.g., concordance of DEGs induced by single agents implies synergistic response).
• Kudos for including source code for the top methods, but it would’ve been nicer if data files were included so we could actually reproduce the results.
• The authors conclude that when designing new synergy experiments, one should identify mechanistically diverse molecules to make up for the “small number of potentially synergistic pathways“. While mechanistic diversity is a good idea, it’s not clear how they conclude there are a small number of pathways that play a role in synergy.
• It’s a pity that the SynGen method was not compared to the other methods. While the authors provide a justification, it seems rather weak. The method only applied to the synergistic combinations (and performance was not a whole lot better than random, with a true positive rate of 56%), but the text indicates that it predicted synergistic compound pairs. It’s not clear whether this means it made a call on synergy or produced a ranking. If the latter, it would’ve been interesting to see how it compared to the other methods’ rankings of the synergistic subset of the 91 combinations.
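To make the "random ranking gives 0.5" point concrete, here is a minimal sketch of a plain concordance index: the fraction of pairs of combinations whose predicted order matches the observed order. This is a simplified stand-in, not the challenge's PC score. It assumes a higher predicted score means "more synergistic", and the observed values are random placeholders rather than challenge data.

```python
import random
from itertools import combinations


def concordance_index(observed, predicted):
    """Fraction of comparable pairs whose predicted order matches the observed order.

    Pairs tied in `observed` are skipped; ties in `predicted` count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(observed)), 2):
        d_obs = observed[i] - observed[j]
        if d_obs == 0:
            continue
        comparable += 1
        d_pred = predicted[i] - predicted[j]
        if d_pred == 0:
            concordant += 0.5
        elif (d_obs > 0) == (d_pred > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")


def random_baseline(observed, trials=200, seed=0):
    """Average concordance of randomly shuffled rankings against `observed`."""
    rng = random.Random(seed)
    ranks = list(range(len(observed)))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(ranks)
        total += concordance_index(observed, ranks)
    return total / trials


# Placeholder "excess over Bliss" values for 91 pairs (random numbers, not challenge data).
gen = random.Random(42)
observed = [gen.gauss(0.0, 0.1) for _ in range(91)]
print(concordance_index(observed, observed))  # 1.0 for a perfect ranking
print(round(random_baseline(observed), 2))    # close to 0.5
```

On this scale a score of 0.61 means roughly 61% of pairs are ordered correctly, which is why a result can be statistically distinguishable from chance while still having a modest effect size.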

Written by Rajarshi Guha

November 20th, 2014 at 5:37 pm