Deep Learning in Chemistry

Deep learning (DL) is all the rage these days and this approach to predictive modeling is being applied to a wide variety of problems, including many in computational drug discovery. As a dilettante in the area of deep learning, I’ve been following papers that have used DL for cheminformatics problems, and thought I’d mention a few that seemed interesting.

An obvious expected outcome of a DL model is more accurate predictions, and as a result most applications of DL in drug discovery have focused on using DL models as more accurate regression or classification methods. Examples include Lusci et al [2013], Xu et al [2015] and Ma et al [2015]. It’s interesting to note that in these papers, while DL models show better performance, the improvement is not consistent and is not necessarily very large (given the effort required). Ekins [2016] has reviewed the use of DL models in QSAR settings and more recently Winkler & Le [2016] have also briefly reviewed this area.
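To make the “drop-in replacement” use concrete, here is a minimal sketch (in Python, with scikit-learn) of the kind of benchmark these papers run: the same descriptor matrix fed to a random forest and to a multi-layer network, compared on held-out RMSE. The data below is synthetic and the architectures arbitrary – it illustrates the shape of the comparison, not any of the cited results.

```python
# Minimal sketch: deep-ish network vs. random forest on a QSAR-style
# regression task. X stands in for precomputed fingerprints/descriptors
# and y for measured activities; both are synthetic, purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(1000, 1024)).astype(float)  # stand-in for 1024-bit fingerprints
y = X[:, :16].sum(axis=1) + rng.normal(scale=0.5, size=1000)  # toy "activity"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [
    ("RF",  RandomForestRegressor(n_estimators=500, random_state=0)),
    ("DNN", MLPRegressor(hidden_layer_sizes=(512, 256, 64), max_iter=500, random_state=0)),
]:
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.3f}")
```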

However, simply replacing one regression method with another is not particularly interesting. Indeed, as pointed out by several workers (e.g., Shao et al [2013]), the input descriptors, rather than the modeling method, have the greater effect on predictive accuracy. And so it is in the area of representation learning that I think DL methods become truly interesting and useful for cheminformatics.

Several groups have published work on using DL methods to learn a representation of the molecular structure, directly from the graph representation. Duvenaud et al [2016] and Kearnes et al [2016] have both described such approaches, and the nice thing is that they remove the need to select features a priori. The downside is that the learned features are optimal only in the context of the training data (thus necessitating large training sets for the learned features to be generalizable). Interestingly, Kearnes et al [2016] note that the features learned by the DL model are conceptually similar to circular fingerprints. More interestingly, when they built predictive neural network models using the learned representation, the RMSE was not significantly different from that of a random forest model using circular fingerprints. Of course, the learned representation is driven by the architecture of the DL model, which was designed to look at atom neighborhoods, so it’s probably not too surprising that the optimal representation was essentially equivalent to a circular fingerprint. But one can expect that tweaking the DL architecture and going beyond the molecular graph could lead to more useful representations. This paper also very clearly describes the hows and whys of designing a deep neural network architecture, and is useful for anyone interested in exploring further.
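The analogy with circular fingerprints is easy to see in code. Below is a schematic numpy sketch of the neighborhood-aggregation step these graph-based models are built on; the toy graph, feature sizes and single weight matrix are all made up, and the real implementations add bond features, per-radius weights and a differentiable “hashing” layer.

```python
# Schematic numpy sketch of one neighborhood-aggregation (message-passing)
# step, of the kind underlying Duvenaud-style neural fingerprints.
import numpy as np

def message_pass(atom_feats, adjacency, W):
    """One round: each atom's new features depend on itself plus its
    immediate neighbors, i.e. a radius-1 circular environment."""
    neighbor_sum = adjacency @ atom_feats   # sum features of bonded atoms
    combined = atom_feats + neighbor_sum    # self + neighborhood
    return np.tanh(combined @ W)            # learned, differentiable "hash"

# Toy 3-atom chain (think C-C-O), 4 input features per atom -- all made up.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
feats = np.eye(3, 4)                        # placeholder one-hot atom features
W = np.random.default_rng(0).normal(size=(4, 4))

h1 = message_pass(feats, A, W)              # radius 1
h2 = message_pass(h1, A, W)                 # radius 2, analogous to ECFP4
fingerprint = h1.sum(axis=0) + h2.sum(axis=0)  # pool atoms to a molecule vector
print(fingerprint)
```

Each application of this step expands the environment an atom “sees” by one bond, which is exactly how the radius grows in a circular fingerprint; the difference is that the mapping is learned and differentiable rather than hashed.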

Another interesting development is the use of DL to learn a continuous representation of molecular structure, which can then be modified (usually so as to vary some molecular property) and “decoded” to obtain a new chemical structure with the desired property. This falls into the class of inverse QSAR problems, and Gomez-Bombarelli et al [2016] present a nice example of this approach, where gradient descent is used to explore the chemical space defined by the learned continuous representation. Unfortunately the chemistry represented by the generated structures has several problems, as described by Derek Lowe. While the inverse QSAR problem has been addressed before (e.g., Wong et al [2009] with SVMs, Miyao et al [2016], Skvortsova et al [1993]), those efforts started from pre-defined feature sets. The key contribution of the current work is the ability to generate a continuous chemical space, and I assume the nonsensical regions of that space could be avoided using appropriate filters.
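The workflow itself is simple to sketch. The toy numpy example below mimics the optimization loop: take a point in the learned latent space, follow the gradient of a property predictor, then hand the result to the decoder. The quadratic surrogate and the 8-dimensional latent space are stand-ins of my own, not the models from the paper.

```python
# Toy sketch of latent-space optimization in the spirit of
# Gomez-Bombarelli et al: gradient ascent on a property predictor over
# a continuous molecular representation. Encoder/decoder and the
# property surrogate are hypothetical stand-ins.
import numpy as np

def predict_property(z):
    # Hypothetical smooth surrogate for a property (say, logP) over latent space.
    return -np.sum((z - 2.0) ** 2)

def property_gradient(z):
    return -2.0 * (z - 2.0)            # analytic gradient of the surrogate above

z = np.zeros(8)                        # pretend this came from encode(molecule)
for step in range(100):
    z += 0.05 * property_gradient(z)   # gradient ascent on the predicted property

# In the real workflow this point goes to the decoder, which emits a
# SMILES string -- the stage where chemically nonsensical structures
# appear and where validity filters would have to be applied.
print("optimized latent point:", np.round(z, 3))
```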

Winkler & Le [2016] recently reported a comparison of deep and shallow neural networks for QSAR regression. Their results and conclusions are similar to those of previous work. More tantalizingly, they claim that DNNs may be better suited to tackling the prediction of activity cliffs. There has been some work on this topic (Guha [2012] and Heikamp et al [2012]), but given that activity cliffs are essentially discontinuities in a SAR surface (either fundamentally or by choice of descriptors), traditional predictive models are unlikely to do well. Winkler & Le point to work suggesting that activity cliffs may “disappear” if a descriptor space of appropriately high dimensionality is used, and conclude that representations learned via DL may be useful here. Though I don’t discount this, I’m not convinced that simply moving to higher-dimensional spaces is sufficient (or even necessary) – if it were, SVMs should be good at predicting activity cliffs. Rather, what’s necessary is the correct set of features, ones that capture the phenomenon underlying the cliff. Nonetheless, Winkler & Le [2016] raise some interesting questions regarding the smoothness of chemical spaces.
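As an aside, for readers unfamiliar with how cliffs are quantified: a common measure is the SALI index of Guha & Van Drie, the activity difference for a pair of molecules divided by one minus their structural similarity, so that very similar pairs with large activity differences score highly. A quick sketch, with made-up activities and similarities:

```python
# SALI (Structure-Activity Landscape Index) for pairs of molecules.
# Large SALI = similar structures, very different activities: a cliff.
# Activities and similarities below are invented for illustration; in
# practice they'd come from assay data and fingerprint similarities.
def sali(act_i, act_j, similarity, eps=1e-6):
    return abs(act_i - act_j) / (1.0 - similarity + eps)

activities = {"mol1": 7.2, "mol2": 5.1, "mol3": 7.3}
similarity = {("mol1", "mol2"): 0.92,   # near-identical pair, 2-log activity gap
              ("mol1", "mol3"): 0.40,
              ("mol2", "mol3"): 0.38}

for (a, b), sim in similarity.items():
    print(a, b, f"SALI = {sali(activities[a], activities[b], sim):.1f}")
```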

4 thoughts on “Deep Learning in Chemistry”

  1. Marwin says:

    Nice post, I did not know several of those papers!
    Furthermore, I would like to point you to the review “Deep Learning in Drug Discovery” by Gawehn et al. in Mol. Inf. (2015) http://onlinelibrary.wiley.com/doi/10.1002/minf.201501008/abstract

    Regarding the activity cliffs: Aren’t they an artefact of our inductive, symbolic molecular representations? What looks to us humans like similar Lewis formulae almost always has similar physicochemical properties, but sometimes, for certain receptor-ligand combinations, a small change can dramatically change the energetics, which we simply cannot capture with our inductive models. E.g. in most cases, adding a methyl group does not have a big effect, but sometimes it behaves “magically”. A deductive (ab-initio) model would not have activity cliffs, but that comes at the price of possibly infeasible computational cost.

    Neural networks could in principle help to learn feature representations that address this issue, but maybe our datasets in chemistry are still too small for that. One can certainly do a lot of fascinating experiments with them that were not possible with conventional supervised models, so maybe down the road they will lead to some interesting tools we cannot foresee yet! I think they’re definitely worth studying.

  2. Thanks for the link – I had seen it (and others) but didn’t want to overload a blog post.

    You’re right about activity cliffs. Our inability to predict them is undoubtedly due to incomplete representations. And while a true ab initio model (of ligand+protein) would explain them, I think simplified representations will also explain them (with less accuracy, of course). The bottleneck is that you need ligand+protein. And I don’t think a DL approach can do much better if it doesn’t take ligand+protein into account.

  3. Dave Winkler says:

    Thanks for the interesting review of DL papers in cheminformatics. I would like to point out that one of the most important parts of our paper comparing the performance of deep and shallow neural networks was the Universal Approximation Theorem, which puts important theoretical constraints on DL being dramatically better than a single-layer neural net. It would be good to see more discussion of this point.

  4. zjw says:

    Great job! May I translate it into Chinese?
