Why Academic Cheminformatics is Important

I’m in academia and I do cheminformatics. Recent collaborations, papers and funding issues in this field have made me think about the future of this research in this setting. This, and a thread discussing David Leahy’s talk on InkSpot Science at the Soton Open Science Workshop, got me started on this post.

There are currently a number of groups and collaborations that are attempting to perform drug discovery without the large centralized infrastructure that is characteristic of this process. Examples include Jean-Claude Bradley, who runs the UsefulChem project, the Synaptic Leap, and various academic labs. Also see Kozikowski et al.

Cheminformatics plays a key role in drug discovery efforts at various stages. Examples include identifying or prioritizing compounds from virtual libraries, predicting ADME profiles and side effects (e.g., hERG inhibition) and so on. I should stress that such computational methods don’t replace bench work – but they can certainly enhance it. More generally, we’re now faced with a deluge of data – and human eyeballs are not going to be able to handle this. And this is exactly where cheminformatics does its stuff.

Traditionally, cheminformatics research has been done in industry. There are of course various academic groups. Recently, a number of groups in the US were funded by the Molecular Libraries Initiative (MLI) to perform exploratory cheminformatics research. The funding period is over and unfortunately no plans are in place for continued funding of this research. So in general, academic cheminformatics is not as extensive as one would like (at least in the US).

But why should cheminformatics get attention from academia? One could say that industry has done it all. But while there’s been a lot of impressive stuff coming out of industry in this field, a lot of it does not necessarily help the type of drug discovery efforts that are arising. In my view, there are three aspects that need to be considered:

Data
Tools
Expertise

Data is becoming increasingly available in a public and reusable fashion (e.g., PubChem, ChemSpider and in the near future, EBI). Obviously, not all of the available data is clean, accurate or useful. But it’s also important to remember that there are various groups, such as the PDSP, that are producing high quality, focused data. A recent news article in Nature highlights the need for cheminformatics to handle the flood of data being generated and stored.

The issues of tools and expertise are somewhat linked. Though one can always buy cheminformatics software, it is important to realize that drug discovery efforts as highlighted here may not be able to afford it or have the expertise to use it efficiently. Of course a number of commercial outfits provide low cost or free academic licenses. But there are other, more important issues than just cost. I won’t go into the merits of open source versus commercial cheminformatics software, but see here and here.

The fact is that it is academic research that can freely put out the cutting edge tools and techniques – that can then be freely reused in academic and Open Source drug discovery. The key thing here is that the tools and techniques be freely accessible. This can be contentious at times, but as an academic myself, my view is that anything that I do will be made publicly available. I’ll admit that I’m not completely altruistic – I do want to publish my work, so there may be a limited-time embargo on the outcome of a project. But I’m not in academia for the money. My currency is credit – and once I have that, the work is free to all. These are the tools that will allow other academic labs to participate in the drug discovery process. More importantly, they will allow academic labs to reproduce and validate the (relevant) steps of the drug discovery process.

It’s important to note that not all academic cheminformatics is cutting edge – indeed a number of efforts aim to “redo” stuff that has been done in industry, in an open source fashion – toolkits, descriptors, pharmacophore searching. This is not glamorous (and in most cases, not fundable either) and most of it has to be done in spare time. But these efforts do provide a foundation for the type of drug discovery we’re talking about. Neither am I saying that academic drug discovery efforts should completely eschew commercial tools. In many cases, commercial tools do the job better than academic tools – and if so, I’d always suggest using the better tool if possible.

Finally, we need to consider expertise. There are a number of reasons that this aspect is important. Firstly, many academic tools, unfortunately, are not polished or easy to use and it is tough for a non-expert (and in some cases, anybody but the developer of the tool!) to use them. But more importantly, cheminformatics in drug discovery is not yet a point and click venture – yes, we can now process gigabytes of data and get predictions for anything by pushing the buttons – but as with all endeavors, there are a lot of details that have to be taken into account to be able to draw reliable conclusions. So while I’m all for having easy to use tools (such as Taverna and Knime) I think it’s important that experts in the field also play a role.

A quick example that caught my eye was the idea of auto-QSAR noted in the FF thread – I’ve developed such auto-QSAR methods in the past, and while they are useful for non-experts, it’s very easy to use them to build models that are meaningless. I don’t know the underlying mechanism for the InkSpot version, so I can’t really comment – but it’s an example where easy to use tools could lead to problems. So cheminformatics research should be developing the tools in a robust manner – such that non-experts don’t get fooled (too much) by the results. I’ll admit that there’s only so much hand holding that can be programmed in – but there’s lots of research that can be done to help non-experts use cheminformatics methods robustly.
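To make the "meaningless model" problem concrete, here is a minimal sketch in Python. It is not how InkSpot or any particular auto-QSAR tool works; it is a toy I made up for illustration. The data are purely synthetic random numbers, and the function names (`pearson_r2`, `best_descriptor_r2`, `y_scramble_r2`) are hypothetical. The "model" simply picks the single descriptor that correlates best with activity, and y-scrambling (refitting against shuffled activities) is used as a sanity check.

```python
import random


def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy * sxy / (sxx * syy)


def best_descriptor_r2(descriptors, activity):
    """Toy 'auto-QSAR': keep the descriptor with the best training-set r^2."""
    return max(pearson_r2(column, activity) for column in descriptors)


def y_scramble_r2(descriptors, activity, n_rounds, rng):
    """Repeat the same selection against shuffled activities."""
    scores = []
    for _ in range(n_rounds):
        shuffled = activity[:]
        rng.shuffle(shuffled)
        scores.append(best_descriptor_r2(descriptors, shuffled))
    return scores


# Synthetic data (an assumption for illustration): 15 "compounds",
# 200 "descriptors", all pure noise, so there is no real relationship.
rng = random.Random(0)
n_compounds, n_descriptors = 15, 200
descriptors = [[rng.gauss(0, 1) for _ in range(n_compounds)]
               for _ in range(n_descriptors)]
activity = [rng.gauss(0, 1) for _ in range(n_compounds)]

observed = best_descriptor_r2(descriptors, activity)
scrambled = y_scramble_r2(descriptors, activity, 20, rng)
mean_scrambled = sum(scrambled) / len(scrambled)

print(f"best training r^2 on pure noise: {observed:.2f}")
print(f"mean best r^2 after y-scrambling: {mean_scrambled:.2f}")
```

With many descriptors and few compounds, the best training-set r^2 will generally look respectable even though there is nothing to find, and the scrambled runs score about as well. That is the warning sign: if shuffling the activities doesn't hurt the model, the model wasn't telling you anything in the first place. A robust tool could run a check like this automatically and flag the result for the non-expert.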

But the issue of expertise goes beyond just handling tools. There’s lots of data and many methods – some of which are useful, but some are also crap. I think academic cheminformatics can provide a lot of useful input regarding what to do with the data and how best to analyze it and convert it to something useful for bench chemists. And I think it also goes beyond suggesting what method to use. There’s a lot of insight that can be gained from computational methods and I think that such contributions can be invaluable. On a related note, the academic environment can be conducive to jumping across interdisciplinary barriers. Given that cheminformatics is interdisciplinary in the first place, it is natural for research in this field to address upcoming areas such as systems biology (hey, it’s all about molecules!). So it’s not just all about software.

Given the increasing feasibility of collaborative drug discovery (linking data providers, tool developers and domain experts) and the availability of public data and tools, it is imperative that academia supports cheminformatics research. Lack of support in academia for this field will create a hole and, I believe, slow down and even hinder academic and Open Source drug discovery efforts.

To make a long post a little longer, I’ll list some aspects of cheminformatics that I believe academia can and should address. I’ll probably write more detailed posts on these topics at a later time.

Machine accessible data
Reliable predictive models
Public, standardized benchmarks
Model exchange
Flexible accessibility

So much to do, so little time

Trying to squeeze sense out of chemical data
