Archive for April, 2012
A few days back, Derek Lowe posted a comment from a reader who suggested that one way to approach the current employment challenges in the pharmaceutical industry would be the formation of a Federation of Independent Scientists. Such a federation would be open to consultants, small companies, etc. and would use its size to obtain group rates on various things – journal access, health insurance and so on. Obviously, there are a lot of details left out here and when you get into the nitty-gritty a lot of issues arise that don’t have simple answers. Nevertheless, an interesting (and welcome, as evidenced by the comment thread) idea.
One aspect raised by a commenter was access to modeling and docking software by such a group. He mentioned that he’d
… like to see an open source initiative develop a free, open source drug discovery package. Why not, all the underlying force fields and QM models have been published … it would just take a team of dedicated programmers and computational chemists time and passion to create it.
This is the very essence of the Blue Obelisk movement, under whose umbrella there is now a wide variety of computational chemistry and cheminformatics software. There’s certainly no lack of passion in the Open Source chemistry software community. As most of it is based on volunteer effort, time is always an issue. This has a direct effect on the features provided by Open Source chemistry software – such software does not always match up to commercial tools. But as the commenter above pointed out, many of the algorithms underlying proprietary software are published. They just need somebody with the time and expertise to implement them. And the combination of these two (in the absence of funding) is not always easy to find.
Of course, having access to the software is just one step. A scientist requires (possibly significant) hardware resources to run the software. Another comment raised this issue and asked about the possibility of a cloud based install of comp chem software.
With regards the sophisticated modelling tools – do they have to be locally installed?
How do the big pharma companies deploy the software now? I would be very surprised if it wasn’t easily packaged, although I guess the number of people using it is limited.
I’m thinking of some kind of virtual server, or remote desktop style operation. Your individual contractor can connect from wherever, and have full access to a range of tools, then transfer their data back to their own location for safekeeping.
Unlike CloudBioLinux, which provides a collection of bioinformatics and structural biology software as a prepackaged AMI for Amazon’s EC2 platform, I’m not aware of a similarly prepackaged set of Open Source tools for chemistry. And certainly not based on the cloud. (There are some companies that host comp chem software on the cloud and provide access to these installations for a fee). While some Linux distributions do package a number of scientific packages (UbuntuScience for example), I don’t think that these would support a computational drug discovery operation. (The above comment doesn’t necessarily focus just on Open Source software. One could consider commercial software hosted on remote servers, though I wonder what type of licensing would be involved).
The last component would be the issue of data, primarily for cloud based solutions. While compute cycles on such platforms are usually cheap, bandwidth can be expensive. Granted, chemical data is not as big as biological data (cf. 1000Genomes on AWS), but sending a large collection of conformers over the network may not be very cost-effective. One way to bypass this would be to generate “standard” conformer collections and other such libraries and host them on the cloud. But what is “standard” and who would pay for hosting costs is an open question.
But I do think there is a sufficiently rich ecosystem of Open Source software that could serve much of the computational needs of a “Federation of Independent Scientists”. It’d be interesting to put together a list of Open Source tools based on requirements from the commenters in that thread.
Gamification is a hot topic and companies such as Tunedit and Kaggle are successfully hosting a variety of data mining competitions. These competitions employ data from a variety of domains such as bond trading, essay scoring and so on. Recently, both platforms have hosted a QSAR challenge (though not officially denoted as such). The most recent one is the challenge hosted at Kaggle by Boehringer Ingelheim.
While it’s good to see these competitions raise the profile of “data science” (and make some money for the winners), I must admit that these are not particularly interesting to me as it really boils down to looking at numbers with no context (aka domain knowledge). For example, in the Kaggle & BI example, there are 1,776 descriptors that have been normalized but no indication of the chemistry or biology. One could ask whether a certain mechanism of action is known to play a role in the biology being tested which could suggest a certain class of descriptors over another. Alternatively, one could ask whether there are a few distinct chemotypes present thus suggesting multiple local models versus a single global model. (I suppose that the supplied descriptors may lend themselves to a clustering, but a scaffold based approach would be much more direct and chemically intuitive).
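As a rough illustration of the clustering idea mentioned above, here is a minimal sketch of leader (sphere-exclusion style) clustering on a normalized descriptor matrix. The toy rows and the distance threshold are made up for illustration and are not taken from the competition data, and leader clustering is simply one convenient choice of method. If the rows fall into a handful of well separated groups, that hints at multiple local models rather than a single global one.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(rows, threshold):
    """Assign each row to the first cluster whose leader lies within
    `threshold`; a row that matches no leader starts a new cluster."""
    leaders = []
    labels = []
    for row in rows:
        for idx, leader in enumerate(leaders):
            if euclidean(row, leader) <= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader is close enough: this row becomes a leader.
            leaders.append(row)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical, already-normalized descriptor rows (one row per molecule).
descriptors = [
    [0.10, 0.20, 0.10],
    [0.12, 0.18, 0.11],
    [0.90, 0.85, 0.95],
    [0.88, 0.90, 0.93],
    [0.11, 0.22, 0.09],
]

# The threshold is an arbitrary assumption; in practice it would need tuning.
labels = leader_cluster(descriptors, threshold=0.2)
n_clusters = len(set(labels))
print(labels, n_clusters)
```

With real structures in hand, a scaffold based grouping would replace the distance threshold entirely, which is the more direct route argued for above.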
This is not to say that such competitions are useless. On the contrary, lack of domain knowledge doesn’t preclude one from applying sophisticated statistical and machine learning methods to unannotated data and obtaining impressive results. The issue of data versus domain knowledge has been discussed in several places.
In contrast to the currently hosted challenge at Kaggle, an interesting twist would be to try and reverse engineer the structures from their descriptor values. There have been some previous discussions on reverse engineering structures from descriptor data. Obviously, we’re not going to be able to verify our results, but it would be an interesting challenge.
Another ACS National meeting is over, this time in San Diego. It was good to catch up with old friends and meet many new, interesting people. As I was there for a relatively short period, I bounced around most sessions.
MEDI and COMP had a joint session on desktop modeling and its utility in medicinal chemistry. Anthony Nicholls gave an excellent talk, where he differentiated between “strong signals” and “weak signals”, the former being extremely obvious trends, features or facts that do not require a high degree of specialized expertise to detect and the latter being those that do require significantly more expertise to identify. An example of a strong signal would be an empty region of a binding pocket that is not occupied by a ligand feature – it’s pretty easy to spot this and when highlighted the possible actions are also obvious. A weak signal could be a pi-stacking interaction which could be difficult to identify in a crowded 3D diagram. He then highlighted how simple modifications to traditional 2D depictions can be used to make the obvious more obvious and make features that might be subtle, say in 3D, more obvious in a 2D depiction. Overall, an elegant talk that focused on how simple visual cues in 2D & pseudo-3D depictions can key the mind to focus on important elements.
There were two other symposia that were of particular interest. On Sunday Shuxing Zhang and Sean Ekins organized a symposium on polypharmacology with an excellent lineup of speakers including Chris Lipinski. Curt Breneman gave a nice talk that highlighted best practices in QSAR modeling and Marti Head gave a great talk on the role and value of docking in computational modeling projects.
On Tuesday, Jan Kuras and Tudor Oprea organized a session on Systems Chemical Biology. Though the session appeared to be more along the lines of drug repurposing, there were several interesting talks. Elebeoba May from Sandia Labs gave a very interesting talk on a system level model of small molecule inhibition of M. tuberculosis and F. tularensis - combining metabolic pathway models and cheminformatics.
John Overington gave a very interesting talk on identifying drug combinations to improve safety. Contrary to much of my reading in this area, he points out the value of “me-too” drugs and taking combinations of such drugs. Given that such drugs hit the same target, he pointed out that off-targets will see reduced concentrations of the individual drugs (hopefully reducing side effects) while the on-target will see the pooled concentration (thus maintaining efficacy (?)). It’s definitely a contrasting view to the one where we identify combinations of drugs hitting different targets (which I’d guess is a tougher proposition, since identifying a truly synergistic combination requires a detailed knowledge of the underlying pathways and interactions). He also pointed out that his analyses indicated that combination dosing is not actually reduced, in contrast to the current dogma.
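The pooled concentration argument can be made concrete with a toy calculation. This is my own back-of-the-envelope sketch, not the analysis from the talk: it assumes simple equilibrium competitive binding, two equipotent drugs at the shared target, and a separate off-target for each drug. All constants are hypothetical and in arbitrary units.

```python
def occupancy(conc_over_kd):
    """Fractional occupancy of one binding site by competing ligands,
    given each ligand's concentration divided by its Kd for that site."""
    total = sum(conc_over_kd)
    return total / (1.0 + total)


# Hypothetical values (arbitrary concentration units).
KD_ON = 1.0       # both drugs assumed equipotent at the shared target
KD_OFF = 10.0     # each drug assumed to have its own, distinct off-target
FULL_DOSE = 10.0  # concentration reached by one drug dosed alone

# One drug at full dose.
single_on = occupancy([FULL_DOSE / KD_ON])
single_off = occupancy([FULL_DOSE / KD_OFF])

# Two "me-too" drugs, each at half the dose.
half = FULL_DOSE / 2
combo_on = occupancy([half / KD_ON, half / KD_ON])  # target sees both drugs
combo_off = occupancy([half / KD_OFF])              # off-target sees only one

print(f"on-target:  single={single_on:.3f}  combo={combo_on:.3f}")
print(f"off-target: single={single_off:.3f}  combo={combo_off:.3f}")
```

Under these assumptions the on-target occupancy is unchanged while each off-target is less occupied. If the two drugs shared off-targets, or differed in potency, the picture would change accordingly.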
As before we had a CINFlash session which I think went quite well – 8 diverse speakers with a pretty good audience. The slides of the talks have been made available and we plan to have another session in Philadelphia this Fall, so consider submitting something. We also had a great Scholarships for Scientific Excellence poster session – 15 posters covering topics ranging from reaction prediction to an analysis of retractions. Excellent work, and very encouraging to see newcomers to CINF interested in getting more involved.
The only downsides to the meeting were the chilly and unsunny weather and the fact that people still think that displaying tables of numbers in a slide actually transmits any information!