So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for August, 2010

Back from Boston

with 2 comments

Another ACS National Meeting, this time in Boston, is over and I’m finally home. I gave two talks, one on issues surrounding the data deluge in modern drug discovery and another on structure activity landscapes. There were a number of great sessions in CINF, COMP and MEDI, with some thought-provoking talks. I especially liked a talk given by Birte Seebeck, in which they abstracted the idea of SALI (which focuses on structural features of ligands) to one that considers interactions between a ligand and a receptor – thus identifying activity hotspots within a protein site that actually cause the activity cliffs. The idea is somewhat similar to SiFT’s, but differs in that it takes into account the SAR. (As a side note, I discovered that one of our landscape papers is in the Top 25 in Drug Discov. Today.) Gerry Maggiora gave a very thought-provoking talk on the topic of activity cliffs, highlighting the fact that there are a lot of open questions that need to be looked at in this area. Ant Nicholls spoke on the disconnect between molecular modeling in academia and industry. His three suggestions: rigorous statistics training, an end to government funding for all but basic research, and a requirement that all remaining funding have an experimental component.

I also met up with a number of old friends, met some people with whom I’d only had email or FriendFeed conversations and made new friends. We had a Blue Obelisk dinner, with Christoph awarding a Blue Obelisk to Nina Jeliazkova. This time round, we got a whole restaurant to ourselves, thanks to Christoph, so conversation was much easier! Also, the financial contribution towards the dinner from Bob Belford and Harry Pence was very much appreciated.

At this meeting I finally got round to making use of Twitter – it turns out it was quite useful for keeping running notes during a talk, as well as keeping track of other parallel sessions. Thanks to Egon for those extra tweets (though maybe Egon and I were being a bit obsessive!?). A quick hack I put together just before the meeting allowed Tweeters to visualize the Twitter stream emanating from the ACS meeting as a word cloud. Obviously, it works better with more people tweeting, but it’s cute nonetheless.

As always, CINF hosts some great receptions and this meeting’s were no exception. Though the weather didn’t cooperate, the convention center was pretty nice in providing free wireless. This came in especially useful as we had a speaker with three talks in our program who was unable to make it to the meeting. With the wireless available, we successfully connected with him over Skype and, with me switching slides, were able to have him present (audio and video) his work. Definitely not a trend we want to encourage, but for emergencies, “Skype talks” are great!

At this meeting I also organized an experimental symposium consisting of lightning talks – 8-minute talks on arbitrary (but hopefully interesting) topics in chemical information and cheminformatics. While we only had 5 speakers, we had a great set of talks – I’m still amazed at how Richard West got through 24 slides in 7 minutes so smoothly! While it could have been publicized better, we got a lot of good feedback and will be running a revamped version in Denver next fall.

Overall a pretty good meeting, and my last meeting as CINF Program Chair. I had a great time in this role, and with the help of a very capable Program Committee, I think we were able to successfully develop interesting multidisciplinary programs over the last four meetings. As I step down, Rachelle Bienstock from the NIEHS will take over as Program Chair, and I wish her all the best. However, I’m not done with CINF just yet :) I’ve been elected as Chair-Elect of CINF for 2011 so will be switching roles (though I certainly hope to continue contributing to the CINF program in the future).

I’ll end with three suggestions for the ACS: 1. Seriously consider letting divisions drop Thursdays 2. Reduce registration fees & do a better job on hotel rates 3. Fix the meetings to one or two places (preferably San Francisco).

Update: I had misstated Anthony Nicholls’ points from his presentation. The post has been updated to correct that.

Written by Rajarshi Guha

August 27th, 2010 at 2:54 am

Posted in cheminformatics


SALI in Bulk

without comments

Some time back John Van Drie and I developed the Structure Activity Landscape Index (SALI), which is a way to quantify activity cliffs – pairs of compounds which are structurally very similar but have significantly different activities. In preparation for a talk on SALI at the Boston ACS, I was looking for SAR datasets that contained cliffs. It turns out that ChEMBL is a great resource for SAR data. And with the EBI providing database dumps, it’s very easy to query across the entire collection to find datasets of interest.

For the purposes of this talk, I wanted to see what the datasets looked like in terms of the presence (or absence) of cliffs. Given that the idea of an activity cliff is only sensible for ligand receptor type interactions, I only considered compound sets associated with binding assays. Furthermore, I only considered those assays which involved human targets, had a confidence score greater than 8 and contained between 75 and 500 molecules. (If you have an Oracle installation of ChEMBL then this SQL snippet will get you the list of assays satisfying these constraints.)
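For readers without a database installation, the same constraints can be expressed as a simple filter over assay records. This is only a sketch in Python, not the SQL referred to above: the `Assay` record, its field names, the `"B"` code for binding assays and the inclusive handling of the 75 to 500 range are all assumptions made for illustration, not ChEMBL column names.

```python
from dataclasses import dataclass


@dataclass
class Assay:
    # Hypothetical record layout; these are not ChEMBL column names.
    assay_id: int
    assay_type: str   # "B" is used here to mean a binding assay (assumption)
    organism: str
    confidence: int
    n_molecules: int


def select_assays(assays, min_mols=75, max_mols=500, min_confidence=8):
    """Keep human binding assays with confidence above the cutoff
    and a molecule count within [min_mols, max_mols]."""
    return [
        a for a in assays
        if a.assay_type == "B"
        and a.organism == "Homo sapiens"
        and a.confidence > min_confidence
        and min_mols <= a.n_molecules <= max_mols
    ]


# Made-up records, only to exercise the filter.
candidates = [
    Assay(1, "B", "Homo sapiens", 9, 120),       # kept
    Assay(2, "F", "Homo sapiens", 9, 120),       # not a binding assay
    Assay(3, "B", "Rattus norvegicus", 9, 120),  # not a human target
    Assay(4, "B", "Homo sapiens", 8, 120),       # confidence not above 8
    Assay(5, "B", "Homo sapiens", 9, 600),       # too many molecules
]
selected_ids = [a.assay_id for a in select_assays(candidates)]
```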

This gives us 31 assays, which we can now analyze. For the purposes of this note, I evaluated the CDK hashed fingerprints and used the standardized activities to generate the pairwise SALI values for each of the datasets (performing the appropriate log transformation of the activities when required). The matrices that represent the pairwise SALI values are plotted in the heatmap montage below (the ChEMBL assay ID is noted in each image) where black represents the minimum SALI value and white represents the maximum SALI value for that dataset. (See the original paper for more details on this representation.) Clearly, the “roughness” of the activity landscape differs from dataset to dataset.
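As a rough illustration of the pairwise calculation (not the code used to produce the montage), the SALI value for a pair of compounds is the absolute activity difference divided by one minus their similarity, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)). The Python sketch below assumes that fingerprints are available as sets of on-bit positions (standing in for the CDK hashed fingerprints), that similarity is measured with the Tanimoto coefficient, and that activities are already log transformed. The function names, and the choice to report an infinite value for pairs with identical fingerprints but different activities, are assumptions made for illustration.

```python
import math
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical (assumption)
    return len(fp_a & fp_b) / union


def sali_matrix(fingerprints, activities):
    """Return the symmetric matrix of pairwise SALI values as a list of lists."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff = abs(activities[i] - activities[j])
        dissimilarity = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
        if dissimilarity == 0.0:
            # Identical fingerprints: SALI is undefined, so flag the pair as
            # infinite when the activities differ (assumption for this sketch).
            value = math.inf if diff > 0 else 0.0
        else:
            value = diff / dissimilarity
        matrix[i][j] = matrix[j][i] = value
    return matrix


# Toy data: three made-up fingerprints and log-scale activities.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
acts = [9.0, 6.0, 6.5]
sali = sali_matrix(fps, acts)
```

In the toy data the first two compounds share three of five bits and differ by three log units, so they give the largest value in the matrix, which is the kind of pair that would show up as a bright cell in a heatmap.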

At this point I haven’t looked in depth into each dataset to characterize the landscapes in more detail, but this is a quick summary of multiple datasets. (Note that a few datasets contain cliffs derived from stereoisomers, which may not actually be real cliffs: the activity difference may be small, but the compounds look structurally identical to the fingerprint.)

An alternative and useful representation is to convert the SALI values for a dataset into an empirical cumulative distribution function to provide a more quantitative view of how cliffs are distributed within a landscape. I’ll leave those details for the talk.
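A minimal sketch of that conversion, assuming the pairwise SALI values are held in a symmetric list-of-lists matrix and that each pair should be counted once. The helper names and the toy numbers are hypothetical, and how undefined or infinite values should be treated is left open here.

```python
from bisect import bisect_right


def upper_triangle(matrix):
    """Collect each pairwise value once (i < j) from a symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def make_ecdf(values):
    """Return F, where F(x) is the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def F(x):
        return bisect_right(ordered, x) / n

    return F


# Toy symmetric SALI matrix for four compounds (made-up numbers).
toy = [
    [0.0, 7.5, 2.5, 1.0],
    [7.5, 0.0, 0.5, 3.0],
    [2.5, 0.5, 0.0, 12.0],
    [1.0, 3.0, 12.0, 0.0],
]
F = make_ecdf(upper_triangle(toy))
fraction_above_3 = 1.0 - F(3.0)  # share of pairs with SALI greater than 3
```

Reading the curve at a given SALI value tells you what fraction of pairs fall at or below it, so a landscape dominated by a few steep cliffs shows a curve that rises quickly and then has a long tail.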

Written by Rajarshi Guha

August 11th, 2010 at 4:31 am

Job Openings at the NCGC

without comments

I’ve been at the NCGC for a little more than a year and I can say that it’s a great place to work – smart people, cutting edge projects in chemical genomics and chemical biology, opportunities to be involved in all aspects of HTS projects and fresh data (lots of it). Now there are opportunities for others to join the fun!

Some time back, my colleague Trung posted an ad for a software engineer position, primarily working on our chemogenomics data application. Now, we’re also looking for a research informatics scientist. See the detailed ad for more information. For both positions, see the ads themselves for contact details. If you’d like to chat face to face, I’ll be at the ACS in Boston this month, so drop me a line.

Written by Rajarshi Guha

August 3rd, 2010 at 11:50 pm

Posted in Uncategorized


CINFlash Deadline Approaching

with 2 comments

One more week to go (Aug 7 is the deadline) to put in short abstracts for the CINFlash lightning talk symposium at the fall ACS meeting in Boston this month. This is your chance for 6 minutes of fame!

Written by Rajarshi Guha

August 2nd, 2010 at 2:17 pm

Posted in Uncategorized
