So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘Uncategorized’ Category

2nd Call for Papers – ICCS, 2011

without comments

This has already been posted on some mailing lists, but one more place can’t hurt. The International Conference on Chemical Structures (ICCS) is coming up in June 2011 in Noordwijkerhout, The Netherlands. I’m on the scientific advisory board and am planning to attend this meeting, as the topics being covered look pretty interesting, especially those focusing on ‘systems’ aspects of cheminformatics and bioinformatics. The abstract submission deadline is January 31, 2011.

C A L L   F O R   P A P E R S
9th International Conference on Chemical Structures
NH Leeuwenhorst Conference Hotel,
Noordwijkerhout, The Netherlands

5-9 June 2011

Visit the conference website for more information.

The 9th International Conference on Chemical Structures (ICCS) is
seeking presentations of novel research and emerging technologies for
the following plenary sessions:

o Cheminformatics
> advances in structure representation
> reaction handling and electronic lab notebooks (ELNs)
> molecular similarity and diversity
> chemical information visualization

o Structure-Activity and Structure-Property Prediction
> graphical methods for SAR analysis
> industrialized and large-scale model building
> multi-property prediction and multi-objective optimization

o Structure-Based Drug Design and Virtual Screening
> new docking and scoring approaches
> improved understanding of protein-ligand interactions
> pharmacophore definition and search
> modeling of challenging targets

o Analysis of Large Chemistry Spaces
> mining of chemical literature and patents
> design, profiling and comparison of compound collections and screening sets
> machine learning and knowledge extraction from databases

o Integrated Chemical Information
> advances in chemogenomics
> integration of medical and biological information
> semantic technologies as a driver of integration
> translational informatics

o Dealing with Biological Complexity
> analysis and prediction of poly-pharmacology
> in-silico analysis of toxicology, drug safety, and adverse events
> pathways and biological networks
> druggability of targets

Before and after the official conference program, free workshops will be
offered by several companies, including BioSolveIT and the Chemical
Computing Group.

Joint Organizers:
o Division of Chemical Information of the American Chemical Society
o Chemical Structure Association Trust (CSA Trust)
o Division of Chemical Information and Computer Science of the
Chemical Society of Japan (CSJ)
o Chemistry-Information-Computer Division of the Society of German
Chemists (GDCh)
o Royal Netherlands Chemical Society (KNCV)
o Chemical Information Group of the Royal Society of Chemistry (RSC)
o Swiss Chemical Society (SCS)

We encourage the submission of papers on both applications and case
studies as well as on method development and algorithmic work. The final
program will be a balance of these two aspects.

From the submissions the program committee and the scientific advisory
board will select about 30 papers for the plenary sessions. All submissions
that cannot be included in the plenary sessions will automatically be
considered for the poster session.

Contributions can be submitted for any of the above and related areas,
but we also welcome contributions in any aspect of the computer handling
of chemical structure information, such as:

o automatic structure elucidation
o combinatorial chemistry, diversity analysis
o web technology and its effect on chemical information
o electronic publishing
o MM or QM/MM simulations
o practical free energy calculations
o modeling of ADME properties
o material sciences
o analysis and prediction of crystal structures
o grid and cloud computing in cheminformatics

Visit the conference website for more information, including details on
procedures for online abstract submission and conference registration.

The deadline for the submission of abstracts is 31 January 2011.

We hope to see you in Noordwijkerhout.

Keith T Taylor, ICCS Chair
Markus Wagener, ICCS Co-Chair

Written by Rajarshi Guha

December 11th, 2010 at 3:22 pm

Posted in Uncategorized


Job Openings at the NCGC

without comments

I’ve been at the NCGC for a little more than a year and I can say that it’s a great place to work – smart people, cutting-edge projects in chemical genomics and chemical biology, opportunities to be involved in all aspects of HTS projects and fresh data (lots of it). Now there are opportunities for others to join the fun!

Some time back, my colleague Trung posted an ad for a software engineer position, primarily working on our chemogenomics data application. Now we’re also looking for a research informatics scientist. See the detailed ad for more information; both ads include contact details. If you’d like to chat face to face, I’ll be at the ACS meeting in Boston this month, so drop me a line.

Written by Rajarshi Guha

August 3rd, 2010 at 11:50 pm

Posted in Uncategorized


CINFlash Deadline Approaching

with 2 comments

One more week to go (Aug 7 is the deadline) to put in short abstracts for the CINFlash lightning talk symposium at the fall ACS meeting in Boston this month. This is your chance for 6 minutes of fame!

Written by Rajarshi Guha

August 2nd, 2010 at 2:17 pm

Posted in Uncategorized


What Has Cheminformatics Done for You Lately?

with 3 comments

Recently there have been two papers asking whether cheminformatics, or virtual screening in general, has really helped drug discovery in terms of lead discovery.

The first paper, from Muchmore et al., focuses on the utility of various cheminformatics tools in drug discovery. Their report is retrospective in nature: they note that while much research has gone into developing descriptors and predictors of various molecular properties (solubility, bioavailability, etc.), it does not seem that this has contributed to increased productivity. They suggest three possible reasons for this:

  • not enough time to judge the contributions of cheminformatics methods
  • methods not being used properly
  • methods themselves not being sufficiently accurate.

They then go on to consider how these reasons may apply to various cheminformatics methods and tools that are accessible to medicinal chemists. Examples range from molecular weight and ligand efficiency to solubility, similarity and bioisosteres. They use a 3-class scheme of known knowns, known unknowns and unknown unknowns. These correspond, respectively, to methods whose underlying principles are understood and whose results can be robustly interpreted; methods for properties that we don’t know how to realistically evaluate (but which we may still try to evaluate, such as solubility); and methods for which we can get a numerical answer but whose meaning or validity is doubtful. Thus, for example, ligand binding energy calculations are placed in the “unknown unknown” category and similarity searches are placed in the “known unknown” category.

It’s definitely an interesting read, summarizing the utility of various cheminformatics techniques. It raises a number of interesting questions and issues. For example, a recurring issue is that many cheminformatics methods are ultimately subjective, even though the underlying implementation may be quantitative – “what is a good Tanimoto cutoff?” in similarity calculations would be a classic example.  The downside of the article is that it does appear at times to be specific to practices at Abbott.
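To make the Tanimoto cutoff question concrete, here is a minimal sketch in Clojure. It assumes fingerprints are represented as sets of on-bit indices; the function name, the two example fingerprints and the cutoffs mentioned afterwards are all made up for illustration and do not come from either paper.

```clojure
(ns tanimoto-sketch
  (:require [clojure.set :as set]))

(defn tanimoto
  "Tanimoto coefficient of two fingerprints given as sets of on-bit indices.
  Returns 0.0 when both fingerprints are empty."
  [a b]
  (let [common (count (set/intersection a b))
        total  (- (+ (count a) (count b)) common)]
    (if (zero? total)
      0.0
      (double (/ common total)))))

;; Hypothetical fingerprints: 4 shared bits, 7 distinct bits overall.
(def fp-query #{1 4 7 9 12})
(def fp-hit   #{1 4 7 9 15 20})

(println (tanimoto fp-query fp-hit))
```

The calculation itself is fully quantitative (4 shared bits out of 7 distinct bits), yet whether this pair counts as "similar" depends entirely on the cutoff chosen: it passes at 0.5 and fails at 0.7.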

The second paper, by Schneider, is more prospective and general in nature, and discusses some reasons why virtual screening has not played a more direct role in drug discovery projects. One of the key points that Schneider makes is that

appropriate “description of objects to suit the problem” might be the key to future success

In other words, it may be that molecular descriptors, while useful surrogates of physical reality, are probably not sufficient to get us to the next level. Schneider even states that “… the development of advanced virtual screening methods … is currently stagnated”. This statement is true in many ways, especially if one considers the statistical modeling side of virtual screening (i.e., QSAR). Many recent papers discuss slight modifications to well-known algorithms that invariably lead to an incremental improvement in accuracy. Schneider suggests that improvements in our understanding of the physics of the drug discovery problem (protein folding, allosteric effects, dynamics of complex formation, etc.), rather than a continued focus on static properties (logP etc.), will lead to advances. Another very valid point is that future developments will need to move away from the prediction or modeling of “… one to one interactions between a ligand and a single target …” and instead will need to consider “… many to many relationships …”. In other words, advances in virtual screening will address (or need to address) ligand non-specificity or promiscuity. Thus activity profiles, network models and polypharmacology will all be vital aspects of successful virtual screening.

I really like Schneider’s views on the future of virtual screening, even though they are rather general. I agree with his views on the stagnation of machine learning (QSAR) methods, but at the same time I’m reminded of a paper by Halevy et al., which highlights the fact that

simple models and a lot of data trump more elaborate models based on less data

Now, they are talking about natural language processing using trillion-word corpora. Not exactly the situation we face in drug discovery! But it does look like we’re slowly moving towards generating large biological datasets of multiple types. A recent NIH RFP proposes this type of development. Coupled with well-established machine learning methods, this could lead to some very interesting developments. (Of course, even ‘simple’ properties such as solubility could benefit from a ‘large data’ scenario, as noted by Muchmore et al.)

Overall, two interesting papers looking at the state of the field from different views.

Written by Rajarshi Guha

April 5th, 2010 at 4:33 am

Simple XML Parsing with Clojure

with 2 comments

A while back I had started playing with Clojure. It’s always been a spare-time hobby and not having had much spare time I haven’t really gotten as far ahead with it as I’d have liked. I’m still not sure why I like Clojure, but it is fun to code in. My interest was revitalized when I came across a Clojure group located in the D.C. area. So following on my previous post on geo-referencing PubMed articles, I decided to take a stab at doing the whole thing in Clojure.

One of the tasks in this project is to query PubMed using the EUtils CGIs and parse out the information from the XML document that is returned. It turns out that parsing XML documents is very easy in Clojure. The parse function in the clojure.xml namespace supports parsing of XML documents, returning a tree of tags. Using xml-zip from the namespace aliased as zip below creates a zipper data structure from the tree. Extracting specific elements is achieved by filtering the zipper by the path to the desired element. It’s a lot like the ElementTree module in Python (but doesn’t require that I insert namespaces before each and every element in the path!). We start off by working in our own namespace and then importing the relevant packages:

(ns entrez
  (:require [clojure.xml :as xml])
  (:require [ :as zip])
  (:require [ :as zf]))

Next we define some helper functions:

(defn get-ids
  "Extract the text of every Id element under IdList"
  [zipper]
  (zf/xml-> zipper :IdList :Id zf/text))

(defn get-affiliations
  "Extract (PMID affiliation) pairs from PubMed abstracts"
  [zipper]
  ;; The two sequences are paired by position, so this assumes every
  ;; article carries exactly one Affiliation element.
  (map list
       (zf/xml-> zipper :PubmedArticle :MedlineCitation :PMID zf/text)
       (zf/xml-> zipper :PubmedArticle :MedlineCitation :Article :Affiliation zf/text)))
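Because get-affiliations pairs PMIDs and affiliations by position, a record without an Affiliation element would shift or drop pairs. The post does not say whether such records occur, so the following is only a sketch under that assumption. It uses just clojure.xml and the core xml-seq function, a hand-written sample document, and hypothetical helper names (parse-str, first-text, article-affiliations) that are not part of the original code.

```clojure
(ns affiliation-sketch
  (:require [clojure.xml :as xml])
  (:import [java.io ByteArrayInputStream]))

;; Hypothetical stand-in for an efetch response; the second article
;; deliberately has no Affiliation element.
(def sample-efetch
  (str "<PubmedArticleSet>"
       "<PubmedArticle><MedlineCitation><PMID>1</PMID>"
       "<Article><Affiliation>NIH</Affiliation></Article>"
       "</MedlineCitation></PubmedArticle>"
       "<PubmedArticle><MedlineCitation><PMID>2</PMID>"
       "<Article></Article>"
       "</MedlineCitation></PubmedArticle>"
       "</PubmedArticleSet>"))

(defn parse-str
  "Parse an XML string into the tree of maps that clojure.xml/parse returns."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn first-text
  "Text of the first element tagged `tag` at or below `elem`, or nil if absent."
  [elem tag]
  (some->> (xml-seq elem)
           (filter #(= tag (:tag %)))
           first
           :content
           (apply str)))

(defn article-affiliations
  "One [pmid affiliation] pair per PubmedArticle; affiliation may be nil."
  [tree]
  (for [article (xml-seq tree)
        :when (= :PubmedArticle (:tag article))]
    [(first-text article :PMID) (first-text article :Affiliation)]))

(println (article-affiliations (parse-str sample-efetch)))
```

Extracting both fields per article keeps each PMID attached to its own affiliation, at the cost of walking each article subtree twice.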

Finally, we can get the IDs from an esearch query by saving the results to a file, wrapping the parsed tree in a zipper, and then running

(println (get-ids
       (zip/xml-zip (xml/parse "esearch.xml"))))

or extract affiliations from a set of PubMed abstracts obtained via an efetch query

(println (get-affiliations
       (zip/xml-zip (xml/parse "efetch.xml"))))
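For readers who want to try the idea without saved files or the zip-filter library, here is a self-contained sketch that parses an in-memory string and follows a tag path using only clojure.xml and clojure.zip. The sample XML is a hand-written stand-in for an esearch result, and child-elems, path->, text and get-ids-core are hypothetical helpers written for this illustration; they only approximate what zf/xml-> does above.

```clojure
(ns entrez-sketch
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip])
  (:import [java.io ByteArrayInputStream]))

;; Hypothetical stand-in for an esearch response.
(def sample-esearch
  (str "<eSearchResult><Count>2</Count>"
       "<IdList><Id>111</Id><Id>222</Id></IdList>"
       "</eSearchResult>"))

(defn parse-str
  "Parse an XML string into the tree of maps that clojure.xml/parse returns."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-elems
  "Zipper locations of the children of `loc` whose tag is `tag`."
  [loc tag]
  (->> (iterate zip/right (zip/down loc))
       (take-while some?)
       (filter #(= tag (:tag (zip/node %))))))

(defn path->
  "Follow a path of tags down from `loc`, returning every matching location."
  [loc & tags]
  (reduce (fn [locs tag] (mapcat #(child-elems % tag) locs))
          [loc]
          tags))

(defn text
  "Concatenated string children of the node at `loc`."
  [loc]
  (apply str (filter string? (:content (zip/node loc)))))

(defn get-ids-core
  "Core-only counterpart of get-ids: text of every IdList/Id element."
  [zipper]
  (map text (path-> zipper :IdList :Id)))

(println (get-ids-core (zip/xml-zip (parse-str sample-esearch))))
```

The path is matched relative to the root element, so :IdList is looked up among the children of eSearchResult, mirroring how the paths are written in the helper functions above.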

In the next post I’ll show some code to actually perform the queries via EUtils so that we don’t need to save results to files.

Written by Rajarshi Guha

February 17th, 2010 at 3:30 am

Posted in Uncategorized,software
