So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for December, 2010

2nd Call for Papers – ICCS, 2011

without comments

This has already been posted on some mailing lists, but one more place can’t hurt. The International Conference on Chemical Structures (ICCS) is coming up in June, 2011 at Noordwijkerhout, The Netherlands. I’m on the scientific advisory board and am planning to attend this meeting, as the topics being covered look pretty interesting, especially those focusing on ‘systems’ aspects of cheminformatics and bioinformatics. The abstract submission deadline is January 31, 2011.

C A L L   F O R   P A P E R S
9th International Conference on Chemical Structures
NH Leeuwenhorst Conference Hotel,
Noordwijkerhout, The Netherlands

5-9 June 2011

Visit the conference website at for
more information.

The 9th International Conference on Chemical Structures (ICCS) is
seeking presentations of novel research and emerging technologies for
the following plenary sessions:

o Cheminformatics
> advances in structure representation
> reaction handling and electronic lab notebooks (ELNs)
> molecular similarity and diversity
> chemical information visualization

o Structure-Activity and Structure-Property Prediction
> graphical methods for SAR analysis
> industrialized and large-scale model building
> multi-property prediction and multi-objective optimization

o Structure-Based Drug Design and Virtual Screening
> new docking and scoring approaches
> improved understanding of protein-ligand interactions
> pharmacophore definition and search
> modeling of challenging targets

o Analysis of Large Chemistry Spaces
> mining of chemical literature and patents
> design, profiling and comparison of compound collections and screening sets
> machine learning and knowledge extraction from databases

o Integrated Chemical Information
> advances in chemogenomics
> integration of medical and biological information
> semantic technologies as a driver of integration
> translational informatics

o Dealing with Biological Complexity
> analysis and prediction of poly-pharmacology
> in-silico analysis of toxicology, drug safety, and adverse events
> pathways and biological networks
> druggability of targets

Before and after the official conference program free workshops will be
offered by several companies including BioSolveIT (
and the Chemical Computing Group (

Joint Organizers:
o Division of Chemical Information of the American Chemical Society
o Chemical Structure Association Trust (CSA Trust)
o Division of Chemical Information and Computer Science of the
Chemical Society of Japan (CSJ)
o Chemistry-Information-Computer Division of the Society of German
Chemists (GDCh)
o Royal Netherlands Chemical Society (KNCV)
o Chemical Information Group of the Royal Society of Chemistry (RSC)
o Swiss Chemical Society (SCS)

We encourage the submission of papers on both applications and case
studies as well as on method development and algorithmic work. The final
program will be a balance of these two aspects.

From the submissions the program committee and the scientific advisory
board will select about 30 papers for the plenary sessions. All submissions
that cannot be included in the plenary sessions will automatically be
considered for the poster session.

Contributions can be submitted for any of the above and related areas,
but we also welcome contributions in any aspect of the computer handling
of chemical structure information, such as:

o automatic structure elucidation
o combinatorial chemistry, diversity analysis
o web technology and its effect on chemical information
o electronic publishing
o MM or QM/MM simulations
o practical free energy calculations
o modeling of ADME properties
o material sciences
o analysis and prediction of crystal structures
o grid and cloud computing in cheminformatics

Visit the conference website at for
more information, including details on procedures for online abstract
submission and conference registration.

The deadline for the submission of abstracts is 31 January 2011.

We hope to see you in Noordwijkerhout.

Keith T Taylor, ICCS Chair
Markus Wagener, ICCS Co-Chair

Written by Rajarshi Guha

December 11th, 2010 at 3:22 pm

Posted in Uncategorized

Tagged with ,

Visualizing PAINS SMARTS

without comments

A few days ago I had made available a SMARTS version of the PAINS substructural filters, that were converted using CACTVS from the original SLN patterns. I had mentioned that the SMARTSViewer application was a handy way to visualize the complex SMARTS patterns. Matthias Rarey let me know that his student had converted all the SMARTS to SMARTSViewer depictions and made them available as a PDF. Given the complexity of many of the PAINS patterns, these depictions are a very nice way to get a quick idea of what is supposed to match.

(FWIW, the SMARTS don’t reproduce the matches obtained using the original SLN’s – but hopefully the depictions will help anybody who’d like to try and fix the SMARTS).

Written by Rajarshi Guha

December 2nd, 2010 at 1:18 am

Posted in software,cheminformatics

Tagged with , ,

Similarity Matrices in Parallel

without comments

Today I got an email asking whether it’d be possible to speed up a fingerprint similarity matrix calculation in R. Now, pairwise similarity matrix calculations (whether they’re for molecules or sequences or anything else) are by definition quadratic in nature. So performing these calculations for large collections aren’t always feasible – in many cases, it’s worthwhile to rethink the problem.

But for those situations where you do need to evaluate it, a simple way to parallelize the calculation is to evaluate the similarity of each molecule with all the rest in parallel. This means each process/thread must have access to the entire set of fingerprints. So again, for very large collections, this is not always practical. However, for small collections parallel evaluation can lead to speed ups.

The fingerprint package provides a method to directly get the similarity matrix for a set of fingerprints, but this is implemented in interpreted R so is not very fast. Given a list of fingerprints, a manual evaluation of the similarity matrix can be done using nested lapply’s:

sims <- lapply(fps, function(x) {
  unlist(lapply(fps, function(y) distance(x,y)))

For 1012 fingerprints, this takes 286s on my Macbook Pro (4GB, 2.4 GHz). Using snow, we can convert this to a parallel version, which takes 172s on two cores:

cl <- makeCluster(4, type = "SOCK")
clusterEvalQ(cl, library(fingerprint))
clusterExport(cl, "fps")
sim <- parLapply(cl, fps, function(x) {
  unlist(lapply(fps, function(y) distance(x,y)))

Written by Rajarshi Guha

December 2nd, 2010 at 1:03 am