So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for November, 2010

Inserting 2D Depictions into R Plots


Recent versions of rcdk allow you to insert images of chemical structures into R plots via the view.image.2d and rasterImage functions. One problem with the latter function is that the 2D structure image must be positioned in plot units rather than pixel units. Paul Murrell suggested an easy way to insert the raster image into the plot region while maintaining the native resolution of the image:

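library(rcdk)

## Parse the structure and render it as a 200 x 200 pixel image
m <- parse.smiles("O=C(C1=CC=CC=C1)C1=CC=CC=C1")[[1]]
img <- view.image.2d(m, 200, 200)
plot(10:1, pch=19)

## Position the depiction at the lower left corner
## Device resolution in pixels per inch (character size in pixels / inches)
dpi <- (par("cra")/par("cin"))[1]
usr <- par("usr")
xl <- usr[1]
yb <- usr[3]
## 200 pixels -> inches -> plot units, so the image keeps its native size
xr <- xl + xinch(200/dpi)
yt <- yb + yinch(200/dpi)

rasterImage(img, xl, yb, xr, yt)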
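
The same arithmetic extends to the other corners of the plot region. The following is only a sketch: raster_box is a hypothetical helper (not part of rcdk or base R), and it assumes linear axes. It reproduces what xinch and yinch do from par("usr") and par("pin"), so the computation can be checked without an open graphics device.

```r
# Hypothetical helper: rasterImage() coordinates that keep an image at its
# native pixel size, anchored at a chosen corner of the plot region.
#   usr: par("usr"), pin: par("pin") (plot size in inches),
#   dpi: device pixels per inch, e.g. (par("cra") / par("cin"))[1]
# Assumes linear (non-log) axes.
raster_box <- function(width_px, height_px, usr, pin, dpi,
                       corner = c("bottomleft", "bottomright",
                                  "topleft", "topright")) {
  corner <- match.arg(corner)
  # pixels -> inches -> user units (the same conversion as xinch()/yinch())
  w <- (width_px / dpi) * (usr[2] - usr[1]) / pin[1]
  h <- (height_px / dpi) * (usr[4] - usr[3]) / pin[2]
  xl <- if (corner %in% c("bottomleft", "topleft")) usr[1] else usr[2] - w
  yb <- if (corner %in% c("bottomleft", "bottomright")) usr[3] else usr[4] - h
  c(xleft = xl, ybottom = yb, xright = xl + w, ytop = yb + h)
}

# With an open plot and `img` from view.image.2d(m, 200, 200):
# b <- raster_box(200, 200, par("usr"), par("pin"),
#                 (par("cra") / par("cin"))[1], "topright")
# rasterImage(img, b["xleft"], b["ybottom"], b["xright"], b["ytop"])
```

Passing usr, pin and dpi as arguments, rather than calling par() inside the helper, keeps it free of device state.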

Written by Rajarshi Guha

November 20th, 2010 at 5:09 pm

Posted in software,cheminformatics


PAINS Substructure Filters as SMARTS


Some time back, Baell et al. published an interesting paper describing a set of substructure filters to identify compounds that are promiscuous in high-throughput biochemical screens. They termed these compounds Pan Assay Interference Compounds, or PAINS. A variety of functional groups are known to be problematic in HTS assays, and the reasons for excluding molecules that contain these and other groups range from reactivity towards proteins to poor developmental potential or known toxicity. Derek Lowe has a nice summary of the paper.

The paper published the substructure filters as a collection of Sybyl Line Notation (SLN) patterns. Unfortunately, without access to Sybyl, it’s difficult to reuse the published patterns. Having them in SMARTS form would allow one to use them with many more (open source or commercial) tools. Luckily, Wolf Ihlenfeldt came to the rescue and provided me with access to a version of the CACTVS toolkit that was able to convert the SLN patterns to SMARTS.

There are three files, p_l15, p_l150 and p_m150, corresponding to tables S8, S7 and S6 of the supplementary information. In each file, the first column is the pattern and the second column is the name for that pattern, taken from the original SLN files. While all patterns were converted to SMARTS, the conversion is not perfect: using the OEChem toolkit with the Tripos aromaticity model, I have not been able to reproduce all the hits that were obtained with the original SLN patterns.
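
As a rough illustration of how such a two-column file might be used from R, here is a minimal sketch. It assumes the pattern and its name are separated by whitespace, which is a guess about the file layout. read_pains and pains_hits are hypothetical helpers, and the input lines are made up for the example rather than taken from the PAINS files.

```r
# Parse a PAINS-style file: SMARTS pattern, whitespace, pattern name.
# read.table() is avoided on purpose: '#' is common in SMARTS and is
# read.table()'s default comment character.
# `con` may be a file path or a connection.
read_pains <- function(con) {
  lines <- trimws(readLines(con))
  lines <- lines[nzchar(lines)]          # drop blank lines
  # Split each line at the first run of whitespace only
  parts <- regmatches(lines, regexpr("[[:space:]]+", lines), invert = TRUE)
  data.frame(smarts = vapply(parts, `[`, "", 1),
             name   = vapply(parts, `[`, "", 2),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: for each pattern, which molecules does it hit?
# Assumes rcdk is installed, `mols` is a list from parse.smiles(), and
# rcdk::matches() returns one logical value per molecule.
pains_hits <- function(mols, pains) {
  hits <- lapply(pains$smarts, function(s) rcdk::matches(s, mols))
  names(hits) <- pains$name
  hits
}

# Toy input (made-up lines, not actual PAINS patterns)
toy <- c("[#6]=O toy_carbonyl", "c1ccccc1 toy_benzene")
pains <- read_pains(textConnection(toy))
print(pains)
```

Given the caveat about the conversion, hits obtained this way are best treated as flags for a closer look, not as a reproduction of the published results.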

(As a side note, the SMARTSViewer is a really handy tool for visualizing a SMARTS pattern, which is great since many of the PAINS patterns are very complex.)

Written by Rajarshi Guha

November 14th, 2010 at 8:41 pm