Recent versions of rcdk allow you to insert images of chemical structures into R plots, via the view.image.2d and rasterImage functions. One problem with the latter function is that the 2D structure image must be located in plot units, rather than pixel units. Paul Murrell suggested an easy way to insert the raster image into […]
PAINS Substructure Filters as SMARTS
Some time back, Baell et al. published an interesting paper describing a set of substructure filters to identify compounds that are promiscuous in high-throughput biochemical screens. They termed these compounds Pan Assay Interference Compounds, or PAINS. There are a variety of functional groups that are known to be problematic in HTS assays. The reasons for […]
Working with Sequences in R
I’ve been working on some RNAi projects, and part of that involved generating descriptors for sequences. It turns out that the Biostrings package is very handy and performs well. Our database contains a catalog for an siRNA library with ~27,000 target DNA sequences. To get at the siRNA sequence, we need to convert […]
A Comment on Fingerprint Performance
In a comment on my previous post on bit collisions in hashed fingerprints, Asad raised some interesting points that are useful to have up here: Very interesting topic. I have faced these challenges while working with fingerprints and here are few observations from my end. By the way I agree that mathematically the […]
Hashed Fingerprints and RNG’s
In my previous post I looked at how many collisions in bit positions were observed when generating hashed fingerprints (using the CDK 1024-bit hashed fingerprint and the Java hashCode method). I summarized the results in the form of “bit collision plots” where I plotted the number of times a bit was set to 1 versus […]