Sometime back Baell et al published an interesting paper describing a set of substructure filters to identify compounds that are promiscuous in high throughput biochemical screens. They termed these compounds Pan Assay Interference Compounds or PAINS. There are a variety of functional groups that are known to be problematic in HTS assays. The reasons for exclusion of molecules with these and other groups range from reactivity towards proteins to poor developmental potential or known toxicity. Derek Lowe has a nice summary of the paper.
The paper published the substructure filters as a collection of Sybyl Line Notation (SLN) patterns. Unfortunately, without access to Sybyl, it’s difficult to reuse the published patterns. Having them in SMARTS form would allow one to use them with many more (open source or commercial) tools. Luckily, Wolf Ihlenfeldt came to the rescue and provide me access to a version of the CACTVS toolkit that was able to convert the SLN patterns to SMARTS.
There are three files, p_l15, p_l150 and p_m150 corresponding to tables S8, S7 and S6 from the supplementary information. The first column is the pattern and the second column is the name for that pattern taken from the original SLN files. While all patterns were converted to SMARTS, the conversion process is not perfect as I have not been able to reproduce (using the OEChem toolkit with the Tripos aromaticity model) all the hits that were obtained using the original SLN patterns.
(As a side note, the SMARTSViewer is a really handy tool to visualize a SMARTS pattern – which is great since many of the PAINS patterns are very complex)