Retrieving Target Classifications from ChEMBL

There are a number of scenarios where it’s useful to be able to classify protein targets – high-level summaries, enrichment calculations and so on. There are a variety of protein classification schemes out there, such as PANTHER, SCOP and InterPro. These schemes are based on domains and other structural features. ChEMBL provides its own […]

Which Datasets Lead to Predictive Models?

I came across a recent paper from the Tropsha group that discusses the issue of modelability – that is, whether a dataset (represented as a set of computed descriptors and an experimental endpoint) can be reliably modeled. Obviously the definition of reliable is key here, and the authors focus on a cross-validated classification accuracy as the […]

Updated version of rcdk (3.2.3)

I’ve pushed updates to the rcdklibs and rcdk packages that support cheminformatics in R using the CDK. The new versions employ the latest CDK master, which, as Egon pointed out, has significantly fewer bugs and, thanks to Jon, improved performance. New additions to the package include support for the LINGO and Signature fingerprinters (you’ll need the […]

Support for feature,count fingerprints in fingerprint 3.5.0

I’ve just updated the fingerprint package to v3.5.0 (it should show up on CRAN shortly, or else you can get it directly from my GitHub repository). The main update in this version is better support for feature,count type fingerprints. Examples include ECFP and signature fingerprints. In these types of fingerprints, the output is usually […]