I came across an interesting paper by Anne-Laure Boulesteix in which she discusses the problem of false positive results being reported in the bioinformatics literature. She highlights two underlying phenomena that lead to this issue: “fishing for significance” and “publication bias”. The former is characterized by researchers identifying datasets on which their method works better […]
Updated Versions of R Packages
New versions of several of my R packages are now available on CRAN. rcdk 2.9.6 goes along with rcdklibs 1.2.3. The latter now uses the most recent cdk-1.2.x branch from GitHub. The former fixes a number of bugs relating to descriptor calculations, saving molecules in SD format, and setting/getting properties on molecules. Unfortunately, because the […]
A New Toolkit on the Block
A few days ago I was pointed to a new cheminformatics toolkit called Indigo, written in C++ with source code available under the GPLv3 license. Rich has previously commented on this. While it’s an initial release, it has a number of interesting components such as an Oracle cartridge, 2D depiction, scaffold detection and R-group […]
From Theory to Practice
Some time back, John Van Drie and I did some work on characterizing structure-activity cliffs, which are molecules that have very similar structures but very different activities. The term originated with Maggiora, who suggested that such cliffs were a reason for the failure of many QSAR models. At the same time, such cliffs can represent […]
Nice Article on Open Source Cheminformatics
I met Vivien Marx at the Bio-IT World conference held in Boston earlier this month, where I spoke on the topic of Open Source cheminformatics. The results of conversations between her and myself (and Peter Murray-Rust) were incorporated into an interesting article. (The CDK was started at Notre Dame, but is now […]