So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for September, 2010

The CDK is 10 Years Old

with 2 comments

As Egon has pointed out, the CDK project started 10 years ago tomorrow – congratulations to everybody involved in the project. Egon also deserves a huge vote of thanks for keeping the project going – not only in terms of code contributions but also the “grunt” work such as releases, bug fixes and documentation. That work might be boring, but it is vital to making the library usable.

I started hacking on the CDK around 2004 (as a way to learn Java!) and while it still has rough edges, it’s become quite capable. As Egon points out, we’ve come a long way, but it’s still a completely volunteer project, and so things like bug fixes and feature requests take time to get resolved (some have been languishing for months or years). Certainly, external funding would be great (I still think that the NIH was short-sighted in not funding our software development grant), but in the absence of that we’ll forge ahead in our free time. I liked Egon’s list of things to happen in 2011, and I’d love to see the implementation of some force fields – UFF is a start, and MMFF94 would be really useful – as that’d begin to address one of my wishlist items, viz., GRID descriptors.

But anyway, tomorrow we pop some champagne (virtually).

Written by Rajarshi Guha

September 26th, 2010 at 1:47 pm

Posted in cheminformatics, software


New Versions of rcdk and rcdklibs

with 2 comments

I’ve released an update to rcdk and rcdklibs on CRAN – right now source packages are available, but binary ones should show up soon. Both packages should be updated together. These packages integrate the CDK into the R environment and simplify a number of cheminformatics tasks. These versions use CDK 1.3.6 and JCP 16, so we now get access to SMSD and a few new descriptors. In addition, some new methods have been included (a short usage sketch follows the list):

  • cdk.version to get the version of the CDK being used by the package
  • is.subgraph uses SMSD to identify substructures. It is similar to the pre-existing matches method, but much faster in general (though you cannot specify SMARTS)
  • get.mcs again wraps SMSD and allows one to retrieve the MCS (as a molecule object or as atom indices) for a pair of molecules. Once again, it should be pretty fast
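
As a quick illustration, here is a minimal sketch of how these might be used from R. It is just a sketch: the SMILES are arbitrary, and the argument order for is.subgraph and the as.molecule flag for get.mcs are my assumptions based on the descriptions above, so check the package documentation for the real signatures.

library(rcdk)

## which version of the CDK is the package built against?
cdk.version()

## parse a couple of SMILES into molecule objects
mols <- parse.smiles(c("c1ccccc1C(=O)NC", "CC(=O)Nc1ccc(O)cc1"))

## SMSD-based substructure test (argument order assumed: query SMILES, then target molecule)
is.subgraph("NC=O", mols[[1]])

## maximum common substructure of the two molecules, returned as a molecule object
## (as.molecule = FALSE would return atom indices -- flag name assumed)
mcs <- get.mcs(mols[[1]], mols[[2]], as.molecule = TRUE)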

Written by Rajarshi Guha

September 25th, 2010 at 10:05 pm

Posted in cheminformatics, software


Author Count Frequencies in PubMed

without comments

Earlier today, Emily Wixson posted a question on the CHMINF-L list asking

… if there is any way to count the number of authors of papers with specific keywords in the title by year over a decade …

Since I had some code compiling and databases loading, I took a quick stab using Python and the Entrez services. The query provided by Emily was:

(RNA[Title] OR "ribonucleic acid"[Title]) AND ("2009"[Publication Date] : "2009"[Publication Date])

The Python code to retrieve all the relevant PubMed IDs and then process the PubMed entries to extract the article ID, year and number of authors is below. For some reason this query also retrieves articles from before 2001 and articles with no year or zero authors, but we can easily filter those entries out.

import urllib, urllib2, sys
import xml.etree.ElementTree as ET

# split the full list of PubMed IDs into fixed-size chunks for batched EFetch calls
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

query = '(RNA[Title] OR "ribonucleic acid"[Title]) AND ("2009"[Publication Date] : "2009"[Publication Date])'

# ESearch: retrieve all PubMed IDs matching the query
esearch = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2001&maxdate=2010&retmode=xml&retmax=10000000&term=%s' % (query)
handle = urllib.urlopen(esearch)
data = handle.read()

root = ET.fromstring(data)
ids = [x.text for x in root.findall("IdList/Id")]
print 'Got %d articles' % (len(ids))

# EFetch the records 100 at a time and extract the PMID, publication year and author count
for group in chunker(ids, 100):
    efetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&retmode=xml&id=%s" % (','.join(group))
    handle = urllib.urlopen(efetch)
    data = handle.read()

    root = ET.fromstring(data)
    for article in root.findall("PubmedArticle"):
        pmid = article.find("MedlineCitation/PMID").text
        year = article.find("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year")
        if year is None: year = 'NA'
        else: year = year.text
        aulist = article.findall("MedlineCitation/Article/AuthorList/Author")
        print pmid, year, len(aulist)

With IDs, year and author counts in hand, a bit of R lets us visualize the distribution of author counts, over the whole decade and also by individual years.
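
For reference, here is a minimal R sketch of the sort of processing involved – the file name (authcounts.txt) is just whatever the Python script’s output was redirected to, and the plotting details are my own choices rather than exactly what generated the figures below.

## read the (pmid, year, author count) triples written by the Python script
d <- read.table("authcounts.txt", header = FALSE,
                col.names = c("pmid", "year", "nauthor"))

## drop the spurious records: pre-2001 years, missing years, zero-author entries
d <- subset(d, year >= 2001 & year <= 2010 & nauthor > 0)

## distribution of author counts over the whole decade
hist(d$nauthor, breaks = 50, xlab = "Authors per paper",
     main = "PubMed RNA papers, 2001-2010")

## distribution by year, excluding papers with more than 20 authors
par(mfrow = c(2, 5))
for (yr in sort(unique(d$year)))
  hist(subset(d, year == yr & nauthor <= 20)$nauthor,
       xlab = "Authors per paper", main = yr)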

The median author count is 4, but there are a number of papers with more than 15 authors and some individual papers with more than 35 authors on them. If we exclude papers with, say, more than 20 authors and view the distribution by year, we get the following set of histograms.

We can see that over the years the median number of authors on a paper is more or less constant at 4, increasing to 5 in 2009 and 2010. At the same time, the distribution does grow broader over the years, indicating that there is an increasing number of papers with larger author counts.

Anyway, this was a quick hack, and there are probably more rigorous ways to do this (such as using Web of Science – but automating that would be painful).

Written by Rajarshi Guha

September 15th, 2010 at 2:19 am

Posted in software


Pig and Cheminformatics

without comments

Pig is a platform for analyzing large datasets. At its core is a high-level language (called Pig Latin) that is focused on specifying a series of data transformations. Scripts written in Pig Latin are executed by the Pig infrastructure either in local or map/reduce modes (the latter making use of Hadoop).

Previously I had investigated Hadoop for running cheminformatics tasks such as SMARTS matching and pharmacophore searching. While the implementation of such code is pretty straightforward, it’s still pretty heavyweight compared to, say, performing SMARTS matching in a database via SQL. On the other hand, being able to perform these tasks in Pig Latin lets us write much simpler code that can be integrated with other, non-cheminformatics code in a flexible manner. An example of a Pig Latin script that we might want to execute is:

A = load 'medium.smi' as (smiles:chararray);
B = filter A by net.rguha.dc.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
store B into 'output.txt';

The script loads a file containing SMILES strings, filters the entries that match the specified SMARTS pattern and writes the matching SMILES to an output file. Clearly, very similar to SQL. However, the above won’t work on a default Pig installation since SMATCH is not a built-in function. Instead we need to write a user defined function (UDF).

UDFs are implemented in Java and can be classified into one of three types: eval, aggregate or filter functions. For this example I’ll consider a filter function that takes two strings, representing a SMILES string and a SMARTS pattern, and returns true if the molecule contains the specified pattern.

package net.rguha.dc.pig;

import java.io.IOException;

import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;

import org.openscience.cdk.DefaultChemObjectBuilder;
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.smiles.smarts.SMARTSQueryTool;

public class SMATCH extends FilterFunc {
    // Instantiate the SMARTS matcher and SMILES parser once, statically,
    // rather than on every call to exec(). The "C" pattern is just a
    // placeholder; the real query is set via setSmarts() for each tuple.
    static SMARTSQueryTool sqt;
    static {
        try {
            sqt = new SMARTSQueryTool("C");
        } catch (CDKException e) {
            System.out.println(e);
        }
    }
    static SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());

    public Boolean exec(Tuple tuple) throws IOException {
        // expect two fields per tuple: the target SMILES and the SMARTS query
        if (tuple == null || tuple.size() < 2) return false;
        String target = (String) tuple.get(0);
        String query = (String) tuple.get(1);
        try {
            sqt.setSmarts(query);
            IAtomContainer mol = sp.parseSmiles(target);
            return sqt.matches(mol);
        } catch (CDKException e) {
            throw WrappedIOException.wrap("Error in SMARTS pattern or SMILES string " + query, e);
        }
    }
}

A UDF for filtering must extend the FilterFunc class, which specifies a single method, exec. Within this method we check whether we have the requisite number of input arguments and, if so, simply return the value of the SMARTS match. For more details on filter functions see the UDF manual.

One of the key features of the code is the static initialization of the SMILES parser and SMARTS matcher. I’m not entirely sure how many times the UDF is instantiated during a query (once for each “row”? Once for the entire query?) – but if it’s more than once, we don’t want to instantiate the parser and matcher in the exec function. Note that since Hadoop is not a multithreaded model, we don’t need to worry about the lack of thread safety in the CDK.

Compiling the above class and packaging it into a jar file allows us to run the above Pig Latin script from the command line (you’ll have to register the jar at the beginning of the script by adding a line of the form register /path/to/myudf.jar):

$ pig-0.4.0/bin/pig -x local match.pig # runs in local mode
2010-09-09 20:37:00,107 [main] INFO  org.apache.pig.Main - Logging error messages to: /Users/rguha/src/java/hadoop-0.18.3/pig_1284079020107.log
2010-09-09 20:37:39,278 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/Users/rguha/src/java/hadoop-0.18.3/output.txt"
2010-09-09 20:37:39,278 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 9
2010-09-09 20:37:39,278 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2010-09-09 20:37:39,278 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2010-09-09 20:37:39,278 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!

As with my previous Hadoop code, the above UDF can be deployed anywhere that Hadoop and HDFS are installed – such as Amazon. The code in this post (and for other Pig UDFs) is available from my repository (in the v18 branch) and is based on Pig 0.4.0 – which is pretty old, but is required to work with Hadoop 0.18.3.

Written by Rajarshi Guha

September 10th, 2010 at 2:21 am