Archive for June, 2011
With the recent stable release of the CDK (1.3.12) and the inclusion of the new rendering classes, I was able to make a new release of the rcdk (3.1.1) and rcdklibs (1.3.11) packages that support cheminformatics in R. They’ve been pushed to CRAN and should be visible in a day or two. The new features in the latest version of rcdk include
- Directly evaluate molecular volume (based on group contributions) using get.volume
- Generate fingerprints using the hybridization state
- get.total.charge and get.total.formal.charge work sensibly
- Added a function (copy.image.to.clipboard) that copies the 2D depiction of a molecule to the system clipboard in PNG format
- OS X users can now view and copy molecule depictions. This is slower than the same operation on Windows or Linux, since it involves shelling out via system, but it is better than not being able to view anything.
Some time back Egon implemented a simple group contribution based volume calculator, and it made its way into the stable branch (1.4.x) today. As a result I put out a new version of the CDKDescUI, which includes a descriptor that wraps the new volume calculator as well as the hybridization fingerprinter that Egon also implemented recently. The volume descriptor (based on the VABCVolume class) is one that has been missing for some time (the NumericalSurface class did return a volume, but it’s slow). This class is reasonably fast (10,000 molecules processed in 32 sec, or roughly 3 ms per molecule) and correlates well with the 2D and pseudo-3D volume descriptors from MOE (2008.10) as shown below. As expected, the correlation is better with the 2D version of the descriptor (which is similar in nature to the lookup method used in the CDK version). The X-axis represents the CDK descriptor values.
I’ve submitted version 3.4.3 of the fingerprint package to CRAN, so it should be available in a day or two. It’s an R package that lets you read in (chemical structure) fingerprint data from a variety of sources (CDK, MOE, BCI etc) and perform a variety of operations (bitwise, similarity, etc.) and visualizations on them.
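To make the "bitwise" and "similarity" operations concrete, here is a minimal sketch in Python rather than R. It is not the fingerprint package's API: the function names are hypothetical, and a fingerprint is assumed to be represented simply as the set of its "on" bit positions.

```python
# Illustrative sketch only: fingerprints are modelled as sets of "on" bit
# positions. The function names are hypothetical, not the R package's API.

def fp_and(a, b):
    """Bits set in both fingerprints."""
    return set(a) & set(b)

def fp_or(a, b):
    """Bits set in either fingerprint."""
    return set(a) | set(b)

def fp_xor(a, b):
    """Bits set in exactly one of the two fingerprints."""
    return set(a) ^ set(b)

def tanimoto(a, b):
    """Tanimoto similarity: |A and B| / |A or B|.

    Two empty fingerprints are given a similarity of 0.0 here; that is a
    convention chosen for this sketch, not a universal rule.
    """
    union = len(fp_or(a, b))
    return len(fp_and(a, b)) / union if union else 0.0

fp1 = {1, 5, 9, 12}
fp2 = {5, 9, 20}
print(tanimoto(fp1, fp2))  # 2 shared bits out of 5 distinct bits -> 0.4
```

Storing only the positions of the set bits keeps the example short; a packed bit vector would be the more usual choice for long, dense fingerprints.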
The two main additions to this version are
- Read support for the new FPS fingerprint format described by Andrew Dalke at the chemfp project. Note that it currently discards some of the header information.
- The fingerprint class now has a field, misc (a list), that allows one to read in extra, arbitrary data that might be provided along with a fingerprint. Exactly what gets stored in this field depends on the line function used to read in the fingerprint data. Currently only the FPS parser returns extra data (when available) in this field.
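As a rough illustration of the two additions above, the following Python sketch parses FPS-style text and keeps any extra columns in a misc list. It is not the package's R parser. It assumes the commonly described FPS layout: header lines beginning with "#" (some as key=value pairs), followed by data lines holding a hex-encoded fingerprint, a tab, an identifier, and optionally further tab-separated fields. The Fingerprint class and parse_fps function are hypothetical names, and the bit ordering (bit 0 is the least significant bit of the first byte) is also an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Fingerprint:
    """Hypothetical container loosely mirroring the fields discussed above."""
    name: str
    bits: frozenset          # positions of the "on" bits
    nbit: int                # total fingerprint length in bits
    misc: list = field(default_factory=list)  # extra, arbitrary data

def parse_fps(text):
    """Parse FPS-style text into (header dict, list of Fingerprint)."""
    header = {}
    fingerprints = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Keep only key=value header lines; others (e.g. "#FPS1") are skipped.
            if "=" in line:
                key, value = line[1:].split("=", 1)
                header[key] = value
            continue
        fields = line.split("\t")
        raw = bytes.fromhex(fields[0])
        # Assumed ordering: bit 0 is the least significant bit of byte 0.
        bits = frozenset(
            8 * i + j
            for i, byte in enumerate(raw)
            for j in range(8)
            if (byte >> j) & 1
        )
        nbit = int(header.get("num_bits", 8 * len(raw)))
        name = fields[1] if len(fields) > 1 else ""
        # Anything after the identifier is stored untouched in misc.
        fingerprints.append(Fingerprint(name, bits, nbit, fields[2:]))
    return header, fingerprints

example = "#FPS1\n#num_bits=16\n0300\tmol1\textra\n"
header, fps = parse_fps(example)
print(header["num_bits"], fps[0].name, sorted(fps[0].bits), fps[0].misc)
```

Leaving the extra fields as uninterpreted strings means the parser makes no guesses about what a particular FPS file chose to store after the identifier.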
As always, you can get the package source directly from the GitHub repository.