Archive for the ‘CRAN’ tag
Version 1.4.3 of rpubchem is out on CRAN. There are some minor code cleanups and also a new function called get.aid.by.cid, which lets you retrieve the IDs of assays that contain a given compound (as an active, inactive, discrepant or just tested). This uses PUG to perform the query, so it can be a bit slow (and occasionally just fails).
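Since PUG queries can be slow or fail outright, one option is to wrap the call in a small retry helper. The following is only a sketch: with.retry is a hypothetical helper, not part of rpubchem, and the type = "active" argument in the commented usage line is an assumption based on the categories listed above, so check ?get.aid.by.cid before relying on it.

```r
## Hypothetical helper (not part of rpubchem): call f() up to `tries` times,
## pausing `wait` seconds between attempts. Returns NULL if every attempt
## either throws an error or returns NULL.
with.retry <- function(f, tries = 3, wait = 2) {
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (i < tries) Sys.sleep(wait)
  }
  NULL
}

## Usage (needs rpubchem and network access); CID 2244 is just an example
## aids <- with.retry(function() rpubchem::get.aid.by.cid(2244, type = "active"))
```

A NULL result then means the query never succeeded, which is easy to check for before carrying on with an analysis.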
New versions of several of my R packages are now available on CRAN. rcdk 2.9.6 goes along with rcdklibs 1.2.3. The latter now uses the most recent cdk-1.2.x branch from GitHub. The former fixes a number of bugs relating to descriptor calculations, saving molecules in SD format and setting/getting properties on molecules. Unfortunately, because the 1.2.x branch does not have robust depiction code, the visualization methods in rcdk are currently disabled. The fingerprint package has also been updated and now includes a number of unit tests.
Being an R aficionado, I do the bulk of my work in R, and having grown up with Emacs, I tend to dislike having to exit my environment to do “other” stuff. This was the motivation for integrating R and the CDK, so that I could access and manipulate chemical information from within my R session. This resulted in the rcdk package.
Since then there have been a lot of improvements in the CDK, and the latest version of rcdk (2.9.2) includes them and also provides access to much more of the CDK via R idioms. As the original J. Stat. Soft. paper is now pretty much deprecated, we have included a tutorial in the form of a vignette. The latest version of rcdk is now much smaller, since we have split out the actual CDK libraries into a separate package called rcdklibs. This allows us to release new versions of rcdk without requiring a bulky download each time, since rcdklibs should change at a slower pace. I’d also like to thank Miguel Rojas Cherto for his contributions to this version of rcdk (as well as to rpubchem).
So what can you do with rcdk? Installation is pretty simple: just point your favorite interface to CRAN (or a mirror) and it should get it along with all the dependencies. After loading the library, you can read in any file format that the CDK supports or directly parse a SMILES string:
mols <- load.molecules("mymols.sdf")
mol.smiles <- parse.smiles("CC(=O)Cc1cc(Cl)ccc1")
Either call gives you a list of molecule objects. Note that these objects are actually pointers to Java objects, so you can’t serialize them via R’s save command. This is a pain, and so I’m planning to implement some code generators that will create S4 classes directly from the Java class definitions.
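Until something like that exists, one possible workaround is to save a plain-text representation and rebuild the Java objects in the next session. The sketch below assumes that SMILES strings are an adequate stand-in for your molecules (coordinates and any properties set on the objects are lost); the variable names and the temporary file are purely illustrative.

```r
library(rcdk)

## Parse a couple of example structures into Java-backed molecule objects
mols <- parse.smiles(c("CCO", "c1ccccc1"))

## Convert each molecule to a SMILES string; these are ordinary R
## character data and can be serialized like anything else
smi <- unname(sapply(mols, get.smiles))
rds.file <- tempfile(fileext = ".rds")
saveRDS(smi, rds.file)

## In a later session: read the strings back and re-create the molecules
restored <- parse.smiles(readRDS(rds.file))
length(restored)
```

For anything that has to round-trip exactly, writing the molecules out to an SD file and reloading them with load.molecules is the other obvious route.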
Once you have a molecule object you can do a variety of things:
## view molecule depictions
## evaluate fingerprints
fps <- get.fingerprints(mols, type="maccs")
## generate descriptors
dnames <- get.desc.names("topological")
descs <- eval.desc(mols, dnames)
One problem with the depiction code is that it does not work well on OS X. This is due to interactions between rJava and the R event handling loop. As a result, depictions show up, but then you can’t interact with the window. It does work fine on Linux and Windows. To easily handle fingerprints, I suggest the use of the fingerprint package. There are also methods to easily access atoms, bonds, molecule properties and so on.
Since I do a lot of cheminformatics work in R, I’ve created various functions and packages that make life easier for me as I do my modeling and analysis. Most of them are for private consumption. However, I’ve released a few of them to CRAN since they seem to be generally useful.
One of them is the fingerprint package (version 2.9 was just uploaded to CRAN), which is designed to read and manipulate fingerprint data generated by various cheminformatics toolkits or packages. Right now it supports output from the CDK, BCI and MOE. Fingerprints are represented using S4 classes. This allows me to override the R logical operators, so that one can do things like compute the logical OR of two fingerprints.
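To illustrate the general mechanism, here is a minimal, self-contained sketch of overriding logical operators for an S4 class. The toyfp class and its slots are hypothetical and exist only for this example; this is not the fingerprint package's actual implementation.

```r
library(methods)

## Hypothetical toy class: positions of the "on" bits plus the total length
setClass("toyfp", representation(bits = "numeric", nbit = "numeric"))

## Logical OR: a bit is on if it is on in either fingerprint
setMethod("|", signature("toyfp", "toyfp"), function(e1, e2) {
  stopifnot(e1@nbit == e2@nbit)
  new("toyfp", bits = sort(union(e1@bits, e2@bits)), nbit = e1@nbit)
})

## Logical AND: a bit is on only if it is on in both fingerprints
setMethod("&", signature("toyfp", "toyfp"), function(e1, e2) {
  stopifnot(e1@nbit == e2@nbit)
  new("toyfp", bits = sort(intersect(e1@bits, e2@bits)), nbit = e1@nbit)
})

fp1 <- new("toyfp", bits = c(1, 2, 5), nbit = 8)
fp2 <- new("toyfp", bits = c(2, 5, 7), nbit = 8)

(fp1 | fp2)@bits   # 1 2 5 7
(fp1 & fp2)@bits   # 2 5
```

Storing only the positions of the set bits keeps sparse fingerprints compact, and the operators reduce to set union and intersection.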