Archive for the ‘CRAN’ tag
While trying to update rcdk on CRAN, it was pointed out to me that using the library resulted in modifications to the user’s home directory. Specifically, this occurred when generating InChIs. The CDK makes use of jni-inchi, which in turn depends on JNATI, which enables Java code to work with native libraries in a platform-independent fashion. As part of this, it creates $HOME/.jnati, which is a no-no for CRAN packages. To resolve this, the latest version of rcdklibs excludes the InChI module and its dependencies. Hopefully rcdk and rcdklibs will now pass CRAN QC.
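To access InChI functionality in R you can use the rinchi package, which is hosted on GitHub. Since it will modify the user’s home directory, it cannot be hosted on CRAN. However, it’s easy enough to install using devtools:
library(devtools)
# older (repo, username) call form; newer devtools releases expect "rajarshi/cdkr" as a single argument
install_github("cdkr", "rajarshi", subdir="rinchi")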
Importantly, if all you need is to go from SMILES to InChI, there is no need to install rcdk as well. So the following works:
library(rinchi)
inchi <- get.inchi('CCC')
inchik <- get.inchi.key('CCC')
But if you do have a molecule object obtained via rcdk, you can also pass that in to get an InChI or InChI key representation.
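As a minimal sketch of that second route, assuming both rcdk and rinchi are installed and that get.inchi and get.inchi.key accept a single molecule object as described above:

```r
library(rcdk)    # provides parse.smiles
library(rinchi)  # provides get.inchi and get.inchi.key

# parse.smiles returns a list, so take the first element to get one molecule
mol <- parse.smiles("CCC")[[1]]

# Assumption: the molecule object can be passed in place of a SMILES string
inchi  <- get.inchi(mol)
inchik <- get.inchi.key(mol)
```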
Version 1.4.3 of rpubchem is out on CRAN. There are some minor code cleanups and also a new function called get.aid.by.cid, which allows you to get assay IDs based on whether they contain a compound (either as an active, inactive, discrepant or just tested). This uses PUG to perform the query, so it can be a bit slow (and occasionally just fail).
New versions of several of my R packages are now available on CRAN. rcdk 2.9.6 goes along with rcdklibs 1.2.3. The latter now uses the most recent cdk-1.2.x branch from GitHub. The former fixes a number of bugs relating to descriptor calculations, saving molecules in SD format and setting/getting properties on molecules. Unfortunately, because the 1.2.x branch does not have robust depiction code, the visualization methods in rcdk are currently disabled. The fingerprint package has also been updated and now includes a number of unit tests.
Being an R aficionado, I do the bulk of my work in R and, having grown up with Emacs, I tend to dislike having to exit my environment to do “other” stuff. This was the motivation for integrating R and the CDK, so that I could access and manipulate chemical information from within my R session. This resulted in the rcdk package.
Since then there have been a lot of improvements in the CDK and so the latest version (2.9.2) of rcdk includes them and also provides access to much more of the CDK via R idioms. As the original J. Stat. Soft. paper is now pretty much deprecated, we have included a tutorial in the form of a vignette. The latest version of rcdk is now much smaller, since we have split out the actual CDK libraries into a separate package called rcdklibs. This allows us to release new versions of rcdk, without requiring a bulky download each time, since rcdklibs should change at a slower pace. I’d also like to thank Miguel Rojas Cherto for his contributions to this version of rcdk (as well as to rpubchem).
So what can you do with rcdk? Installation is pretty simple: just point your favorite interface to CRAN (or a mirror) and it should get it along with all the dependencies. After loading the library, you can read in any file format that the CDK supports or directly parse a SMILES string:
library(rcdk)
mols <- load.molecules("mymols.sdf")
mol.smiles <- parse.smiles("CC(=O)Cc1cc(Cl)ccc1")
which gives you a list of molecule objects. Note that these objects are actually pointers to Java objects and so you can’t serialize these via R’s save command. This is a pain and so I’m planning to implement some code generators that will create S4 classes directly from the Java class definitions.
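Until then, one possible workaround (a sketch, not something rcdk does for you) is to save a plain-text representation such as SMILES and re-parse it in a later session. This assumes get.smiles from rcdk and that a SMILES round trip keeps everything you need, which it will not for coordinates or molecule properties. The file name is hypothetical.

```r
library(rcdk)

mols <- parse.smiles(c("CCC", "c1ccccc1"))

# Convert the Java-backed objects into an ordinary character vector
smi <- unname(sapply(mols, get.smiles))

# Character vectors serialize without trouble
save(smi, file = "mols-smiles.Rda")

# In a later session: reload the strings and rebuild the molecule objects
load("mols-smiles.Rda")
mols <- parse.smiles(smi)
```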
Once you have a molecule object you can do a variety of things:
## view molecule depictions
## evaluate fingerprints
fps <- get.fingerprints(mols, type="maccs")
## generate descriptors
dnames <- get.desc.names("topological")
descs <- eval.desc(mols, dnames)
One problem with the depiction code is that it does not work well on OS X. This is due to interactions between rJava and the R event handling loop. As a result, depictions show up, but then you can’t interact with the window. It does work fine on Linux and Windows. To easily handle fingerprints, I suggest the use of the fingerprint package. There are also methods to easily access atoms, bonds, molecule properties and so on.
Since I do a lot of cheminformatics work in R, I’ve created various functions and packages that make life easier for me as I do my modeling and analysis. Most of them are for private consumption. However, I’ve released a few of them to CRAN since they seem to be generally useful.
One of them is the fingerprint package (version 2.9 was just uploaded to CRAN), which is designed to read and manipulate fingerprint data generated from various cheminformatics toolkits or packages. Right now it supports output from the CDK, BCI and MOE. Fingerprints are represented using S4 classes. This allows me to override the R logical operators, so that one can do things like compute the logical OR of two fingerprints.
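To illustrate the mechanism, here is a minimal base-R sketch of overriding the logical operators for an S4 class. The toyfp class and its slots are hypothetical and are not the classes defined by the fingerprint package; the sketch only shows how S4 dispatch makes an expression like fp1 | fp2 possible.

```r
library(methods)

# Hypothetical toy class: stores the positions of the "on" bits
# and the total fingerprint length.
setClass("toyfp", slots = c(bits = "numeric", nbit = "numeric"))

# Override `|`: the OR of two fingerprints is the union of their on-bits.
setMethod("|", signature(e1 = "toyfp", e2 = "toyfp"), function(e1, e2) {
  stopifnot(e1@nbit == e2@nbit)
  new("toyfp", bits = sort(union(e1@bits, e2@bits)), nbit = e1@nbit)
})

# Override `&`: the AND of two fingerprints is the intersection of their on-bits.
setMethod("&", signature(e1 = "toyfp", e2 = "toyfp"), function(e1, e2) {
  stopifnot(e1@nbit == e2@nbit)
  new("toyfp", bits = sort(intersect(e1@bits, e2@bits)), nbit = e1@nbit)
})

fp1 <- new("toyfp", bits = c(1, 3, 5), nbit = 8)
fp2 <- new("toyfp", bits = c(2, 3, 8), nbit = 8)

fp.or  <- fp1 | fp2   # on-bits: 1 2 3 5 8
fp.and <- fp1 & fp2   # on-bits: 3
```

Storing only the on-bit positions keeps sparse fingerprints compact, and the set operations map directly onto the bitwise ones.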