So much to do, so little time

Trying to squeeze sense out of chemical data

rinchi – An R package to generate InChIs and InChI Keys

While I was trying to update rcdk on CRAN, it was pointed out to me that using the library modified the user's home directory. Specifically, this occurred when generating InChIs. The CDK makes use of jni-inchi, which in turn depends on JNATI, which enables Java code to work with native libraries in a platform-independent fashion. As part of this, JNATI creates $HOME/.jnati, which is a no-no for CRAN packages. To resolve this, the latest version of rcdklibs excludes the InChI module and its dependencies. Hopefully rcdk and rcdklibs will now pass CRAN QC.

To access InChI functionality in R, you can use the rinchi package, which is hosted on GitHub. Since it modifies the user's home directory, it cannot be hosted on CRAN. However, it's easy enough to install:

```r
library(devtools)
# rinchi lives in the rinchi/ subdirectory of the cdkr repository
install_github("rajarshi/cdkr", subdir = "rinchi")
```

Importantly, if all you need is to go from SMILES to InChI, there is no need to install rcdk as well, so the following works:

```r
library(rinchi)
inchi <- get.inchi('CCC')
inchik <- get.inchi.key('CCC')
```

But if you do have a molecule object obtained via rcdk, you can also pass that in to get an InChI or InChI key representation.
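As a minimal sketch of that second route, the snippet below parses a SMILES string with rcdk and hands the resulting molecule object to rinchi. It assumes both packages are installed and that `get.inchi()` and `get.inchi.key()` accept an rcdk molecule as described above; it also assumes the CDK versions bundled by the two packages coexist peacefully in one session.

```r
library(rcdk)    # provides parse.smiles()
library(rinchi)  # provides get.inchi() and get.inchi.key()

# parse.smiles() returns a list of molecule objects; take the first one
mol <- parse.smiles("CCC")[[1]]

# Pass the rcdk molecule object rather than a SMILES string
inchi  <- get.inchi(mol)
inchik <- get.inchi.key(mol)

print(inchi)
print(inchik)
```

This is mainly handy when the molecule came from a file (via `load.molecules()`) rather than from a SMILES string you already have.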

Written by Rajarshi Guha

August 30th, 2014 at 6:23 pm

Posted in software,cheminformatics


New Version of rpubchem

Version 1.4.3 of rpubchem is out on CRAN. There are some minor code cleanups and also a new function called get.aid.by.cid, which retrieves the IDs of assays that include a given compound (as an active, inactive, discrepant or simply tested compound). This uses PUG to perform the query, so it can be a bit slow (and occasionally just fail).
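A rough usage sketch follows. It assumes the signature `get.aid.by.cid(cid, type = ...)` with `type` set to one of "tested", "active", "inactive" or "discrepant"; please check the package documentation before relying on this. The compound (CID 2244, aspirin) is just an example, and `safe.aids()` is a hypothetical wrapper added here because the PUG query can fail.

```r
library(rpubchem)

cid <- 2244  # example compound: aspirin's PubChem CID

# Hypothetical helper: wrap the query, since the PUG request can be slow
# or fail outright. Assumes get.aid.by.cid(cid, type = ...) as described above.
safe.aids <- function(cid, type) {
  tryCatch(get.aid.by.cid(cid, type = type),
           error = function(e) {
             message("PUG query failed: ", conditionMessage(e))
             NULL
           })
}

active.aids <- safe.aids(cid, "active")  # assays where the compound was active
tested.aids <- safe.aids(cid, "tested")  # every assay the compound appeared in
```

Returning `NULL` on failure lets a loop over many CIDs keep going instead of stopping at the first failed request.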

Written by Rajarshi Guha

November 21st, 2009 at 2:11 am

Posted in software


Updated Versions of R Packages

New versions of several of my R packages are now available on CRAN. rcdk 2.9.6 goes along with rcdklibs 1.2.3. The latter now uses the most recent cdk-1.2.x branch from GitHub. The former fixes a number of bugs relating to descriptor calculations, saving molecules in SD format, and setting/getting properties on molecules. Unfortunately, because the 1.2.x branch does not have robust depiction code, the visualization methods in rcdk are currently disabled. The fingerprint package has also been updated and now includes a number of unit tests.
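For reference, property handling and SD output look roughly like this. The sketch is written against the current rcdk functions `set.property()`, `get.property()` and `write.molecules()`, so the exact arguments may differ from the 2.9.6 release. The property name `my.id` and the molecules are made up for illustration.

```r
library(rcdk)

mols <- parse.smiles(c("CCO", "c1ccccc1"))

# Attach a (hypothetical) identifier property to each molecule;
# the Java objects are modified in place
for (i in seq_along(mols)) {
  set.property(mols[[i]], "my.id", paste0("mol-", i))
}

# Read the property back from every molecule
ids <- sapply(mols, get.property, key = "my.id")

# Write all molecules, with their properties, to a single SD file
sdf.file <- tempfile(fileext = ".sdf")
write.molecules(mols, filename = sdf.file, together = TRUE, write.props = TRUE)
```

With `write.props = TRUE`, the properties end up as SD data fields, so they survive a round trip through `load.molecules()`.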

Written by Rajarshi Guha

November 6th, 2009 at 1:04 am

Posted in software,cheminformatics


Cheminformatics in R – rcdk

As an R aficionado, I do the bulk of my work in R, and having grown up with Emacs, I dislike having to leave my environment to do “other” stuff. This was the motivation for integrating R and the CDK, so that I could access and manipulate chemical information from within my R session. The result was the rcdk package.

Since then, the CDK has improved considerably, and the latest version (2.9.2) of rcdk incorporates these improvements and exposes much more of the CDK via R idioms. As the original J. Stat. Soft. paper is now largely out of date, we have included a tutorial in the form of a vignette. The latest version of rcdk is also much smaller, since we have split the actual CDK libraries into a separate package called rcdklibs. This lets us release new versions of rcdk without requiring a bulky download each time, since rcdklibs should change at a slower pace. I’d also like to thank Miguel Rojas Cherto for his contributions to this version of rcdk (as well as to rpubchem).

So what can you do with rcdk? Installation is pretty simple: just point your favorite interface to CRAN (or a mirror) and it should fetch rcdk along with all its dependencies. After loading the library, you can read in any file format that the CDK supports, or parse a SMILES string directly:

```r
library(rcdk)
mols <- load.molecules("mymols.sdf")
mol.smiles <- parse.smiles("CC(=O)Cc1cc(Cl)ccc1")
```

which gives you a list of molecule objects. Note that these objects are actually pointers to Java objects, so you can’t serialize them via R’s save function. This is a pain, so I’m planning to implement some code generators that will create S4 classes directly from the Java class definitions.
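Until then, one simple workaround (an assumption on my part, not the S4 approach described above) is to save a plain-text representation such as SMILES and re-parse it when needed. The sketch uses rcdk's `get.smiles()` and `parse.smiles()`; the file name is a temporary one chosen for illustration.

```r
library(rcdk)

mols <- parse.smiles(c("CCO", "CC(=O)Cc1cc(Cl)ccc1"))

# Java-backed molecule objects don't survive save()/load(), so store
# their SMILES strings, which are ordinary R character data
smi <- sapply(mols, get.smiles)
rdata.file <- tempfile(fileext = ".RData")
save(smi, file = rdata.file)

# Later (or in a new session): reload the strings and rebuild the molecules
rm(smi)
load(rdata.file)
mols.restored <- parse.smiles(smi)
```

Note that SMILES keeps only connectivity (and stereo), so coordinates and any properties set on the molecules are lost; writing an SD file is the better choice when those matter.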

Once you have a molecule object you can do a variety of things:

```r
## view molecule depictions
view.molecule.2d(mols)
## evaluate fingerprints
fps <- get.fingerprints(mols, type="maccs")
## generate descriptors
dnames <- get.desc.names("topological")
descs <- eval.desc(mols, dnames)
```

One problem with the depiction code is that it does not work well on OS X. This is due to interactions between rJava and the R event handling loop. As a result, depictions show up, but then you can’t interact with the window. It does work fine on Linux and Windows. To easily handle fingerprints, I suggest the use of the fingerprint package. There are also methods to easily access atoms, bonds, molecule properties and so on.
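As a sketch of how the fingerprint package and the atom/bond accessors fit together, the snippet below compares a few molecules by Tanimoto similarity and inspects one molecule's atoms. It is written against the current rcdk API, where the fingerprint function is `get.fingerprint()` and takes one molecule at a time (this may differ from the `get.fingerprints()` call in the 2.9.2 release shown above). The molecules are arbitrary examples.

```r
library(rcdk)
library(fingerprint)  # fingerprint objects and similarity metrics

mols <- parse.smiles(c("CCO", "CCN", "c1ccccc1O"))

# One MACCS fingerprint per molecule
fps <- lapply(mols, get.fingerprint, type = "maccs")

# Pairwise Tanimoto similarity matrix for all molecules
sims <- fp.sim.matrix(fps, method = "tanimoto")

# Tanimoto similarity for a single pair (despite its name, distance()
# returns a similarity for this metric)
d12 <- distance(fps[[1]], fps[[2]], method = "tanimoto")

# Element symbols of the atoms, and the bond count, for the first molecule
syms <- sapply(get.atoms(mols[[1]]), get.symbol)
nbonds <- length(get.bonds(mols[[1]]))
```

Since `parse.smiles()` leaves hydrogens implicit, ethanol reports only its three heavy atoms and two bonds.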

Written by Rajarshi Guha

February 25th, 2009 at 3:40 pm

Posted in software
