I’ve submitted version 3.4.3 of the fingerprint package to CRAN, so it should be available in a day or two. It’s an R package that lets you read in (chemical structure) fingerprint data from a variety of sources (CDK, MOE, BCI, etc.), perform a variety of operations on them (bitwise, similarity, etc.) and visualize them.
The two main additions to this version are:
- Read support for the new FPS fingerprint format described by Andrew Dalke at the chemfp project. Note that it currently discards some of the header information.
- The fingerprint class now has a field, misc, (a list) that allows one to read in extra, arbitrary data that might be provided along with a fingerprint. Exactly what gets stored in this field depends on the line function used to read in the fingerprint data. Currently only the FPS parser returns extra data (when available) in this field.
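To make the idea of a parser that keeps "extra" data alongside each fingerprint concrete, here is a minimal sketch in Python. It is not the package's C parser and not chemfp's reader; the function names `parse_fps` and `popcount`, the dictionary layout, and the sample data are all made up for illustration. It assumes the FPS layout as I understand it: header lines start with `#`, and each data line is a hex-encoded fingerprint, a tab, an identifier, and optionally further tab-separated fields.

```python
def parse_fps(text):
    """Parse FPS-style text into (header, records). Illustrative only."""
    header = {}
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Header lines look like "#key=value"; the leading "#FPS1"
            # version line has no "=", so store it under "version".
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key] = value
            else:
                header["version"] = key
            continue
        fields = line.split("\t")
        records.append({
            "id": fields[1],
            "fp": bytes.fromhex(fields[0]),  # hex string -> raw bytes
            "misc": fields[2:],              # anything after the identifier
        })
    return header, records


def popcount(fp):
    """Number of set bits in a fingerprint stored as bytes."""
    return sum(bin(byte).count("1") for byte in fp)


# Made-up sample data, not taken from a real FPS file.
SAMPLE = (
    "#FPS1\n"
    "#num_bits=16\n"
    "#software=example/0.0\n"
    "0300\tmol1\n"
    "8001\tmol2\textra-field\n"
)

header, records = parse_fps(SAMPLE)
for rec in records:
    print(rec["id"], popcount(rec["fp"]), rec["misc"])
```

Unlike the current R reader, this sketch keeps every header line. The `misc` entry plays the same role as the new field: it is empty when a record carries nothing beyond the fingerprint and its identifier.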
As always, you can get the package source directly from the Github repository.
Is the code to read the FPS format written in Java, or in R script? If the former, maybe we can port it to the CDK too…
Thanks Rajarshi!
@Egon, it’s written in C. I was going to get to a Java reader, but I’m not sure where we’d put it in the CDK, since we don’t really have any fingerprint reading classes (though I suppose it could go into the io package).