Spreading the Word About R & Cheminformatics

These last few days I’ve been in the UK for an EBI workshop on cheminformatics in R. It was a two-day workshop: the first day focused on general cheminformatics in R using the rcdk and rpubchem packages, and the second day, run by Steffen Neumann and Paul Benton, focused on mass spectrometry in R using XCMS and Rdisop. It was an excellent workshop, with participants from industry and academia, skill levels ranging from new R users to experts, and backgrounds ranging from minimal cheminformatics to full-time cheminformaticians.

While I think my exercises might have been a little too difficult, we were able to cover a variety of topics, from details of specific cheminformatics operations in R to more application-oriented tasks such as fingerprint-based analysis and benchmarking virtual screening methods. The slides from the workshop are available here. It’s a pretty big slide deck and covers some introductory R (there are some mistakes in that section, which I will fix in the coming days), an overview of the CDK, and then sections on usage and applications of the rcdk and rpubchem packages. It certainly helped that I had a very friendly audience! During the course of the workshop I also learned a few things about R (thanks to Tobias Verbeke and Steffen). Given that about 40 people were exposed to the rcdk package, my (known) user base should hopefully increase :) It was nice to get a patch from Tobias during the workshop, which will be incorporated once I’m back home.

It was also great to meet a number of people with whom I’d only had email or FriendFeed exchanges in the past, including Chris Swain, Mark Rijnbeek, Duncan Hull, Nico Adams (though I didn’t realize it was him when I was speaking to him, sorry Nico!), Duan Lian and Syed Asad Rahman. I also got to briefly meet some of the ChEMBL folks (John and Patricia). On Monday night we had a lovely workshop dinner at The Cricketer (Clavering).

Many thanks to Gabriella Rustici and Dominic Clark for organizing this and inviting me to run the first day. The only downside of this trip? It was too short :) It would’ve been great to be able to stay a day or two more to have longer discussions with various groups.

In addition to the workshop, I visited Asad and his family in Cambridge for a fantastic dinner and much useful discussion. He’s done some excellent work on SMSD and showed me some of his recent work on enzyme classification and reaction mappings. I won’t say much more as he’s writing this up, except to say that it was quite impressive and I’m eagerly looking forward to seeing the writeups. Hopefully we’ll be able to do some joint work in the near future. Given the speedup that SMSD provides for graph isomorphism, I’m in the process of updating the CDK SMARTS parser to make use of it rather than the older UIT (the CDK’s UniversalIsomorphismTester), which should make SMARTS matching considerably faster. Down the road, the pharmacophore matching code will get a similar upgrade.

I was also able to squeeze in a day trip up to Harrogate, where I grew up. It was fun to see familiar streets and places after 23 years or so. It certainly didn’t hurt to also have some pretty amazing traditional English fare (the Yorkshire curd tart at Bettys and the fish ’n’ chips at Graveleys were fantastic).

2 thoughts on “Spreading the Word About R & Cheminformatics”

  1. Aghilmort says:

    May want to check out nauty, & better yet, nice, which is part of SAGE

    http://www.sagemath.org/
    http://cs.anu.edu.au/~bdm/nauty/

  2. Asad says:

    “Nauty” is one of the excellent algorithms for testing isomorphism! Does it handle maximum common subgraph (MCS), isomorphism and subgraph isomorphism? Maybe I missed something!

    I found this interesting…

    http://www.aaai.org/Papers/Symposia/Fall/2006/FS-06-02/FS06-02-007.pdf
