Cheminformatics and Clam Chowder

The time has come to move again – though, in this case, it’s just a geographic move. From August I’ll be living in Manchester, CT (great cheeseburgers and lovely cycle routes) and will continue to work remotely for NCGC. I’ll be travelling to DC every month or so. The rest of the time I’ll be working from Connecticut.

Being new to the area, I’d love to meet up over a beer with people in the surrounding areas (NY/CT/RI) who are doing cheminformatics, predictive modeling and other life science related work (any R user groups in the area?). If anybody’s interested, drop me a line (comment, mail or @rguha).

A New Round of Lightning Talks

With the 2011 Fall ACS meeting coming up in Denver next month, CINF will be hosting another round of lightning talks – 8 minutes to talk about anything related to cheminformatics and chemical information. As before, these talks won’t be managed via PACS, so we are taking short abstracts directly between July 14 and Aug 14. We hope that we’ll get to hear about interesting and recent stuff. Remember, this is meant to be a fun event, so be creative! (You can see slides from the first run of this session last year.)

The full announcement is below:

For the 2011 Fall meeting in Denver (Aug 28 – Sep 1), CINF will be running an experimental session of lightning talks – short, strictly timed talks. The session does not have a specific topic; however, all talks should be related to cheminformatics and chemical information. One of the key features of this session is that we will not be using the traditional ACS abstract submission system, since that system precludes the inclusion of recent work in the program.

So, since we will be accepting abstracts directly, the expectation is that they be about recent work and developments, rather than rehashes of year-old work. In addition, talks should not be verbal versions of posters submitted for this meeting. Given the short time limits we don’t expect great detail – but we are expecting compact and informative presentations.

That’s the challenge.

What

  • Talks should be no longer than 8 minutes in length. At 8 minutes, you will be asked to stop.
  • Use as many slides as you want, as long as you can finish in 8 minutes
  • Talks should not be rehashes of poster presentations
  • Talks will run back to back, and questions & discussion will be held off until the end

If you haven’t participated in these types of talks before, here are some suggestions:

  • No more than three slides for a 5 minute talk (but if you can pull off 20 slides in 8 minutes, more power to you)
  • Avoid slides with too much text (and don’t paste PDFs of papers!)
  • A single chart per slide and make sure labels are readable at a distance

When

1:30pm, Wednesday, August 31st, 2011

Submissions run from July 14 to Aug 14

Where

Room 112, Colorado Convention Center

How

  • Send in an abstract of about 100 – 120 words to cinf.flash@gmail.com
  • We will let you know if you will be speaking by Aug 21 and we will need slide decks by Aug 24
  • You must be registered for the meeting
  • Note that the usual publication/copyright rules apply
  • We will encourage live blogging and tweets (if we have net access)

New Versions of rcdk & rcdklibs

With the recent stable release of the CDK (1.3.12) and the inclusion of the new rendering classes, I was able to make a new release of the rcdk (3.1.1) and rcdklibs (1.3.11) packages that support cheminformatics in R. They’ve been pushed to CRAN and should be visible in a day or two. The new features in the latest version of rcdk include

  • Directly evaluate molecular volume (based on group contributions) using get.volume
  • Generate fingerprints using the hybridization state
  • get.total.charge and get.total.formal.charge work sensibly
  • Added a function (copy.image.to.clipboard) that copies the 2D depiction of a molecule to the system clipboard in PNG format
  • OS X users can now view and copy molecule depictions. This is slower than the same operation on Windows or Linux since it involves shelling out via system, but it is better than not being able to view anything.

The CDK Volume Descriptor

Some time back Egon implemented a simple group contribution based volume calculator and it made its way into the stable branch (1.4.x) today. As a result I put out a new version of the CDKDescUI which includes a descriptor that wraps the new volume calculator as well as the hybridization fingerprinter that Egon also implemented recently. The volume descriptor (based on the VABCVolume class) is one that has been missing for some time (the NumericalSurface class did return a volume, but it’s slow). This class is reasonably fast (10,000 molecules processed in 32 sec) and correlates well with the 2D and pseudo-3D volume descriptors from MOE (2008.10) as shown below. As expected, the correlation is better with the 2D version of the descriptor (which is similar in nature to the lookup method used in the CDK version). The X-axis represents the CDK descriptor values.
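To give a feel for what a group contribution volume looks like, here is a minimal Python sketch. It is not the CDK’s VABCVolume code. It assumes the published VABC scheme of Zhao, Abraham and Zissimos (sum of atomic contributions, minus a penalty per bond and per ring), and the constants are quoted from memory of that paper, so check them against the original before relying on them. The function name and the way the molecule is described (element counts, bond count, ring counts) are made up for illustration, and hydrogens must be counted explicitly.

```python
# Illustrative sketch of a VABC-style group contribution volume.
# ASSUMPTION: constants recalled from the published VABC scheme
# (cubic angstroms); verify against the original paper before use.
ATOM_VOLUME = {"H": 7.24, "C": 20.58, "N": 15.60, "O": 14.71}

BOND_PENALTY = 5.92              # subtracted once per bond
AROMATIC_RING_PENALTY = 14.7     # subtracted once per aromatic ring
NONAROMATIC_RING_PENALTY = 3.8   # subtracted once per non-aromatic ring


def vabc_volume(atom_counts, n_bonds, n_aromatic_rings=0, n_nonaromatic_rings=0):
    """Estimate a van der Waals volume from simple counts.

    atom_counts maps element symbols to counts, hydrogens included.
    Raises KeyError for elements missing from ATOM_VOLUME.
    """
    atoms = sum(ATOM_VOLUME[symbol] * n for symbol, n in atom_counts.items())
    return (atoms
            - BOND_PENALTY * n_bonds
            - AROMATIC_RING_PENALTY * n_aromatic_rings
            - NONAROMATIC_RING_PENALTY * n_nonaromatic_rings)


if __name__ == "__main__":
    # Benzene: 6 C, 6 H, 12 bonds (6 C-C + 6 C-H), one aromatic ring.
    print(vabc_volume({"C": 6, "H": 6}, n_bonds=12, n_aromatic_rings=1))
```

Since everything is a table lookup plus a few counts, a calculation of this kind needs no 3D coordinates, which is consistent with it being much quicker than a surface based volume.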

New Version of fingerprint

I’ve submitted version 3.4.3 of the fingerprint package to CRAN, so it should be available in a day or two. It’s an R package that lets you read in (chemical structure) fingerprint data from a variety of sources (CDK, MOE, BCI etc) and perform a variety of operations (bitwise, similarity, etc.) and visualizations on them.
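As a rough illustration of the kind of similarity operation mentioned above, here is a small Python sketch of the Tanimoto coefficient on binary fingerprints. It does not use the fingerprint package or its API. Fingerprints are represented here simply as sets of on-bit positions, and the function name and the choice to return 0.0 when both fingerprints are empty are assumptions made for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of on-bit positions.

    Defined as |A & B| / |A | B|. Returning 0.0 when both sets are
    empty is a convention chosen for this sketch.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


if __name__ == "__main__":
    fp1 = {0, 1, 2, 3}
    fp2 = {2, 3, 4}
    # Two shared bits out of five distinct bits.
    print(tanimoto(fp1, fp2))
```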

The two main additions to this version are

  • Read support for the new FPS fingerprint format described by Andrew Dalke at the chemfp project. Note that it currently discards some of the header information.
  • The fingerprint class now has a field, misc (a list), that allows one to read in extra, arbitrary data that might be provided along with a fingerprint. Exactly what gets stored in this field depends on the line function used to read in the fingerprint data. Currently only the FPS parser returns extra data (when available) in this field.
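To make the two points above concrete, here is a minimal Python sketch of reading FPS-style data. It is not the R package’s parser. It assumes the basic FPS layout (header lines starting with #, then one record per line holding a hex-encoded fingerprint, a tab, an identifier, and optionally further tab-separated fields) and assumes that bit 0 is the lowest bit of the first byte. The Fingerprint class and parse_fps function are hypothetical names for this example. The misc list plays the same role as the field described above.

```python
from dataclasses import dataclass, field


@dataclass
class Fingerprint:
    name: str
    nbit: int
    bits: set                                   # positions of the on bits
    misc: list = field(default_factory=list)    # extra per-record fields


def parse_fps(text):
    """Parse FPS-style text into (header dict, list of Fingerprint).

    ASSUMPTIONS: '#key=value' header lines, tab-separated records, and
    bit 0 stored as the lowest bit of the first byte.
    """
    header, fingerprints = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Keep key=value headers; the '#FPS1' version line has no '='.
            if "=" in line:
                key, value = line[1:].split("=", 1)
                header[key] = value
            continue
        hex_fp, name, *extra = line.split("\t")
        raw = bytes.fromhex(hex_fp)
        nbit = int(header.get("num_bits", 8 * len(raw)))
        bits = {8 * i + j
                for i, byte in enumerate(raw)
                for j in range(8)
                if (byte >> j) & 1}
        fingerprints.append(Fingerprint(name, nbit, bits, extra))
    return header, fingerprints


if __name__ == "__main__":
    sample = "#FPS1\n#num_bits=16\n0f00\tmol1\n0180\tmol2\tsome extra data\n"
    hdr, fps = parse_fps(sample)
    for fp in fps:
        print(fp.name, sorted(fp.bits), fp.misc)
```

Keeping the extra columns in a plain list, rather than guessing at their meaning, mirrors the idea that what ends up in misc depends on the function doing the reading.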

As always, you can get the package source directly from the Github repository.