So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘Uncategorized’ Category

Cheminformatics and Clam Chowder

with one comment

The time has come to move again – though, in this case, it’s just a geographic move. From August I’ll be living in Manchester, CT (great cheeseburgers and lovely cycle routes) and will continue to work remotely for NCGC. I’ll be travelling to DC every month or so. The rest of the time I’ll be working from Connecticut.

Being new to the area, I’d love to meet up over a beer with people in the surrounding areas (NY/CT/RI) who work on cheminformatics, predictive modeling and other life-science topics (any R user groups in the area?). If anybody’s interested, drop me a line (comment, mail or @rguha).

Written by Rajarshi Guha

July 25th, 2011 at 2:35 am

Posted in Uncategorized

Accessing High Content Data from R

without comments

Over the last few months I’ve been getting involved in the informatics & data mining aspects of high content screening. While I haven’t gotten into image analysis itself (there’s a ton of good code and tools already out there), I’ve been focusing on managing image data and metadata and asking interesting questions of the voluminous, high-dimensional data that is generated by these techniques.

One of our platforms is ImageXpress from Molecular Devices, which stores images in a file-based image store and metadata and numerical image features in an Oracle database. While they do provide an API to interact with the database, it’s a Windows-only DLL. But since much of my modeling requires that I access the data from R, I needed a more flexible solution.

So, I’ve put together an R package that allows one to access numeric image data (i.e., descriptors) and the images themselves. It depends on the ROracle package (which in turn requires an Oracle client installation).

Currently the functionality is relatively limited, focusing on my common tasks. Thus for example, given assay plate barcodes, we can retrieve the assay ids that the plate is associated with and then for a given assay, obtain the cell-level image parameter data (or optionally, aggregate it to well-level data). This task is easily parallelizable – in fact when processing a high content RNAi screen, I make use of snow to speed up the data access and processing of 50 plates.

con <- get.connection(user='foo', passwd='bar', sid='baz')
plate.barcode <- 'XYZ1023'
… <- get.plates(con, plate.barcode)

## multiple analyses could be run on the same plate - we need
## to get the correct one (MX uses 'assay' to refer to an analysis run)
## so we first get details of analyses without retrieving the actual data
details <- …, barcode=plate.barcode, dry=TRUE)
details <- subset(details, PLATE_ID == … & SETTINGS_NAME == …)
… <- details$ASSAY_ID

## finally, get the analysis data, using median to aggregate cell-level data
… <- get.assay(con, …, aggregate.func=median, verbose=FALSE, na.rm=TRUE)
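
As a rough illustration of the two ideas above (median aggregation from cell level to well level, and spreading plates across workers), here is a minimal sketch that does not touch the database at all. Everything in it is an assumption made for illustration: simulate.plate, well.level, process.plate and the column names BARCODE, WELL and CELL_AREA are hypothetical and not part of the package. It uses the parallel package that ships with R, whose makeCluster and parLapply follow the snow API, rather than snow itself.

```r
## Sketch only: simulated data stands in for the Oracle query.
## All function and column names here are hypothetical.
library(parallel)  # ships with R; cluster functions follow the snow API

## fake cell-level data for one plate (one row per cell)
simulate.plate <- function(barcode, n.cells = 200) {
  set.seed(sum(utf8ToInt(barcode)))  # reproducible per barcode
  data.frame(BARCODE = barcode,
             WELL = sample(c("A01", "A02", "B01", "B02"), n.cells, replace = TRUE),
             CELL_AREA = rlnorm(n.cells, meanlog = 5, sdlog = 0.3),
             stringsAsFactors = FALSE)
}

## collapse cell-level rows to one row per well
well.level <- function(cells, aggregate.func = median) {
  aggregate(CELL_AREA ~ BARCODE + WELL, data = cells, FUN = aggregate.func)
}

## the per-plate unit of work that gets farmed out to the workers
process.plate <- function(barcode) well.level(simulate.plate(barcode))

barcodes <- c("XYZ1023", "XYZ1024", "XYZ1025")

cl <- makeCluster(2)
## workers start empty, so the helper functions must be shipped to them
clusterExport(cl, c("simulate.plate", "well.level"), envir = environment())
per.plate <- parLapply(cl, barcodes, process.plate)
stopCluster(cl)

## one data frame with a row per plate/well combination
wells <- do.call(rbind, per.plate)
print(head(wells))
```

With a real database, each worker would presumably need to open its own connection, since connection objects generally cannot be shared across processes.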

Alternatively, given a plate id (this is the internal MetaXpress plate id) and a well location, one can obtain the path to the relevant image(s). With the images in hand, you could use EBImage to perform image processing entirely in R.

## will want to set IMG.STORE.LOC to point to your image store
con <- get.connection(user='foo', passwd='bar', sid='baz')
plate.barcode <- 'XYZ1023'
… <- get.plates(con, plate.barcode)
get.image.path(con, …, 4, 4) ## get images for all sites & wavelengths
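
To give a flavour of what processing "entirely in R" might look like once a path is in hand, here is a minimal sketch that counts bright objects in an image with EBImage. It is only an assumption about a plausible workflow: count.objects is a hypothetical helper, the threshold settings are illustrative and untuned, and a sample image bundled with EBImage stands in for a path returned by get.image.path.

```r
## Sketch only: count.objects is a hypothetical helper and the
## threshold parameters are illustrative, not tuned for real HCS images.
count.objects <- function(path) {
  img <- EBImage::readImage(path)
  img <- EBImage::getFrame(img, 1)   # work on the first frame only
  ## adaptive threshold to separate bright objects from background
  mask <- EBImage::thresh(img, w = 10, h = 10, offset = 0.05)
  ## morphological opening removes small specks
  mask <- EBImage::opening(mask, EBImage::makeBrush(5, shape = "disc"))
  mask <- EBImage::fillHull(mask)    # fill holes inside objects
  labels <- EBImage::bwlabel(mask)   # label connected components
  max(labels)                        # highest label = number of objects
}

if (requireNamespace("EBImage", quietly = TRUE)) {
  ## sample image shipped with EBImage, used in place of a real image path
  sample.path <- system.file("images", "nuclei.tif", package = "EBImage")
  n.objects <- count.objects(sample.path)
  cat("objects found:", n.objects, "\n")
} else {
  message("EBImage is not installed; skipping the example")
}
```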

Currently, you cannot get the internal plate id based on the user-assigned plate name (which is usually different from the barcode). Also, the documentation is nonexistent, so you need to explore the package to learn the functions. If there’s interest I’ll put in Rd pages down the line. As a side note, we also have a Java interface to the MetaXpress database that is being used to drive a REST interface to make our imaging data accessible via the web.

Of course, this is all specific to the ImageXpress platform – we have others such as InCell and Acumen. To have a comprehensive solution for all our imaging, I’m looking at the OME infrastructure as a means of, at the very least, having a unified interface to the images and their metadata.

Written by Rajarshi Guha

May 27th, 2011 at 5:01 am

Posted in Uncategorized,software

ICCS 2011

without comments

A few openings are left for the International Conference on Chemical Structures (ICCS)

A little less than 40 days left until the 9th International Conference on Chemical Structures (ICCS) starts in Noordwijkerhout, The Netherlands. The conference will focus on the latest scientific and technological developments in cheminformatics and related areas in six plenary sessions:

o Cheminformatics
o Structure-Activity and Structure-Property Prediction
o Structure-Based Drug Design and Virtual Screening
o Analysis of Large Chemistry Spaces
o Integrated Chemical Information
o Dealing with Biological Complexity

34 scientific lectures and 80 posters in two poster sessions will present applications and case studies as well as method development and algorithmic work in these areas. The program will open with a presentation by Engelbert Zass, ETH Zürich, who has been awarded the CSA Trust Mike Lynch Award on the occasion of the 9th ICCS. We invite you to have a look at the scientific program, which is now available at the website.

In addition to the scientific program there will be a commercial exhibition with 16 leading cheminformatics software suppliers. The participation of scientists from more than 20 countries will make this a truly international event with ample opportunities to network and discuss science.

Free workshops will be offered before and after the official conference program by BioSolveIT, The Chemical Computing Group, Tripos, and Accelrys.

On Wednesday afternoon there is a sailing cruise on the IJsselmeer on two traditional sailing boats. They will leave from the scenic Muiderslot castle, and then sail to the picturesque fishing village Volendam where the old village can be explored. A banquet dinner will be served on the boats on the way back.

If you are planning to attend, we encourage you to register as soon as possible through the conference web site:

We are looking forward to meeting with you all in Noordwijkerhout.

Keith T Taylor, ICCS Chair
Markus Wagener, ICCS Chair

Written by Rajarshi Guha

April 27th, 2011 at 12:53 pm

Posted in Uncategorized

2nd Call for Papers – ICCS, 2011

without comments

This has already been posted on some mailing lists, but one more place can’t hurt. The International Conference on Chemical Structures (ICCS) is coming up in June, 2011 at Noordwijkerhout, The Netherlands. I’m on the scientific advisory board and am planning to attend this meeting, as the topics being covered look pretty interesting, especially those focusing on ‘systems’ aspects of cheminformatics and bioinformatics. The abstract submission deadline is January 31, 2011.

C A L L   F O R   P A P E R S
9th International Conference on Chemical Structures
NH Leeuwenhorst Conference Hotel,
Noordwijkerhout, The Netherlands

5-9 June 2011

Visit the conference website for more information.

The 9th International Conference on Chemical Structures (ICCS) is
seeking presentations of novel research and emerging technologies for
the following plenary sessions:

o Cheminformatics
> advances in structure representation
> reaction handling and electronic lab notebooks (ELNs)
> molecular similarity and diversity
> chemical information visualization

o Structure-Activity and Structure-Property Prediction
> graphical methods for SAR analysis
> industrialized and large-scale model building
> multi-property prediction and multi-objective optimization

o Structure-Based Drug Design and Virtual Screening
> new docking and scoring approaches
> improved understanding of protein-ligand interactions
> pharmacophore definition and search
> modeling of challenging targets

o Analysis of Large Chemistry Spaces
> mining of chemical literature and patents
> design, profiling and comparison of compound collections and screening sets
> machine learning and knowledge extraction from databases

o Integrated Chemical Information
> advances in chemogenomics
> integration of medical and biological information
> semantic technologies as a driver of integration
> translational informatics

o Dealing with Biological Complexity
> analysis and prediction of poly-pharmacology
> in-silico analysis of toxicology, drug safety, and adverse events
> pathways and biological networks
> druggability of targets

Before and after the official conference program free workshops will be
offered by several companies including BioSolveIT
and the Chemical Computing Group.

Joint Organizers:
o Division of Chemical Information of the American Chemical Society
o Chemical Structure Association Trust (CSA Trust)
o Division of Chemical Information and Computer Science of the
Chemical Society of Japan (CSJ)
o Chemistry-Information-Computer Division of the Society of German
Chemists (GDCh)
o Royal Netherlands Chemical Society (KNCV)
o Chemical Information Group of the Royal Society of Chemistry (RSC)
o Swiss Chemical Society (SCS)

We encourage the submission of papers on both applications and case
studies as well as on method development and algorithmic work. The final
program will be a balance of these two aspects.

From the submissions the program committee and the scientific advisory
board will select about 30 papers for the plenary sessions. All submissions
that cannot be included in the plenary sessions will automatically be
considered for the poster session.

Contributions can be submitted for any of the above and related areas,
but we also welcome contributions in any aspect of the computer handling
of chemical structure information, such as:

o automatic structure elucidation
o combinatorial chemistry, diversity analysis
o web technology and its effect on chemical information
o electronic publishing
o MM or QM/MM simulations
o practical free energy calculations
o modeling of ADME properties
o material sciences
o analysis and prediction of crystal structures
o grid and cloud computing in cheminformatics

Visit the conference website for more information, including details
on procedures for online abstract submission and conference
registration.

The deadline for the submission of abstracts is 31 January 2011.

We hope to see you in Noordwijkerhout.

Keith T Taylor, ICCS Chair
Markus Wagener, ICCS Co-Chair

Written by Rajarshi Guha

December 11th, 2010 at 3:22 pm

Posted in Uncategorized

Job Openings at the NCGC

without comments

I’ve been at the NCGC for a little more than a year and I can say that it’s a great place to work – smart people, cutting edge projects in chemical genomics and chemical biology, opportunities to be involved in all aspects of HTS projects and fresh data (lots of it). Now there are opportunities for others to join the fun!

Some time back, my colleague Trung posted an ad for a software engineer position, primarily working on our chemogenomics data application. Now, we’re also looking for a research informatics scientist. See the detailed ad for more information. For both positions, see the ads themselves for contact details. If you’d like to chat face to face, I’ll be at the ACS in Boston this month, so drop me a line.

Written by Rajarshi Guha

August 3rd, 2010 at 11:50 pm

Posted in Uncategorized
