Accessing High Content Data from R

Over the last few months I’ve been getting involved in the informatics & data mining aspects of high content screening. While I haven’t gotten into image analysis itself (there’s a ton of good code and tools already out there), I’ve been focusing on managing image data and meta-data and asking interesting questions of the voluminuous, high-dimensional data that is generated by these techniques.

One of our platforms is ImageXpress from Molecular Devices, which stores images in a file-based image store and meta data and numerical image features in an Oracle database. While they do provide an API to interact with the database it’s a Windows only DLL. But since much of modeling requires I access the data from R, I needed a more flexible solution.

So, I’ve put together an R package that allows one to access numeric image data (i.e., descriptors) and images themselves. It depends on the ROracle package (which in turns requires an Oracle client installation).

Currently the functionality is relatively limited, focusing on my common tasks. Thus for example, given assay plate barcodes, we can retrieve the assay ids that the plate is associated with and then for a given assay, obtain the cell-level image parameter data (or optionally, aggregate it to well-level data). This task is easily parallelizable – in fact when processing a high content RNAi screen, I make use of snow to speed up the data access and processing of 50 plates.

1
2
3
4
5
6
7
8
9
10
11
12
13
14

library(ncgchcs)
con <- get.connection(user='foo', passwd='bar', sid='baz')
plate.barcode <- 'XYZ1023'
plate.id <- get.plates(con, plate.barcode)

## multiple analyses could be run on the same plate - we need
## to get the correct one (MX uses 'assay' to refer to an analysis run)
## so we first get details of analyses without retrieving the actual data
details <- get.assay.by.barcode(con, barcode=plate.barcode, dry=TRUE)
details <- subset(ret, PLATE_ID == plate.id & SETTINGS_NAME == assay.name)
assay.id <- details$ASSAY_ID

## finally, get the analysis data, using median to aggregate cell-level data
hcs.data <- get.assay(con, assay.id, aggregate.func=median, verbose=FALSE, na.rm=TRUE)

Alternatively, given a plate id (this is the internal MetaXpress plate id) and a well location, one can obtain the path to the relevant image(s). With the images in hand, you could use EBImage to perform image processing entirely in R.

1
2
3
4
5
6

library(ncgchcs)
## will want to set IMG.STORE.LOC to point to your image store
con <- get.connection(user='foo', passwd='bar', sid='baz')
plate.barcode <- 'XYZ1023'
plate.id <- get.plates(con, plate.barcode)
get.image.path(con, plate.id, 4, 4) ## get images for all sites & wavelengths

Currently, you cannot get the internal plate id based on the user assigned plate name (which is usually different from the barcode). Also the documentation is non-existant, so you need to explore the package to learn the functions. If there’s interest I’ll put in Rd pages down the line. As a side note, we also have a Java interface to the MetaXpress database that is being used to drive a REST interface to make our imaging data accessible via the web.

Of course, this is all specific to the ImageXpress platform – we have others such as InCell and Acumen. To have a comprehensive solution for all our imaging, I’m looking at the OME infrastructure as a means of, at the very least, have a unified interface to the images and their meta data.

So much to do, so little time

Trying to squeeze sense out of chemical data

Leave a Reply Cancel reply