So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for February, 2013

HCSConnect, SOAP and images – the sordid details

without comments

Over the past few months I’ve been trying to work with the HCSConnect web service API that Cellomics provides as a means to programmatically access the Arrayscan image and data store. While the API works for getting things such as plate names, I was unable to retrieve images. After much poking and some suggestions from Rossella Rispoli (ICR), I put together a solution. Which (I think) is essentially a kludge due to the fact that the WSDL generated by the HCSConnect API is buggy.

The API is based on SOAP (yuk!) and so the first thing to do is to autogenerate a client. Since I work with Java, I used the AXIS libraries. Importantly, it seems that the only combination that works is AXIS1 with the BasicHTTPBinding (SOAP 1.1) from the API. AXIS2 + WSBinding (SOAP 1.2) do not work – I don’t know why.

Assuming you’ve got the AXIS1 libraries somewhere, here’s the sequence of steps to get a SOAP client generated:

1
2
3
4
5
6
7
8
9
10
11
export AXIS_HOME=/path/to/axis
export AXIS_LIB=$AXIS_HOME/lib
export AXISCLASSPATH=$AXIS_LIB/axis.jar:$AXIS_LIB/commons-discovery-0.2.jar:$AXIS_LIB/commons-logging-1.0.4.jar:$AXIS_LIB/jaxrpc.jar:$AXIS_LIB/saaj.jar:$AXIS_LIB/log4j-1.2.8.jar:$AXIS_LIB/wsdl4j-1.5.1.jar

java -cp "$AXISCLASSPATH" org.apache.axis.wsdl.WSDL2Java -p gov.nih.ncgc.arrayscan -B http://api.host.name:2020/?wsdl

mkdir arrayscan
mv build.xml gov arrayscan
cd arrayscan
ant
mv "?wsdl.jar" HCSConnect-client-axis1.jar

You’ll probably want to change the package name. Also note the use of port 2020 – this corresponds to the HTTP service. The resultant JAR file is what you should add to your project as a dependency. This library lets you connect to the API and start pulling data.

But it will fail when you try to retrieve an image. And this because, the SOAP response that is returned when an image is requested provides the binary image data as an attachment. The autogenerated client code is unable to handle this (suggesting an issue in the WSDL specification for that method).

To get around this, I needed to intercept the SOAP response and extract the image bytes myself. This can be achieved by creating an implementation of GenericHandler that only deals with the response from the GetImage method of the HCSConnect API. The code below does this and when it sees such a response, it extracts the image bytes and stores it in a static variable.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
public class GetImageMessageHandler extends GenericHandler {
 
    HandlerInfo headerInfo;
    public static byte[] imageBytes;
 
    public void init(HandlerInfo info) {
        headerInfo = info;
    }
 
    public QName[] getHeaders() {
        return headerInfo.getHeaders();
    }
 
    public boolean handleResponse(MessageContext context) {
 
        try {
 
            // get the soap header
            SOAPMessage message = ((org.apache.axis.MessageContext) context).getMessage();
            String opName = ((org.apache.axis.MessageContext) context).getOperation().getName();
            if (!opName.equals("GetImage")) return true;
 
            int nattach = 0;
            Iterator iter = message.getAttachments();
            while (iter.hasNext()) {
                Object next = iter.next();
                nattach++;
            }
            if (nattach != 1) throw new RuntimeException("If operation is GetImage, we must have 1 attachment");
 
            // Create transformer
            TransformerFactory tff = TransformerFactory.newInstance();
            Transformer tf = tff.newTransformer();
 
            // Get reply content
            Source sc = message.getSOAPPart().getContent();
 
            // Set output transformation
            StreamResult result = new StreamResult(System.out);
            tf.transform(sc, result);
            System.out.println();
 
            iter = message.getAttachments();
            while (iter.hasNext()) {
                AttachmentPart att = (AttachmentPart) iter.next();
                BufferedInputStream bis = new BufferedInputStream(att.getDataHandler().getInputStream());
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                int c;
                while ((c = bis.read()) != -1) baos.write(c);
                bis.close();
                imageBytes = baos.toByteArray();
            }
 
        } catch (Exception e) {
            throw new JAXRPCException(e);
        }
        return true;
    }
 
    public boolean handleRequest(MessageContext context) {
        // return true to continue message processing
        return true;
    }
}

With this in hand, code that want’s to retrieve the image can use it as follows

1
2
3
4
5
6
HCSConnectLocator csl = new HCSConnectLocator();
HandlerRegistry hr = csl.getHandlerRegistry();
List hc = hr.getHandlerChain(new QName("http://schemas.datacontract.org/2004/07/Thermo.Connect", "HTTPHCSConnect"));
HandlerInfo hi = new HandlerInfo();
hi.setHandlerClass(GetImageMessageHandler.class);
hc.add(hi);

and then instead of doing

1
Image image = csl.getHTTPHCSConnect().getImage(idr);

you’d do

1
2
3
4
5
6
7
try {
  csl.getHTTPHCSConnect().getImage(idr);
} catch (AxisFault e) {
  FileOutputStream imageOutFile = new FileOutputStream("img.jpg");
  imageOutFile.write(GetImageMessageHandler.imageBytes);
  imageOutFile.close();
}

This is somewhat inelegant as the use of the static variable in GetImageMessageHandler means that we can’t use this class in a multi-threaded environment. However, it appears that the AXIS API instantiates instances of the handler class itself, rather than accepting an instance, so I don’t see an easy way around this.

Written by Rajarshi Guha

February 13th, 2013 at 11:30 pm

Posted in software

Tagged with , , , ,

Visual pairwise comparison of distributions

without comments

While analysing some data from a dose respons screen, run across multiple cell lines, I need to visualize summarize curve data in a pairwise fashion. Specifically, I wanted to compaure area under the curve (AUC) values for the curve fits for the same compound between every pair of cell line. Given that an AUC needs a proper curve fit, this means that the number of non-NA AUCs is different for each cell line. As a result making  a scatter plot matrix (via plotmatrix) won’t do.

A more useful approach is to generate a matrix of density plots, such that each plot contains the distributions of AUCs from each pair of cell lines over laid on each other. It turns out that some data.frame wrangling and facet_grid makes this extremely easy.

Lets start with some random data, for 5 imaginary cell lines

1
2
3
4
5
6
7
8
library(ggplot2)
library(reshape)

tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean=sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  return(r)
})))

Next, we need to expand this into a form that lets us facet by pairs of variables

1
2
3
4
5
6
7
8
tmp2 <- do.call(rbind, lapply(1:5, function(i) {
  do.call(rbind, lapply(1:5, function(j) {
    r <- rbind(data.frame(var='D1', val=tmp1[,i]),
               data.frame(var='D2', val=tmp1[,j]))
    r <- data.frame(xx=names(tmp1)[i], yy=names(tmp1)[j], r)
    return(r)
  }))
}))

Finally, we can make the plot

1
2
3
4
ggplot(tmp2, aes(x=val, fill=var))+
  geom_density(alpha=0.2, position="identity")+
  theme(legend.position = "none")+
  facet_grid(xx ~ yy, scales='fixed')

Giving us the plot below.

I had initially asked this on StackOverflow where Arun provided a more elegant approach to composing the data.frame

Written by Rajarshi Guha

February 10th, 2013 at 3:03 pm

Python, permission and forgiveness

with one comment

This morning I was writing some Python code that needed to perform lookups on a very large map.

1
2
3
4
mapSize = 65000
amap = {}
for i in range(0,mapSize):
    amap['k%d' % (i)] = i

If a key did not exist in the map I needed to take some action. My initial effort performed a pre-emptive check to see whether the key existed in the map

1
2
3
4
5
6
7
8
9
query = ['k%d' % (x) for x in xrange(100)]
query.extend(['k%d' % (x) for x in xrange(mapSize+1, mapSize+100)])

def permission():
    for q in query:
        if q in amap.keys():
            val = amap[q]
        else:
            pass

Looking up 200 keys (half of which are present and half  are absent from the map) took 496 ms (based on Pythons’ timeit module with default settings). On the other hand, if we just try and go ahead and access the key and deal with its absence by handling with the exception, we get a significant improvement.

1
2
3
4
5
6
def forgiveness():
    for q in query:
        try:
            val = amap[q]
        except:
            pass

Specifically, 150 us – an improvement of 3306x.

So, as with many other things in life, it’s better to ask Python for forgiveness than  permission.

Written by Rajarshi Guha

February 7th, 2013 at 4:19 pm

Posted in software

Tagged with