HCSConnect, SOAP and images – the sordid details

Over the past few months I’ve been trying to work with the HCSConnect web service API that Cellomics provides as a means to programmatically access the Arrayscan image and data store. While the API works for getting things such as plate names, I was unable to retrieve images. After much poking and some suggestions from Rossella Rispoli (ICR), I put together a solution. Which (I think) is essentially a kludge due to the fact that the WSDL generated by the HCSConnect API is buggy.

The API is based on SOAP (yuk!) and so the first thing to do is to autogenerate a client. Since I work with Java, I used the AXIS libraries. Importantly, it seems that the only combination that works is AXIS1 with the BasicHTTPBinding (SOAP 1.1) from the API. AXIS2 + WSBinding (SOAP 1.2) do not work – I don’t know why.

Assuming you’ve got the AXIS1 libraries somewhere, here’s the sequence of steps to get a SOAP client generated:

1
2
3
4
5
6
7
8
9
10
11
export AXIS_HOME=/path/to/axis
export AXIS_LIB=$AXIS_HOME/lib
export AXISCLASSPATH=$AXIS_LIB/axis.jar:$AXIS_LIB/commons-discovery-0.2.jar:$AXIS_LIB/commons-logging-1.0.4.jar:$AXIS_LIB/jaxrpc.jar:$AXIS_LIB/saaj.jar:$AXIS_LIB/log4j-1.2.8.jar:$AXIS_LIB/wsdl4j-1.5.1.jar

java -cp "$AXISCLASSPATH" org.apache.axis.wsdl.WSDL2Java -p gov.nih.ncgc.arrayscan -B http://api.host.name:2020/?wsdl

mkdir arrayscan
mv build.xml gov arrayscan
cd arrayscan
ant
mv "?wsdl.jar" HCSConnect-client-axis1.jar

You’ll probably want to change the package name. Also note the use of port 2020 – this corresponds to the HTTP service. The resultant JAR file is what you should add to your project as a dependency. This library lets you connect to the API and start pulling data.

But it will fail when you try to retrieve an image. And this because, the SOAP response that is returned when an image is requested provides the binary image data as an attachment. The autogenerated client code is unable to handle this (suggesting an issue in the WSDL specification for that method).

To get around this, I needed to intercept the SOAP response and extract the image bytes myself. This can be achieved by creating an implementation of GenericHandler that only deals with the response from the GetImage method of the HCSConnect API. The code below does this and when it sees such a response, it extracts the image bytes and stores it in a static variable.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
public class GetImageMessageHandler extends GenericHandler {
 
    HandlerInfo headerInfo;
    public static byte[] imageBytes;
 
    public void init(HandlerInfo info) {
        headerInfo = info;
    }
 
    public QName[] getHeaders() {
        return headerInfo.getHeaders();
    }
 
    public boolean handleResponse(MessageContext context) {
 
        try {
 
            // get the soap header
            SOAPMessage message = ((org.apache.axis.MessageContext) context).getMessage();
            String opName = ((org.apache.axis.MessageContext) context).getOperation().getName();
            if (!opName.equals("GetImage")) return true;
 
            int nattach = 0;
            Iterator iter = message.getAttachments();
            while (iter.hasNext()) {
                Object next = iter.next();
                nattach++;
            }
            if (nattach != 1) throw new RuntimeException("If operation is GetImage, we must have 1 attachment");
 
            // Create transformer
            TransformerFactory tff = TransformerFactory.newInstance();
            Transformer tf = tff.newTransformer();
 
            // Get reply content
            Source sc = message.getSOAPPart().getContent();
 
            // Set output transformation
            StreamResult result = new StreamResult(System.out);
            tf.transform(sc, result);
            System.out.println();
 
            iter = message.getAttachments();
            while (iter.hasNext()) {
                AttachmentPart att = (AttachmentPart) iter.next();
                BufferedInputStream bis = new BufferedInputStream(att.getDataHandler().getInputStream());
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                int c;
                while ((c = bis.read()) != -1) baos.write(c);
                bis.close();
                imageBytes = baos.toByteArray();
            }
 
        } catch (Exception e) {
            throw new JAXRPCException(e);
        }
        return true;
    }
 
    public boolean handleRequest(MessageContext context) {
        // return true to continue message processing
        return true;
    }
}

With this in hand, code that want’s to retrieve the image can use it as follows

1
2
3
4
5
6
HCSConnectLocator csl = new HCSConnectLocator();
HandlerRegistry hr = csl.getHandlerRegistry();
List hc = hr.getHandlerChain(new QName("http://schemas.datacontract.org/2004/07/Thermo.Connect", "HTTPHCSConnect"));
HandlerInfo hi = new HandlerInfo();
hi.setHandlerClass(GetImageMessageHandler.class);
hc.add(hi);

and then instead of doing

1
Image image = csl.getHTTPHCSConnect().getImage(idr);

you’d do

1
2
3
4
5
6
7
try {
  csl.getHTTPHCSConnect().getImage(idr);
} catch (AxisFault e) {
  FileOutputStream imageOutFile = new FileOutputStream("img.jpg");
  imageOutFile.write(GetImageMessageHandler.imageBytes);
  imageOutFile.close();
}

This is somewhat inelegant as the use of the static variable in GetImageMessageHandler means that we can’t use this class in a multi-threaded environment. However, it appears that the AXIS API instantiates instances of the handler class itself, rather than accepting an instance, so I don’t see an easy way around this.

Visual pairwise comparison of distributions

While analysing some data from a dose respons screen, run across multiple cell lines, I need to visualize summarize curve data in a pairwise fashion. Specifically, I wanted to compaure area under the curve (AUC) values for the curve fits for the same compound between every pair of cell line. Given that an AUC needs a proper curve fit, this means that the number of non-NA AUCs is different for each cell line. As a result making  a scatter plot matrix (via plotmatrix) won’t do.

A more useful approach is to generate a matrix of density plots, such that each plot contains the distributions of AUCs from each pair of cell lines over laid on each other. It turns out that some data.frame wrangling and facet_grid makes this extremely easy.

Lets start with some random data, for 5 imaginary cell lines

1
2
3
4
5
6
7
8
library(ggplot2)
library(reshape)

tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
  r <- rnorm(100, mean=sample(1:4, 1))
  r[sample(1:100, 20)] <- NA
  return(r)
})))

Next, we need to expand this into a form that lets us facet by pairs of variables

1
2
3
4
5
6
7
8
tmp2 <- do.call(rbind, lapply(1:5, function(i) {
  do.call(rbind, lapply(1:5, function(j) {
    r <- rbind(data.frame(var='D1', val=tmp1[,i]),
               data.frame(var='D2', val=tmp1[,j]))
    r <- data.frame(xx=names(tmp1)[i], yy=names(tmp1)[j], r)
    return(r)
  }))
}))

Finally, we can make the plot

1
2
3
4
ggplot(tmp2, aes(x=val, fill=var))+
  geom_density(alpha=0.2, position="identity")+
  theme(legend.position = "none")+
  facet_grid(xx ~ yy, scales='fixed')

Giving us the plot below.

I had initially asked this on StackOverflow where Arun provided a more elegant approach to composing the data.frame

Python, permission and forgiveness

This morning I was writing some Python code that needed to perform lookups on a very large map.

1
2
3
4
mapSize = 65000
amap = {}
for i in range(0,mapSize):
    amap['k%d' % (i)] = i

If a key did not exist in the map I needed to take some action. My initial effort performed a pre-emptive check to see whether the key existed in the map

1
2
3
4
5
6
7
8
9
query = ['k%d' % (x) for x in xrange(100)]
query.extend(['k%d' % (x) for x in xrange(mapSize+1, mapSize+100)])

def permission():
    for q in query:
        if q in amap.keys():
            val = amap[q]
        else:
            pass

Looking up 200 keys (half of which are present and half  are absent from the map) took 496 ms (based on Pythons’ timeit module with default settings). On the other hand, if we just try and go ahead and access the key and deal with its absence by handling with the exception, we get a significant improvement.

1
2
3
4
5
6
def forgiveness():
    for q in query:
        try:
            val = amap[q]
        except:
            pass

Specifically, 150 us – an improvement of 3306x.

So, as with many other things in life, it’s better to ask Python for forgiveness than  permission.

New version of fingerprint (3.4.9) – faster Dice similarity matrices

I’ve just pushed a new version of the fingerprint package that contains an update provided by Abhik Seal that significantly speeds up calculation of pairwise similarity matrices when using the Dice similarity method. A ran a simple comparison using different numbers of random fingerprints (1024 bits, with 512 bits set to one, randomly) and measured the time to evaluate the pairwise similarity matrix. As you can see from the figure alongside, the new code is significantly faster (with speed ups of 450x to 500x). The code to generate the timings is below – it probably should wrapped in a loop to multiple times for each set size.

1
2
3
4
5
fpls <- lapply(seq(10,300,by=10),
               function(i) sapply(1:i,
                                  function(x) random.fingerprint(1024, 512)))
times <- sapply(fpls,
                function(fpl) system.time(fp.sim.matrix(fpl, method='dice'))[3])

CINF Webinar: Practical cheminformatics workflows with mobile apps

I’m pleased to announce that the ACS Division of Chemical Information will be hosting a series of free webinars on topics related to chemical information. The webinars will be open to everybody and our first speaker in this series will be Dr. Alex Clark, who’ll be talking about cheminformatics workflows and mobile applications. More details below or at http://www.acscinf.org/about/news/20120918.php

Webinar: Practical cheminformatics workflows with mobile apps
Date: October 3, 2012
Time: 11 am Eastern time (US)

Abstract

In recent years smartphones and tablets have attained sufficient power and sophistication to replace conventional desktop and laptop computers for many tasks. Chemistry software is late to the party, but rapidly catching up. This webinar will explore some of the cheminformatics functionality that can currently be performed using mobile apps. A number of workflow scenarios will be discussed, such as: creating and maintaining chemical data (molecules, reactions, numbers & text); searching chemical databases and utilising the results; structure-aware lab notebooks; visualisation and structure-activity analysis; property calculation using remote webservices; and a multitude of ways to share data collaboratively, and integrate modular apps within distributed and heterogeneous workflows.

Speaker

Alex M. Clark graduated from the University of Auckland, New Zealand, with a Ph.D. in synthetic organometallic chemistry, then went on to work in computational chemistry. His chemistry background spans both the lab bench and development of software for a broad variety of 2D and 3D computer aided molecular design algorithms and user interfaces. He is the founder of Molecular Materials Informatics, Inc., which is dedicated to producing next-generation cheminformatics software for emerging platforms such as mobile devices and cloud computing environments.