A key feature of high-throughput screening (HTS) efforts is automation. The NCGC is no stranger to automation, with two Kalypsys robots and a variety of automated components such as liquid handlers. But while the screen itself is automated, the transitions between subsequent steps are not. Thus, after a screen is complete, I am notified that the data is located in some directory. I then load up the data, process it and end up with a set of compounds for follow-up. Finally, I send the list of compounds to be plated, and these are then screened in a follow-up assay.
In a number of situations, this approach is unavoidable as the data processing stage requires human intervention (plate corrections, switching controls, etc.). But in some situations, we can automate the whole process: primary screen, automated analysis and compound selection, and secondary screen. Given that most screens at NCGC are dose response screens, we can refine an automated pipeline by processing individual plate series (i.e. a collection of plates representing a titration series) rather than waiting for all the plates to be completed. Another important point to note is that the different steps being considered here take different amounts of time. Thus screening a plate series might take 15 minutes, processing the resultant data and making selections might take 3 minutes and performing the secondary screen might take 10 minutes. Clearly the three steps have to proceed in the given order for any one plate series, but we don’t necessarily want to wait for a step to finish on every plate series before the next step can start. In other words, we need the steps to proceed asynchronously, yet maintain temporal ordering.
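To get a feel for what this buys us, here is a rough back-of-the-envelope sketch in Java. It assumes the illustrative 15, 3 and 10 minute figures above, and that each stage handles one plate series at a time; the class and method names are made up for this example and are not part of the actual infrastructure.

```java
public class PipelineTiming {

    // Batch workflow: each stage waits until the previous stage
    // has finished every plate series before it starts.
    static int batchMinutes(int nSeries, int[] stageMinutes) {
        int total = 0;
        for (int minutes : stageMinutes) {
            total += nSeries * minutes;
        }
        return total;
    }

    // Pipelined workflow: a stage starts on a plate series as soon as
    // (a) the previous stage has finished that series and
    // (b) the stage itself is free (one series at a time per stage).
    static int pipelinedMinutes(int nSeries, int[] stageMinutes) {
        // finish[k] = time at which stage k finished its most recent series
        int[] finish = new int[stageMinutes.length];
        for (int s = 0; s < nSeries; s++) {
            int ready = 0; // when this series becomes available to the next stage
            for (int k = 0; k < stageMinutes.length; k++) {
                int start = Math.max(ready, finish[k]);
                finish[k] = start + stageMinutes[k];
                ready = finish[k];
            }
        }
        return finish[stageMinutes.length - 1];
    }

    public static void main(String[] args) {
        int[] stages = {15, 3, 10}; // primary screen, analysis, secondary screen
        System.out.println("batch:     " + batchMinutes(10, stages) + " min");
        System.out.println("pipelined: " + pipelinedMinutes(10, stages) + " min");
    }
}
```

Under these assumptions, ten plate series take 280 minutes in the batch workflow and 163 minutes when pipelined, since the primary screen is the slowest step and the other two simply keep up with it.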
One approach to automating such a process is the use of a message queue (MQ). The fundamental idea behind an MQ is that one creates a queue on some machine and then starts one or more processes (likely on some other machines) to send messages to the queue. These messages can then be retrieved by one or more listener processes. MQ systems provide a number of useful features beyond the core functionality of storing and distributing messages: these include message persistence, security policy, routing, batching and so on.
In our case, when a plate series is screened, the robot sends a message to the queue. Some process will be listening to the queue and when it sees a message, it pulls it off the queue and processes the data from the screen for that plate series. Once processing is complete, the process sends another message to the queue (or another queue), from which yet another process (this one running on another robot) can pull it off and start the secondary screen on the selected compounds. Thus, as soon as a plate series is finished in the primary screen, we can start the processing and follow-up, while the next plate series gets started. A message queue approach is also useful since messages can remain on the queue until the appropriate listener pulls them off for processing. A good queue system will ensure that such messages are delivered reliably and don’t get lost.
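The flow just described can be mimicked inside a single JVM, purely as an illustration. The sketch below uses `java.util.concurrent.BlockingQueue` as a stand-in for a real message queue, so it has none of the persistence or delivery guarantees mentioned above. The class name, the message strings and the `DONE` sentinel are all invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ScreeningPipelineSketch {

    // Hypothetical sentinel telling listeners that no more messages will arrive.
    static final String DONE = "DONE";

    static List<String> run(int nSeries) {
        // Queue 1: primary screen -> analysis. Queue 2: analysis -> secondary screen.
        BlockingQueue<String> screened = new LinkedBlockingQueue<>();
        BlockingQueue<String> selected = new LinkedBlockingQueue<>();
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        // Listener that processes each plate series and forwards the selection.
        Thread analysis = new Thread(() -> {
            try {
                while (true) {
                    String msg = screened.take(); // blocks until a message arrives
                    if (msg.equals(DONE)) {
                        selected.put(DONE);
                        return;
                    }
                    selected.put(msg + ":selected");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Listener standing in for the second robot.
        Thread secondary = new Thread(() -> {
            try {
                while (true) {
                    String msg = selected.take();
                    if (msg.equals(DONE)) {
                        return;
                    }
                    log.add("secondary screen for " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        analysis.start();
        secondary.start();
        try {
            // The "primary screen" announces each finished plate series.
            for (int i = 1; i <= nSeries; i++) {
                screened.put("series-" + i);
            }
            screened.put(DONE);
            analysis.join();
            secondary.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return log;
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

Because each queue is first-in, first-out and has a single listener here, plate series come out of the secondary stage in the order they were screened, even though the three stages run concurrently.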
The diagram below highlights this approach. The solid lines represent the traditional workflow. Given that we’d manually process the screening data, we’d wait till all plate series are run. The dashed lines represent a message based workflow, in which we can process each plate series independently.
In the next few posts I’ll describe such a message queue based workflow that I’ve been working on these past few days. Currently it’s specific to a screen that we’re going to be running. The infrastructure is written in Java and makes use of Oracle Advanced Queuing (AQ) to provide message queues and the facilities for receiving and sending messages. I’ll describe a minimal implementation that makes use of the Java Message Service (JMS) and the standard JMS message types and then follow on with an example using a custom message type that maps to an Oracle user-defined type, allowing for more “object oriented” messages.