I came across an ASAP paper today describing substructure searching in Oracle databases. The paper comes from the folks at J & J and is part of their series of papers on the ABCD platform. Performing substructure searches in databases is certainly not a new topic and various products are out there that support this in Oracle (as well as other RDBMSs). The paper describes how the ABCD system does this using a combination of structure-derived hash keys and an inverted bitset based index and discusses the implementation as an Oracle cartridge. They provide an interesting discussion of how their implementation supports Cost Based Optimization of SQL queries involving substructure search. The authors run a number of benchmarks. In terms of comparative benchmarks, they compare the performance (i.e., screening efficiency) of their hashed keys versus MACCS keys, CACTVS keys and OpenBabel FP2 fingerprints. Their results indicate that the screening step is a key bottleneck in the query process and that their hash key is generally more selective than the others.
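The screening step they benchmark is the standard screen-then-verify trick: a molecule can contain the query substructure only if every bit set in the query's key is also set in the molecule's key, so a fast bitwise test prunes most of the database before the expensive atom-by-atom match. A minimal sketch in R (the 5-bit keys and molecule names are toy stand-ins for real hashed keys):

```r
## A molecule passes the screen if all bits of the query key are present
## in its key, i.e. AND-ing the two keys reproduces the query key
passes.screen <- function(query.fp, target.fp) {
  bitwAnd(query.fp, target.fp) == query.fp
}

## toy "database" of keys, encoded as integers
db.fps <- c(mol1 = strtoi("10110", base = 2),
            mol2 = strtoi("00110", base = 2),
            mol3 = strtoi("10010", base = 2))
query.fp <- strtoi("10010", base = 2)

## only the survivors would go on to the full substructure match
candidates <- names(db.fps)[passes.screen(query.fp, db.fps)]
candidates  ## "mol1" "mol3"
```

A more selective key simply leaves fewer candidates for the verification step, which is why the authors' comparison focuses on screening efficiency.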
Unfortunately, what would have been interesting but was not provided was a comparison of the performance at the Oracle query level with other products such as the JChem Cartridge and OrChem. Furthermore, the test case is just under a million molecules from Golovin & Henrick – the entire dataset (not just the keys) could probably reside in memory on today's servers. How does the system perform when faced with, say, PubChem (34 million molecules)? The paper mentions a command-line implementation of their search procedure, but as far as I can tell, the Oracle cartridge is not available.
The ABCD system has many useful and interesting features. But as with the other publications on this system, this paper is one more in the line of “Papers About Systems You Can’t Use or Buy”. Unfortunate.
The time has come to move again – though, in this case, it’s just a geographic move. From August I’ll be living in Manchester, CT (great cheeseburgers and lovely cycle routes) and will continue to work remotely for NCGC. I’ll be travelling to DC every month or so. The rest of the time I’ll be working from Connecticut.
Being new to the area, it’d be great to meet up over a beer, with people in the surrounding areas (NY/CT/RI) doing cheminformatics, predictive modeling and other life science related topics (any R user groups in the area?). If anybody’s interested, drop me a line (comment, mail or @rguha).
With the 2011 Fall ACS meeting coming up in Denver next month, CINF will be hosting another round of lightning talks – 8 minutes to talk about anything related to cheminformatics and chemical information. As before, these talks won’t be managed via PACS, as a result of which we are taking short abstracts between July 14 and Aug 14. We hope that we’ll get to hear about interesting and recent stuff. Remember, this is meant to be a fun event so be creative! (You can see slides from the first run of this session last year.)
The full announcement is below:
For the 2011 Fall meeting in Denver (Aug 28 – Sep 1), CINF will be running an experimental session of lightning talks – short, strictly timed talks. The session does not have a specific topic, however, all talks should be related to cheminformatics and chemical information. One of the key features of this session is that we will not be using the traditional ACS abstract submission system, since that system precludes the inclusion of recent work in the program.
So, since we will be accepting abstracts directly, the expectation is that they be about recent work and developments, rather than rehashes of year-old work. In addition, talks should not be verbal versions of posters submitted for this meeting. Given the short time limits we don’t expect great detail – but we are expecting compact and informative presentations.
That’s the challenge.
- Talks should be no longer than 8 minutes in length. At 8 minutes, you will be asked to stop.
- Use as many slides as you want, as long as you can finish in 8 minutes
- Talks should not be rehashes of poster presentations
- Talks will run back to back, and questions & discussion will be held off until the end
If you haven’t participated in these types of talks before here are some suggestions:
- No more than three slides for a 5 minute talk (but if you can pull off 20 slides in 8 minutes, more power to you)
- Avoid slides with too much text (and don’t paste PDFs of papers!)
- A single chart per slide and make sure labels are readable at a distance
1:30pm, Wednesday, August 31st, 2011
Submissions run from July 14 to Aug 14
Room 112, Colorado Convention Center
- Send in an abstract of about 100 – 120 words to email@example.com
- We will let you know if you will be speaking by Aug 21 and we will need slide decks by Aug 24
- You must be registered for the meeting
- Note that the usual publication/copyright rules apply
- We will encourage live blogging and tweets (if we have net access)
Over the last few months I’ve been getting involved in the informatics & data mining aspects of high content screening. While I haven’t gotten into image analysis itself (there’s a ton of good code and tools already out there), I’ve been focusing on managing image data and meta-data and asking interesting questions of the voluminous, high-dimensional data that is generated by these techniques.
One of our platforms is ImageXpress from Molecular Devices, which stores images in a file-based image store, and meta-data and numerical image features in an Oracle database. While they do provide an API to interact with the database, it’s a Windows-only DLL. But since much of my modeling work requires accessing the data from R, I needed a more flexible solution.
So, I’ve put together an R package that allows one to access numeric image data (i.e., descriptors) and the images themselves. It depends on the ROracle package (which in turn requires an Oracle client installation).
Currently the functionality is relatively limited, focusing on my common tasks. For example, given assay plate barcodes, we can retrieve the assay ids that the plate is associated with and then, for a given assay, obtain the cell-level image parameter data (or optionally, aggregate it to well-level data). This task is easily parallelizable – in fact, when processing a high content RNAi screen, I make use of snow to speed up the data access and processing across 50 plates.
con <- get.connection(user='foo', passwd='bar', sid='baz')
plate.barcode <- 'XYZ1023'
plate.id <- get.plates(con, plate.barcode)
## multiple analyses could be run on the same plate - we need
## to get the correct one (MX uses 'assay' to refer to an analysis run)
## so we first get details of analyses without retrieving the actual data
details <- get.assay.by.barcode(con, barcode=plate.barcode, dry=TRUE)
assay.name <- 'MyAnalysis' ## name of the analysis settings we want
details <- subset(details, PLATE_ID == plate.id & SETTINGS_NAME == assay.name)
assay.id <- details$ASSAY_ID
## finally, get the analysis data, using median to aggregate cell-level data
hcs.data <- get.assay(con, assay.id, aggregate.func=median, verbose=FALSE, na.rm=TRUE)
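The snow-based parallelization mentioned above follows the usual cluster-apply pattern; base R’s parallel package now provides the same makeCluster/parLapply API, so the runnable sketch below uses it. The process.plate body here is a toy stand-in – in the real workflow each worker would open its own connection with get.connection() and pull well-level data via get.assay():

```r
library(parallel)  ## snow's makeCluster/parLapply API, now in base R

## Toy stand-in for the per-plate work: the real function would open a DB
## connection and retrieve + aggregate the cell-level data for the barcode
process.plate <- function(barcode) {
  data.frame(barcode = barcode, median.intensity = runif(1),
             stringsAsFactors = FALSE)
}

cl <- makeCluster(2)
barcodes <- c('XYZ1023', 'XYZ1024', 'XYZ1025')
## one result row per plate, fetched in parallel across the workers
res <- do.call(rbind, parLapply(cl, barcodes, process.plate))
stopCluster(cl)
```

Since each plate is independent, this scales close to linearly with the number of workers until the database becomes the bottleneck.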
Alternatively, given a plate id (this is the internal MetaXpress plate id) and a well location, one can obtain the path to the relevant image(s). With the images in hand, you could use EBImage to perform image processing entirely in R.
## will want to set IMG.STORE.LOC to point to your image store
con <- get.connection(user='foo', passwd='bar', sid='baz')
plate.barcode <- 'XYZ1023'
plate.id <- get.plates(con, plate.barcode)
get.image.path(con, plate.id, 4, 4) ## get images for all sites & wavelengths
Currently, you cannot get the internal plate id based on the user-assigned plate name (which is usually different from the barcode). Also the documentation is non-existent, so you need to explore the package to learn the functions. If there’s interest I’ll put in Rd pages down the line. As a side note, we also have a Java interface to the MetaXpress database that is being used to drive a REST interface to make our imaging data accessible via the web.
Of course, this is all specific to the ImageXpress platform – we have others such as InCell and Acumen. To have a comprehensive solution for all our imaging, I’m looking at the OME infrastructure as a means of, at the very least, having a unified interface to the images and their meta-data.
The last few days I’ve been at the EBI, attending the Molecular Informatics Open Source Software (MIOSS) workshop. As part of this trip to the UK, I’ve also had the opportunity to present some of the work my colleagues and I have done at the NCTT – thanks to Mark Forster for the invitation to speak at Syngenta and to John Chambers for having me speak to the ChEMBL group. At the workshop I presented my work on cheminformatics in R.
The focus of the workshop was to bring OSS developers and users from industry and academia/government together to hear about a variety of projects and discuss issues underlying the development and use of these projects. There were some very nice presentations – I won’t go into too much detail but some highlights for me included
- Kevin Lawson (Syngenta) presented his work on LICSS – integrating the CDK with Excel. While I’m not a fan of Excel, it’s a necessary evil. I was quite surprised at the performance he achieved for substructure searches within Excel and the ability to access various functionalities of the CDK as Excel functions. While it probably won’t replace Accord or ChemOffice right now, it’s something to take a look at.
- Mike Bodkin (Lilly) spoke about the use of KNIME at Lilly. They have built up an extensive collection of commercial and OSS nodes and it’s clear that KNIME is capable of giving Pipeline Pilot a run for its money. Thorsten Meinl then spoke of the OSS development of KNIME, and mentioned that they now support a collection of HCS and image analysis nodes (courtesy MPI Dresden). This is quite interesting, given that we’re ramping up our HCS capabilities at the NCTT.
- Hans de Winter of Silicos spoke about the tools and services that their company has produced on top of OpenBabel (and contributed back to the community). Quite encouraging to see a cheminformatics company making money off the OSS stack.
- Greg Landrum spoke about RDKit, presenting the RDKit-based cartridge for PostgreSQL. He showed some nice performance numbers and it was nice to see that they had gotten the coders who implemented the GiST indexing mechanism to implement a GiST index for binary fingerprints.
In addition to these, there were other talks on OpenBabel, Cinfony, Taverna, fpocket and others. While I’ve known about many of these projects, it was useful to learn some of the details from the developers themselves.
A number of issues surrounding OSS development and use were discussed. For example, community development was regarded as a key factor in the success of OSS projects. Erik Lindahl of GROMACS fame spoke about the development model of GROMACS and how much of their success has been due to community involvement. Other issues included the importance (and lack) of good documentation, what makes people contribute to OSS and so on.
The fact that industry participation made up about 50% of the group was nice, and a number of industry-related issues also arose. For example, there were several discussions of business models based around OSS and how they can feed back into OSS projects. A common thread seemed to be that service and customization of OSS are good approaches to building businesses around the OSS stack, Silicos and Eagle Genomics being two prime examples.
The fact that there are industry users of OSS as well as industry members contributing back to OSS projects was very encouraging. An idea supported by a number of participants was some form of web site / wiki where such contributors and users could list themselves (IMO, the Blue Obelisk wiki could be a candidate for this). Sure, there would usually be corporate and legal barriers to such a listing, but if done it would have a number of benefits: encouragement for project developers, and easily viewable precedent that would encourage other companies to use or participate in OSS projects, resulting in a positive feedback loop. With various pre-competitive collaboration efforts (e.g., the Pistoia Alliance) popping up in the pharma industry, this is certainly possible.
Finally, it’s always good to meet up with old friends and also meet people whom I’ve only known over email. The social aspects of the workshop were very nice – helped greatly by excellent food and drink! Thanks to Mark for putting together a great meeting.