While contributing to a book chapter on high content screening I came across the problem of characterizing screen quality. In a traditional assay development scenario the Z factor (or Z’) is used as one of the measures of assay performance (using the positive and negative control samples). The definition of Z’ is based on a […]
Competitive Predictive Modeling – How Useful is it?
While at the ACS National Meeting in Philadelphia I attended a talk by David Thompson of Boehringer Ingelheim (BI), where he spoke about a recent competition BI sponsored on Kaggle – a web site that hosts data mining competitions. In this instance, BI provided a dataset that contained only object identifiers and about 1700 numerical […]
Chunking lists in R
A common task for is to run database queries on gene symbols or compound identifiers. This involves constructing an SQL query as a string and sending that off to the database. In the case of the ROracle package, the query strings are limited to a 1000 (?) or so characters. This means that directly querying […]
Software for the “Federation of Independent Scientists”
A few days back, Derek Lowe posted a comment from a reader who suggested a way to approach the current employment challenges in the pharmaceutical industry would be the formation of a Federation of Independent Scientists. Such a federation would be open to consultants, small companies etc and would use its size to obtain group […]
I’d Rather Be … Reverse Engineering
Gamification is a hot topic and companies such as Tunedit and Kaggle are succesfully hosting a variety of data mining competitions. These competitions employ data from a variety of domains such as bond trading, essay scoring and so on. Recently, both platforms have hosted a QSAR challenge (though not officially denoted as such). The most […]