R and Oracle

It’s been a while since my last post, but I’m getting up to speed at work. It’s been less than a month, but there’s already a ton of cool stuff going on. One of the first things I’ve been getting to grips with is the data infrastructure at the NCGC, which is based around Oracle. […]

Chemistry, Clouds, Collaboration (Part 1)

There’s been an interesting discussion sparked by Deepak’s post, asking why there is a much smaller showing of chemists and chemistry applications in the cloud compared to other life science areas. This post led to a FriendFeed thread that raised a number of issues. At a high level one can easily point out factors such […]

Annotating Bioassays

I’ve been working for some time with the PubChem Bioassay collection – a set of 1293 assays that cover a range of techniques (enzymatic, phenotypic, etc.), targets and sizes (from 20 molecules to 200,000 molecules). In addition, some assays are primary, high-throughput assays whereas a number of them are smaller, confirmatory assays. While an extremely […]

Quick Comments on an Analysis of Antithrombotics

Joerg has made a nice blog post on the use of Open Source software and data to analyse the occurrence of antithrombotics. More specifically, he was trying to answer the question: “Which X-ray ligands are closest to the Fontaine et al. structure-activity relationship data for allowing structure-based drug design?” Using Blue Obelisk tools and ChemSpider […]

Brute Force – Inelegant, But Sometimes Useful

A few days back I posted on improving query times in Pub3D by going from a monolithic database (17M rows) to a partitioned version (about 3M rows in each of 6 separate databases) and then performing queries in parallel. I also noted that we were improving query times by making use of an R-tree spatial index. Andrew […]

So much to do, so little time

Trying to squeeze sense out of chemical data
