Notes & thoughts from the IU semantics workshop

Over the last two days I attended a workshop titled Exploiting Big Data Semantics for Translational Medicine, held at Indiana University, organized by David Wild, Ying Ding, Katy Borner and Eric Gifford. The stated goals were to explore advances in translation medicine via data and semantic technologies, with a view towards possible fundable ideas and […]

Visual pairwise comparison of distributions

While analysing some data from a dose respons screen, run across multiple cell lines, I need to visualize summarize curve data in a pairwise fashion. Specifically, I wanted to compaure area under the curve (AUC) values for the curve fits for the same compound between every pair of cell line. Given that an AUC needs […]

Path Fingerprints and Hash Quality

Recently, on an email thread I was involved in, Egon mentioned that the CDK hashed fingerprints were probably being penalized by the poor hashing provided by Java’s hashCode method. Essentially, he suspected that the collision rate was high and so that the many bits were being set multiple times by different paths and that a fraction of bits were not […]

Some More Comparisons with the GSK Dataset

My previous post did a quick comparison of the GSK anti-malarial screening dataset with a virtual library of Ugi products. That comparison was based on the PubChem fingerprints and indicated a broad degree of overlap. I was also interested in looking at the overlap in other feature spaces. The simplest way to do this is […]