Andrew Dalke recently published a detailed write up on his implementation of the Pubchem fingerprints and provided a pretty thorough comparison with the CDK implementation. He pointed out a number of bugs in the CDK version; but he also noted that performance could be improved by caching parsed SMARTS queries – which are used extensively […]
Lots of Pretty Pictures
Yesterday I attended the High Content Analysis conference in San Francisco. Over the last few months I’ve been increasingly involved in the analysis of high content screens, both for small molecules and siRNA. This conference gave me the opportunity to meet people working in the field as well as present some of our recent work […]
Wikipedia Category Hierarchy via N-triples
For a current project I needed to obtain a hierarchical representation of Wikipedia categories. (which can be explored here). Pierre Lindenbaum provided some useful pointers on using the Mediawiki API. However, this was a little unweildy. Instead, I came across the DBpedia downloads. More specifically, the SKOS categories files provide the links between categories using […]