New version of fingerprint (3.4.9) – faster Dice similarity matrices

I’ve just pushed a new version of the fingerprint package that contains an update provided by Abhik Seal that significantly speeds up calculation of pairwise similarity matrices when using the Dice similarity method. I ran a simple comparison using different numbers of random fingerprints (1024 bits, with 512 bits set to one, randomly) and measured […]
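For readers unfamiliar with the measure, here is a minimal Python sketch of a pairwise Dice similarity matrix over random fingerprints like those described above (1024 bits, 512 set). It is not the package's R code or Abhik Seal's optimized implementation; the function names `random_fingerprint`, `popcount` and `dice_matrix` are made up for illustration, and fingerprints are stored as plain Python integers.

```python
import random

def random_fingerprint(n_bits=1024, n_set=512, rng=random):
    """Return a fingerprint as an int with n_set of its n_bits set at random."""
    fp = 0
    for pos in rng.sample(range(n_bits), n_set):
        fp |= 1 << pos
    return fp

def popcount(x):
    """Number of bits set to one in x."""
    return bin(x).count("1")

def dice_matrix(fps):
    """Pairwise Dice similarity, 2|A & B| / (|A| + |B|), as a list of lists."""
    n = len(fps)
    counts = [popcount(fp) for fp in fps]  # bit counts computed once per fingerprint
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the matrix is symmetric, so fill both halves at once
            denom = counts[i] + counts[j]
            s = 2.0 * popcount(fps[i] & fps[j]) / denom if denom else 0.0
            sim[i][j] = sim[j][i] = s
    return sim

fps = [random_fingerprint() for _ in range(5)]
matrix = dice_matrix(fps)
```

Precomputing the bit counts and exploiting symmetry are generic ways to cut the work roughly in half; whether the package's speed-up comes from the same ideas is not stated in the excerpt.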

“Type-ahead” substructure searches

The other day I was exchanging emails with John Van Drie regarding open challenges in cheminformatics (which I’ll say more about later). One of his comments concerned the slow speed of chemical searches: Google searches are screamingly fast, so fast that the type-ahead feature is doing the search as you key characters in. Why are all chemical […]

Substructure Searches – High Speed, Large Scale

My NCTT colleague, Trung Nguyen, recently announced a prototype chemical substructure search system based on fingerprint pre-screening and an efficient in-memory indexing scheme. I won’t go into the detail of the underlying pre-screen and indexing methodology (though the sources are available here). He’s provided a web interface allowing one to draw in substructure queries or […]

New Version of fingerprint

I’ve submitted version 3.4.3 of the fingerprint package to CRAN, so it should be available in a day or two. It’s an R package that lets you read in (chemical structure) fingerprint data from a variety of sources (CDK, MOE, BCI, etc.) and perform a variety of operations (bitwise, similarity, etc.) and visualizations on them. The […]

Caching SMARTS Queries

Andrew Dalke recently published a detailed write-up on his implementation of the PubChem fingerprints and provided a pretty thorough comparison with the CDK implementation. He pointed out a number of bugs in the CDK version, but he also noted that performance could be improved by caching parsed SMARTS queries – which are used extensively […]
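The caching idea can be illustrated with a short Python sketch. It is not the CDK code or Andrew Dalke's implementation: `parse_smarts` is a hypothetical stand-in for a real (and expensive) SMARTS parser, and `cached_query` is an illustrative name. The only real API used is `functools.lru_cache` from the standard library.

```python
from functools import lru_cache

def parse_smarts(smarts):
    """Hypothetical stand-in for an expensive SMARTS parser.

    A real toolkit would build a query graph here; this placeholder
    just returns a tuple of the pattern's characters.
    """
    return tuple(smarts)

@lru_cache(maxsize=None)
def cached_query(smarts):
    """Parse each distinct SMARTS string once and reuse the result afterwards."""
    return parse_smarts(smarts)

# A fingerprinter applies the same patterns to every molecule, so
# after the first molecule each lookup is served from the cache.
patterns = ["[#6]~[#7]", "c1ccccc1", "[#6]~[#7]"]
queries = [cached_query(p) for p in patterns]
```

The sketch assumes parsed queries can be safely shared between calls; if a toolkit's query objects carry mutable matching state, a per-thread cache or a copy on retrieval would be needed instead.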


So much to do, so little time

Trying to squeeze sense out of chemical data
