Words, Sentences, Fragments & Molecules

For some time I have been thinking of the analogy between linguistics (and text mining of language data) and chemistry, specifically from the point of view of fragments (though, the relationship between the two fields is actually quite long and deep, since many techniques from IR have been employed in cheminformatics). For example, atoms and […]

“Type-ahead” substructure searches

The other day I was exchanging emails with John Van Drie regarding open challenges in cheminformatics (which I’ll say more about later). One of his comments concerned the slow speed of chemical searches Google searches are screamingly fast, so fast that the type-ahead feature is doing the search as you key characters in.  Why are all chemical […]

Substructure Searches – High Speed, Large Scale

My NCTT colleague, Trung Nguyen, recently announced a prototype chemical substructure search system based on fingerprint pre-screening and an efficient in-memory indexing scheme. I won’t go into the detail of the underlying pre-screen and indexing methodology (though the sources are available here). He’s provided a web interface allowing one to draw in substructure queries or […]