The other day I was exchanging emails with John Van Drie regarding open challenges in cheminformatics (which I’ll say more about later). One of his comments concerned the slow speed of chemical searches
Google searches are screamingly fast, so fast that the type-ahead feature is doing the search as you key characters in. Why are all chemical searches so sloooow? … Ideally, as you sketch your mol in, the searches should be happening at the same pace, like the typeahead feature.
Now, he doesn’t specifically mention what type of chemical search – it could be exact matches, similarity searches, substructure or pharmacophore searches. The first two can be done very quickly and lend themselves easily to type ahead type search interfaces. In light of the work my colleague has been doing, the substructure searches are now also amenable to a type ahead interface.
So I quickly put together a simple web page that lets you type in a SMILES (or SMARTS) and as you type it retrieves the results of a substructure search via the NCTT Search Server REST API. (In some cases the depiction is broken – that’s a bug on my side). Of course, typing in SMILES is not the most intuitive of interfaces. Since Trung employs the ChemDoodle sketcher, an ideal interface would respond to drawing events (say drawing a bond or adding atoms etc) and pull up matches on the fly. Another obvious extension is to rank (or filter) the results – all the while, maintaining the near real time speed of the application.
As I said before, seriously fast substructure searches. It also helps that I can build these examples via a public REST API. I’m sure there are reasons for SOAP, XML and so on. But it’s 2011. So lets help make extensions and mashups easier.
UPDATE: Yes, it’s easy to create patterns (especially with SMARTS) that DoS the server. We have some filters for excessively generic patterns; so some queries may not behave in the expected manner