BAZOO

So much to do, so little time

Trying to squeeze sense out of chemical data

Substructure Matching, REST style

without comments

I’ve been putting up a number of REST services for a variety of cheminformatics tasks. One that was missing was substructure searching. In many scenarios it’s useful to be able to check whether a target molecule contains a query substructure or not. This can now be done by visiting URL’s of the form

1
http://rguha.ath.cx/~rguha/cicc/rest/substruct/TARGET/QUERY

where TARGET and QUERY are SMILES and SMARTS (or SMILES) respectively (appropriately escaped). If the query pattern is found in the target molecule then the resultant page contains the string “true” otherwise it contains the string “false”. The service uses OpenBabel to perform the SMARTS matching.

Using this service, I updated the ONS data query page to allow one to filter results by SMARTS patterns. This generally only makes sense when no specific solute is selected. However, filtering all the entries in the spreadsheet (i.e., any solvent, any solute) can be slow, since each molecule is matched against the SMARTS pattern using a separate HTTP requests. This could be easily fixed using POST, but it’s a hack anyway since this type of thing should probably be done in the database (i.e., Google Spreadsheet).

Update

The substructure search service is now updated to accept POST requests. As a result, it is possible to send in multiple SMILES strings and match them against a pattern all at one go. See the repository for a description on how to use the POST method. (The GET method is still supported but you can only match a pattern against one target SMILES). As a result, querying the ONS data using SMARTS pattens is significantly faster.

Written by Rajarshi Guha

February 3rd, 2009 at 6:01 pm

Leave a Reply