BAZOO

# So much to do, so little time

Trying to squeeze sense out of chemical data

## Update to the REST Descriptor Services

The current version of the REST interface to the CDK descriptors allowed one to access descriptor values for a SMILES string by simply appending it to an URL, resulting in something like

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/
org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor/c1ccccc1COCC

This type of URL is pretty handy to construct by hand. However, as Pat Walters pointed out in the comments to that post, SMILES containing ‘#’ will cause problems since that character is a URL fragment identifier. Furthermore, the presence of a ‘/’ in a SMILES string necessitates some processing in the service to recognize it as part of the SMILES, rather than a URL path separator. While the service could handle these (at the expense of messy code) it turned out that there were subtle bugs.

Based on Pats’ suggestion I converted the service to use base64 encoded SMILES, which let me simplify the code and remove the bugs. As a result, one cannot append the SMILES directly to the URL’s. Instead the above URL would be rewritten in the form

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/
org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor/YzFjY2NjYzFDT0ND

All the example URL’s described in my previous post that involve SMILES strings, should be rewritten using base64 encoded SMILES. So to get a document listing all descriptors for “c1ccccc1COCC” one would write

http://rguha.ath.cx/~rguha/cicc/rest/desc/descriptors/YzFjY2NjYzFDT0ND

While this makes it a little harder to directly write out these URL’s by hand, I expect that most uses of this service would be programmatic – in which case getting base64 encoded SMILES is trivial.

Written by Rajarshi Guha

January 11th, 2009 at 5:52 pm