So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘python’ tag

The ONS Challenge & Visualizing Chemical Space

with 5 comments

The ONSChallenge has been running for some time now, and the simple web query form that ties together the data from Google Docs and the web services from IU has turned out to be pretty handy. With more and more data becoming available, I had done some initial exploratory analysis of the measured solubilities. One thing that is useful to the experimentalists is a suggestion of which compound to test next. This could be made on the basis of many factors – availability, ease of synthesis and so on. But one way to look at it is to examine what types of compounds have already been tested, and suggest that subsequent compounds be very different from them.
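To make the "very different from what has been tested" idea concrete, here is a minimal sketch of a max-min style pick. It is not necessarily the approach taken in the full post: it assumes each compound is already represented as a set of fingerprint bits, and the function names and toy data are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_dissimilar(tested, candidates):
    """Return the name of the candidate whose nearest tested neighbour
    is the least similar (assumes `tested` is non-empty)."""
    def nearest_similarity(fp):
        return max(tanimoto(fp, t) for t in tested.values())

    return min(candidates, key=lambda name: nearest_similarity(candidates[name]))


# Toy fingerprints (hypothetical bit sets, not real compounds)
tested = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}}
candidates = {"C": {1, 2, 3, 5}, "D": {7, 8, 9}, "E": {3, 4, 10}}

suggestion = most_dissimilar(tested, candidates)
```

In this toy case the suggestion is the candidate that shares no bits with anything tested so far. A real selection would also weigh the practical factors mentioned above, such as availability and ease of synthesis.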

Written by Rajarshi Guha

December 30th, 2008 at 6:04 pm

Getting a CAS Number from a PubChem CID

with one comment

A few days back, Hari on FriendFeed had asked how one could get a CAS number from a PubChem compound ID (CID). The reverse, that is, finding a CID for a given CAS number, is generally quite easy, as shown by Rich here and here. Since I was trying to get some writing done, this was a good excuse for a quick hack to solve the problem.
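The hack itself is behind the cut, so the following is only a sketch of one possible step, not the method from the post. It assumes you already have the list of synonyms for a CID (how they are fetched is left out) and keeps the entries that look like CAS numbers, using the standard CAS check-digit rule. The function names are hypothetical.

```python
import re

# A CAS number has the form NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_cas_number(text):
    """True if `text` has the CAS format and a correct check digit."""
    match = CAS_PATTERN.match(text.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def cas_numbers(synonyms):
    """Keep only the synonyms that pass the CAS format and check-digit test."""
    return [s.strip() for s in synonyms if is_cas_number(s)]


found = cas_numbers(["water", "7732-18-5", "dihydrogen oxide", "7732-18-4"])
```

Passing the check digit only shows that a string is well formed. It does not prove that the number is actually registered, or that it belongs to the compound in question.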

Written by Rajarshi Guha

December 12th, 2008 at 2:49 am

Posted in software

Multi-threaded Database Access with Python

with 8 comments

Pub3D contains about 17.3 million 3D structures for PubChem compounds, stored in a Postgres database. One of the things we wanted to do was 3D similarity searching and to achieve that we’ve been employing the Ballester and Graham-Richards method. In this post I’m going to talk about performance – how we went from a single monolithic database with long query times, to multiple databases and significantly faster multi-threaded queries.
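As a rough illustration of the fan-out pattern, the sketch below runs the same similarity query against several partitions in parallel threads and merges the hits. It is not the Pub3D code: plain in-memory lists stand in for the separate Postgres databases, the descriptors are toy two-element vectors, and the score is an inverse scaled Manhattan distance chosen for illustration. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def similarity(a, b):
    """Inverse scaled Manhattan distance; 1.0 for identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


def search_partition(partition, query, cutoff):
    """Stand-in for one query against one database partition."""
    hits = []
    for cid, descriptor in partition:
        score = similarity(descriptor, query)
        if score >= cutoff:
            hits.append((cid, score))
    return hits


def search_all(partitions, query, cutoff):
    """Query every partition in its own thread, then merge and rank the hits."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(search_partition, p, query, cutoff)
                   for p in partitions]
        hits = [hit for future in futures for hit in future.result()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy partitions of (CID, descriptor) pairs
partitions = [
    [(1, (0.0, 0.0)), (2, (5.0, 5.0))],
    [(3, (0.1, 0.1)), (4, (9.0, 9.0))],
]
ranked = search_all(partitions, query=(0.0, 0.0), cutoff=0.5)
```

With real databases each thread spends most of its time waiting on the server, which is where threads pay off in Python. The in-memory stand-in here shows the structure only and will not run any faster than a simple loop.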

Written by Rajarshi Guha

November 14th, 2008 at 4:46 pm