Version 1.4.3 of rpubchem is out on CRAN. There are some minor code cleanups and also a new function called get.aid.by.cid, which allows you to get assay IDs based on whether they contain a given compound (as an active, inactive or discrepant compound, or simply as tested). This uses PUG to perform the query, so it can be a bit slow (and occasionally just fail).
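rpubchem itself is written in R and talks to the XML-based PUG service. As a rough Python analogue, PubChem's newer PUG REST interface can list the assays a compound was tested in with a single GET request. This is a sketch under assumptions: it uses the `.../compound/cid/<CID>/aids/JSON` endpoint and expects a reply shaped like `{"InformationList": {"Information": [{"CID": ..., "AID": [...]}]}}`. The function names are hypothetical, and unlike get.aid.by.cid it does not filter by activity outcome.

```python
import json
import urllib.request

# Assumed PUG REST endpoint listing the AIDs in which a compound was tested.
# (rpubchem's get.aid.by.cid uses the older XML-based PUG, not this.)
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/%d/aids/JSON"


def parse_aids(payload):
    """Extract all AIDs from an (assumed) InformationList JSON reply."""
    info = json.loads(payload)["InformationList"]["Information"]
    aids = []
    for entry in info:
        # Entries without an "AID" key contribute nothing
        aids.extend(entry.get("AID", []))
    return aids


def get_aids_by_cid(cid, timeout=30):
    """Fetch the assay IDs for compound `cid` (network access required)."""
    with urllib.request.urlopen(PUG_REST % cid, timeout=timeout) as resp:
        return parse_aids(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP call makes it easy to check against a saved reply, which helps when the remote service is slow or fails.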
Frequency of a Term via PubMed
A little while back, Egon posted a question on FriendFeed asking whether there was an easy way, preferably a service, to determine and plot the usage count of a term in PubMed by year. This is simple enough using the Entrez Utilities CGI. A quick Python script to do this (with minimal error checking) is given below. It’d be relatively trivial to wrap this as a mod_python application and generate a bar plot directly (either using Python or using one of the online charting APIs).
```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

u = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=%s&mindate=%d/01/01&maxdate=%d/12/31"
term = "artemisinin resistance"
startYear = 1998
endYear = 2009

for year in range(startYear, endYear + 1):
    # URL-encode the term (spaces become "+", quotes etc. are escaped)
    url = u % (urllib.parse.quote_plus(term), year, year)
    page = urllib.request.urlopen(url).read()
    doc = ET.fromstring(page)
    # <Count> is a direct child of <eSearchResult> and holds the number of hits
    count = doc.find("Count").text
    print(year, count)
```
Update 1
A little more hacking and the above code was converted to a mod_python application, which can be accessed using a URL of the form http://rest.rguha.net/usage/usage.py?term=TERM&syear=1997&eyear=2009. With the help of the handy pygooglechart module, the above URL returns an <img> tag containing the appropriate Google Charts URL. As an example, the term “artemisinin resistance” results in this image.
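The mod_python code isn't shown here, so the following is only a sketch of the idea: it builds a Google Image Charts URL by hand with the standard library (rather than through pygooglechart, whose API is not reproduced) and wraps it in an <img> tag. The function name chart_img_tag, the default size and the choice of a vertical bar chart (`cht=bvs`) are assumptions. Google has since deprecated the Image Charts service, so the URL is mainly illustrative.

```python
from html import escape
from urllib.parse import urlencode


def chart_img_tag(years, counts, width=400, height=200):
    """Return an <img> tag for a bar chart of per-year counts (hypothetical helper)."""
    # Guard against empty input or all-zero counts when scaling the axis
    top = max(counts) if counts and max(counts) > 0 else 1
    params = {
        "cht": "bvs",                                     # vertical bar chart
        "chs": "%dx%d" % (width, height),                 # image size in pixels
        "chd": "t:" + ",".join(str(c) for c in counts),   # text-encoded data
        "chds": "0,%d" % top,                             # scale data to 0..top
        "chxt": "x,y",                                    # show x and y axes
        "chxl": "0:|" + "|".join(str(y) for y in years),  # year labels on x axis
        "chxr": "1,0,%d" % top,                           # y axis range
    }
    url = "http://chart.apis.google.com/chart?" + urlencode(params)
    # Escape the URL so "&" becomes "&amp;" inside the HTML attribute
    return '<img src="%s" alt="PubMed usage by year">' % escape(url)
```

For instance, `chart_img_tag([1998, 1999, 2000], [12, 15, 9])` returns a single `<img>` tag that a mod_python handler could write straight into its response.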
Update 2
Jan Schoones pointed out in a comment that my artemisinin resistance example was slightly incorrect: the resulting PubMed search does not look for the exact phrase, but rather for documents that contain the words “artemisinin” and “resistance”. This is because the example URL does not include quotes around the phrase. A more correct example would be here, where we search for the phrase rather than the individual words.
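To illustrate the difference, here is a small sketch that builds the ESearch URL with and without quotes around the term. The helper name esearch_url and its phrase flag are hypothetical; urlencode escapes the double quotes as %22 (and the slashes in the dates as %2F).

```python
from urllib.parse import urlencode

ESEARCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, year, phrase=False):
    """Build an ESearch URL for one year; phrase=True wraps the term in quotes."""
    if phrase:
        # Quoting asks PubMed for the exact phrase instead of the separate words
        term = '"%s"' % term
    params = urlencode({
        "term": term,
        "mindate": "%d/01/01" % year,
        "maxdate": "%d/12/31" % year,
    })
    return ESEARCH + "?" + params
```

With `phrase=False` the query string starts `term=artemisinin+resistance`, while with `phrase=True` it starts `term=%22artemisinin+resistance%22`.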
Updated Versions of R Packages
New versions of several of my R packages are now available on CRAN. rcdk 2.9.6 goes along with rcdklibs 1.2.3. The latter now uses the most recent cdk-1.2.x branch from GitHub. The former fixes a number of bugs relating to descriptor calculations, saving molecules in SD format and setting/getting properties on molecules. Unfortunately, because the 1.2.x branch does not have robust depiction code, the visualization methods in rcdk are currently disabled. The fingerprint package has also been updated and now includes a number of unit tests.
Another Conference Done
The CHI RNAi conference is over and I’m now heading back home. Being new to the field of RNAi screening, I’ve been looking for a place (virtual or real) where I can meet other people, especially those working in large-scale screening facilities. Reading the literature is certainly useful, but face-to-face interactions are always richer. I was very pleased to see that the meeting was of high quality. While it wasn’t always cutting edge (most of the work had been published, though it was still new to me), there were some very interesting talks, ranging from the use of RNAi screens to probe myeloma biology, mTOR addiction and the reconstruction of genetic networks, to meta-analysis of multiple RNAi screens for the identification of synthetic lethal targets, parallel chemical and RNAi screens, and the use and analysis of complex phenotypes. Of course, a lot of it went over my head, but that was to be expected. I was also pleasantly surprised to see very few vendor talks; the bulk of the talks were from academics or staff of core facilities. I also got to meet a number of people involved in RNAi screening facilities and had some very enlightening discussions. A lot of things to implement and test when I get back home! Overall a very useful meeting, and I hope to make it again next year.
Now I just need to get home and schedule the ACS CINF program for the Spring meeting.
A New Toolkit on the Block
A few days ago I was pointed to a new cheminformatics toolkit called Indigo, written in C++ with source code available under the GPLv3 license. Rich has previously commented on this. While it’s an initial release, it has a number of interesting components, such as an Oracle cartridge, 2D depiction, scaffold detection and R-group generation. Unfortunately I wasn’t able to run the OS X binaries; I’ll give it a try on Linux. The future plans for the library also seem quite interesting, including wrappers for other languages, chemical OCR, etc. It’s also nice to see that they have a page describing the algorithms they implemented. Some of them are well-known ones, such as the Kabsch alignment method, while others (fingerprints, depictions) appear to be unique to the toolkit. Unfortunately, most of the descriptions are in Russian, so we’ll have to wait for English translations.