A little while back, Egon posted a question on FriendFeed, asking whether there was an easy way, preferably a service, to determine and plot the usage count of a term in PubMed by year. This is simple enough using the Entrez Utilities CGI. A quick Python script to do this (with minimal error checking) is given below. It’d be relatively trivial to wrap this as a mod_python application and generate a bar plot directly (either using Python or using one of the online charting API’s)
import xml.etree.ElementTree as ET
u = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=%s&mindate=%d/01/01&maxdate=%d/12/31"
term = "artemisinin resistance"
startYear = 1998
endYear = 2009
for year in range(startYear, endYear+1):
url = u % (term.replace(" ", "+"), year, year)
page = urllib.urlopen(url).read()
doc = ET.XML(page)
count = doc.find("Count").text
print year, count
A little more hacking and the above code was converted to a mod_python application, which can be accessed using a URL of the form http://rest.rguha.net/usage/usage.py?term=TERM&syear=1997&eyear=2009. With the help of the handy pygooglechart module, the above URL returns an <img> tag containing the appropriate Google Charts URL. As a an example, the term “artemisinin resistance” results in this image.
Jan Schoones pointed out in a comment that my artemisinin resistance example was slightly incorrect, as the resultant PubMed search does not search for the exact phrase, but rather, looks for documents that contain the words “artemisinin” and “resistance”. This is because the example URL does not include the quotes around the phrase. A more correct example would be here, where we search for the phrase, rather than individual words.