BAZOO

So much to do, so little time

Trying to squeeze sense out of chemical data

Frequency of a Term via PubMed

A little while back, Egon posted a question on FriendFeed, asking whether there was an easy way, preferably a service, to determine and plot the usage count of a term in PubMed by year. This is simple enough using the Entrez Utilities CGI: the esearch endpoint takes a search term plus a date range and returns an XML document whose Count element holds the number of matching records. A quick Python script to do this, one query per year (with minimal error checking), is given below. It would be fairly easy to wrap this as a mod_python application and generate a bar plot directly (either in Python or with one of the online charting APIs).

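import urllib
import xml.etree.ElementTree as ET

# esearch URL template: the search term, then the first and last day of one year
u = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=%s&mindate=%d/01/01&maxdate=%d/12/31"
term = "artemisinin resistance"
startYear = 1998
endYear = 2009
for year in range(startYear, endYear+1):
    # spaces in the term are written as "+" so the URL stays valid
    url = u % (term.replace(" ", "+"), year, year)
    page = urllib.urlopen(url).read()
    # the Count element of the XML response holds the number of hits
    doc = ET.XML(page)
    count = doc.find("Count").text
    print year, count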
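
The script above is written for Python 2 (urllib.urlopen and the print statement). For readers on Python 3, the same loop could be sketched roughly as follows. This is a port under assumptions rather than the original script: the function names build_url, parse_count and yearly_counts are made up for illustration, and the https scheme is assumed. Keeping the XML parsing separate from the fetching means the parsing can be checked without touching the network.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Same query template as the script above; the https scheme is an assumption.
ESEARCH = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
           "?term=%s&mindate=%d/01/01&maxdate=%d/12/31")


def build_url(term, year):
    """Return the esearch URL restricting `term` to a single year."""
    return ESEARCH % (term.replace(" ", "+"), year, year)


def parse_count(xml_text):
    """Extract the hit count from an esearch XML response (str or bytes)."""
    node = ET.fromstring(xml_text).find("Count")
    if node is None:
        raise ValueError("no <Count> element in response")
    return int(node.text)


def yearly_counts(term, start_year, end_year):
    """Query once per year and return a list of (year, count) pairs."""
    counts = []
    for year in range(start_year, end_year + 1):
        with urllib.request.urlopen(build_url(term, year)) as response:
            counts.append((year, parse_count(response.read())))
    return counts
```

Calling yearly_counts("artemisinin resistance", 1998, 2009) would then issue the same twelve queries as the original loop and return the counts as integers instead of printing them.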

Update 1

A little more hacking and the above code was converted to a mod_python application, which can be accessed using a URL of the form http://rest.rguha.net/usage/usage.py?term=TERM&syear=1997&eyear=2009. With the help of the handy pygooglechart module, the above URL returns an <img> tag containing the appropriate Google Charts URL. As an example, the term “artemisinin resistance” results in this image.
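
To give a feel for what such an <img> tag might contain, here is a rough sketch that builds a bar chart URL by hand from (year, count) pairs. It is not the code behind the service, which used pygooglechart. The function name chart_img_tag, the default image size and the endpoint URL are assumptions for illustration, and Google has since deprecated the image charts service, so treat this as an illustration of the URL format only.

```python
import html
from urllib.parse import urlencode

# Assumed endpoint of the (since deprecated) Google image charts service.
CHART_BASE = "https://chart.googleapis.com/chart"


def chart_img_tag(counts, width=500, height=250):
    """Return an <img> tag for a bar chart of (year, count) pairs."""
    years = [str(year) for year, _ in counts]
    values = [count for _, count in counts]
    # Avoid a zero-height scale when there is no data or all counts are 0.
    top = max(values) if values and max(values) > 0 else 1
    params = {
        "cht": "bvs",                                     # vertical bars
        "chs": "%dx%d" % (width, height),                 # size in pixels
        "chd": "t:" + ",".join(str(v) for v in values),   # data, text format
        "chds": "0,%d" % top,                             # scale data 0..max
        "chxt": "x,y",                                    # draw both axes
        "chxl": "0:|" + "|".join(years),                  # x labels: years
        "chxr": "1,0,%d" % top,                           # y-axis range
    }
    # Keep ":", "," and "|" unescaped so the URL stays readable.
    url = CHART_BASE + "?" + urlencode(params, safe=":,|")
    # Escape "&" so the URL is valid inside an HTML attribute.
    return '<img src="%s" alt="PubMed hits per year"/>' % html.escape(url)
```

A web handler would call this with the per-year counts and write the returned string into the response body.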

Update 2

Jan Schoones pointed out in a comment that my artemisinin resistance example was slightly incorrect, as the resultant PubMed search does not search for the exact phrase, but rather, looks for documents that contain the words “artemisinin” and “resistance”. This is because the example URL does not include the quotes around the phrase. A more correct example would be here, where we search for the phrase, rather than individual words.
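
The difference between the two searches comes down to how the term is encoded in the URL. A minimal sketch in Python 3, assuming a hypothetical helper named encode_term (not part of the original script), which wraps the term in double quotes when an exact phrase is wanted and lets quote_plus take care of spaces and other special characters:

```python
from urllib.parse import quote_plus


def encode_term(term, exact_phrase=False):
    """URL-encode a PubMed search term, optionally as an exact phrase."""
    if exact_phrase:
        # Double quotes ask PubMed to match the phrase, not the single words.
        term = '"%s"' % term
    return quote_plus(term)


print(encode_term("artemisinin resistance"))
print(encode_term("artemisinin resistance", exact_phrase=True))
```

The first call yields artemisinin+resistance (a search on the individual words) and the second yields %22artemisinin+resistance%22 (the quoted phrase); either string can be substituted for the term in the esearch URL.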

Written by Rajarshi Guha

November 10th, 2009 at 11:50 pm

Posted in software

6 Responses to 'Frequency of a Term via PubMed'

  1. Very interesting! A tiny comment, if I may: I presume, in regards to your example, you have searched for artemisinin resistance without quotes. Such a search will retrieve 581 references (11-11-2009). This search is processed by PubMed as ((“artemisinine”[Substance Name] OR “artemisinine”[All Fields] OR “artemisinin”[All Fields]) AND resistance[All Fields])

    The use of quotes, “artemisinin resistance”, will result in only 20 references. The exact phrase “artemisinin resistance” will be found. These 20 references are well below the amount of references you presented in your example.

    Jan W. Schoones

    11 Nov 09 at 7:40 am

  2. Thanx!

    Can you have a look at these TERMS:

    TERM=”metabolomics OR metabonomics”

    that seems to pick up just the count of the first

    TERM=willighagen

    that gives a stack trace…

    Egon Willighagen

    11 Nov 09 at 8:05 am

  3. [...] Frequency of a Term via PubMed at So much to do, so little time blog.rguha.net/?p=443 – view page – cached Trying to squeeze sense out of chemical data [...]

  4. For the first case, make sure to write the term as metabolomics+or+metabonomics in the URL.

    For the second case, it’s fixed.

    Rajarshi Guha

    11 Nov 09 at 12:50 pm

  5. Jan, thanks for the comment. Yes, you’re right that without the quotes, the exact phrase is not searched for. This can be fixed by using the quoted form.

    Rajarshi Guha

    11 Nov 09 at 12:58 pm

  6. Minor coding observation – the ‘term.replace(” “, “+”)’ can also be done with urllib.quote_plus, which would also handle some of the other special characters someone might put in.

    Andrew Dalke

    15 Nov 09 at 12:16 am
