So much to do, so little time

Trying to squeeze sense out of chemical data

Getting the GO into a Graph Data Structure


Today, while working on a project, I needed access to the Gene Ontology hierarchy. While there are a number of GO browsers such as AmiGO, I needed the raw data to generate a graph that I could then slice and dice. A few minutes with Python led to a simple solution.

The program parses the OBO 1.2 formatted GO data file (either by directly downloading it or from a local file) and outputs a flat dictionary listing the term IDs, names, namespaces, etc., and a network representation of the GO hierarchy in ncol format. It uses a simple (and relatively non-robust) class to represent the data as an undirected graph (not really correct, since GO relationships are directed), though it’d be easy to use something like igraph to start doing some real network analysis. It’s certainly not a comprehensive solution, but I thought I’d put it out there.
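The script itself isn't reproduced here, but the approach described above can be sketched roughly as follows. This is a minimal illustration, not the original program: the function names `parse_obo` and `write_ncol` and the tiny hand-written `SAMPLE` stanzas are assumptions made for the example. It only follows `is_a` lines and ignores other relationship types, and it keeps edges as directed (child, parent) pairs.

```python
import io


def parse_obo(lines):
    """Collect [Term] stanzas from an OBO 1.2 stream.

    Returns (terms, edges): terms maps a term ID to its name and namespace,
    edges is a list of (child_id, parent_id) pairs taken from is_a lines.
    """
    terms, edges = {}, []
    stanza = None  # dict for the [Term] stanza being read, else None

    def finish(stanza):
        # Store a completed stanza, skipping obsolete or ID-less ones.
        if stanza is None or "id" not in stanza:
            return
        if stanza.get("is_obsolete") == "true":
            return
        terms[stanza["id"]] = {
            "name": stanza.get("name", ""),
            "namespace": stanza.get("namespace", ""),
        }
        for parent in stanza["is_a"]:
            edges.append((stanza["id"], parent))

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header closes the previous stanza.
            finish(stanza)
            stanza = {"is_a": []} if line == "[Term]" else None
        elif stanza is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "is_a":
                # "is_a: GO:0008150 ! biological_process" -> keep the ID only
                stanza["is_a"].append(value.split("!")[0].strip())
            elif tag in ("id", "name", "namespace", "is_obsolete"):
                stanza[tag] = value.strip()
    finish(stanza)  # the last stanza has no following header
    return terms, edges


def write_ncol(edges, out):
    """Write one whitespace-separated 'child parent' pair per line."""
    for child, parent in edges:
        out.write(f"{child} {parent}\n")


# Hand-written sample in OBO 1.2 style (illustrative, not the full GO file).
SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0009987
name: cellular process
namespace: biological_process
is_a: GO:0008150 ! biological_process

[Typedef]
id: part_of
name: part of
"""

terms, edges = parse_obo(io.StringIO(SAMPLE))
buf = io.StringIO()
write_ncol(edges, buf)
print(buf.getvalue(), end="")
```

For a local copy of the GO file, the same `parse_obo` call should work on an open file handle in place of the `io.StringIO` wrapper. Keeping the pairs in child-to-parent order leaves the choice between a directed and an undirected graph to whatever reads the ncol file.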

Written by Rajarshi Guha

January 31st, 2009 at 1:34 am

Posted in software


3 Responses to 'Getting the GO into a Graph Data Structure'


  1. Nice. Here is a version in GNU R; we need to install some Bioconductor packages first.

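    library(GO.db)    # Bioconductor package providing the GO*PARENTS tables
    library(igraph)
    # One row per child/parent pair for each of the three GO namespaces
    BP <- toTable(GOBPPARENTS)
    CC <- toTable(GOCCPARENTS)
    MF <- toTable(GOMFPARENTS)
    # Wrapping call assumed to be graph.data.frame; only its closing paren survived
    g <- graph.data.frame( rbind(BP,CC,MF) )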

    This does everything, from downloading the data, to creating the ‘igraph’ object.


    31 Jan 09 at 9:58 pm

  2. Aah, thanks for the example. If only I had known (since I end up doing the final analysis in R) :)

    Rajarshi Guha

    31 Jan 09 at 10:03 pm

  3. […] Guha wrote a nice Python script that converts the Gene Ontology graph into an igraph graph. Here is an R version that is much […]
