Getting the GO into a Graph Data Structure

Today, while working on a project, I needed access to the Gene Ontology (GO) hierarchy. While there are a number of GO browsers such as AmiGO, I needed the raw data to generate a graph that I could then slice and dice. A few minutes with Python led to a simple solution.

The program parses the OBO 1.2 formatted GO data file (either by downloading it directly or by reading a local file) and outputs two things: a flat dictionary listing the term IDs, names, namespaces, etc., and a network representation of the GO hierarchy in ncol format. It uses a simple (and relatively non-robust) class to represent the data as an undirected graph (not really correct, since GO relationships are directed), though it’d be easy to use something like igraph to start doing some real network analysis. It’s certainly not a comprehensive solution, but I thought I’d put it out there.
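To give a rough idea of the approach, here is a minimal sketch of the parsing step. It is not the original script: the function names `parse_obo` and `to_ncol` and the tiny inline sample are made up for illustration, and it assumes only `[Term]` stanzas and `is_a` links matter (other relationships such as `part_of` are ignored). Edges are written child first, parent second.

```python
# Minimal sketch, not the original script: parse_obo and to_ncol are
# hypothetical names, and only [Term] stanzas and is_a links are handled.

SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Typedef]
id: part_of
name: part of
"""


def parse_obo(lines):
    """Collect [Term] stanzas into a flat dict keyed by term ID."""
    terms = {}
    current = None  # dict for the stanza being read, or None if skipping
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; keep it only if it is a [Term] stanza.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            # Drop the trailing "! parent name" comment, keep the parent ID.
            current["is_a"].append(value.split("!")[0].strip())
    return terms


def to_ncol(terms):
    """Return one 'child parent' line per is_a link (ncol edge list)."""
    return [
        f"{child} {parent}"
        for child, term in terms.items()
        for parent in term["is_a"]
    ]


terms = parse_obo(SAMPLE.splitlines())
for edge in to_ncol(terms):
    print(edge)
```

For a real run, the sample string would be replaced by the lines of the downloaded OBO file. The resulting edge list can be loaded by anything that reads ncol files, and keeping the child-to-parent direction avoids the undirected-graph shortcut mentioned above.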

3 thoughts on “Getting the GO into a Graph Data Structure”

  1. Gabor says:

    Nice. Here is a version in GNU R (http://www.r-project.org). We need to install some BioConductor (http://www.bioconductor.org) packages first.

    source("http://bioconductor.org/biocLite.R")
    biocLite("GO.db")
    library(GO.db)
    library(igraph)
    BP <- toTable(GOBPPARENTS)
    CC <- toTable(GOCCPARENTS)
    MF <- toTable(GOMFPARENTS)
    g <- graph.data.frame( rbind(BP,CC,MF) )

    This does everything, from downloading the data, to creating the ‘igraph’ object.

  2. Aah, thanks for the example. If only I had known (since I end up doing the final analysis in R) :)

  3. […] Guha wrote a nice Python script that convert the Gene Ontology graph into an igraph graph. Here is an R version that is much […]
