Archive for the ‘visualization’ Category
In a previous post, I described a simple web form to query and visualize the solubility data being generated as part of the ONS Challenge. The previous approach required me to manually download the data and load it into a Postgres database. While trivial from a coding point of view, it was a pain since I had to keep my local DB in sync with the Google Docs spreadsheet.
This is very nice since I no longer have to maintain a local DB and ensure that it’s in sync with Jean-Claude’s results. Of course, there are some drawbacks to this method. First, the query page assumes that the data in the spreadsheet is clean. So if there are two entries called “Ethanol” and “ethanol”, they will be considered separate solvents. Second, this approach cannot be used to include cheminformatics in the queries, since Google doesn’t support that functionality. Finally, it’s not going to be very good for large spreadsheets.
However, this is a very nice API that allows one to elegantly integrate web applications with live data. I heart Google!
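One way to soften the first drawback would be to normalize solvent names on the client before grouping the rows. The sketch below is only an illustration: the `normalizeName` and `groupBySolvent` helpers, the row shape and the values are all assumptions, not part of the query page described here.

```javascript
// Hypothetical helper: collapse case and stray whitespace so that
// "Ethanol", "ethanol" and " ethanol " count as the same solvent.
function normalizeName(name) {
  return String(name).trim().toLowerCase();
}

// Hypothetical helper: group rows of the (assumed) form
// { solvent, conc } by their normalized solvent name.
function groupBySolvent(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = normalizeName(row.solvent);
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(row.conc);
  }
  return groups;
}

// Made-up values, purely for illustration.
const grouped = groupBySolvent([
  { solvent: 'Ethanol', conc: 0.5 },
  { solvent: 'ethanol ', conc: 0.7 },
  { solvent: 'THF', conc: 1.2 },
]);
console.log(Array.from(grouped.keys()));
```

This only papers over differences in case and whitespace; genuinely different spellings or synonyms of a solvent would still need to be cleaned up in the spreadsheet itself.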
There was a FriendFeed discussion on the use of RDF triples for representing the solubility data generated by Jean-Claude and others as part of the ONS Solubility Challenge. Part of the discussion revolved around letting RDF novices easily perform queries of the data being collected. Not knowing much about RDF, I took the raw data from the Google Docs and loaded it into a Postgres database and whipped up a simple query form.
The DB and form are nothing remarkable. But what is cool is that the Google Visualization API makes it really easy for me to include charts and other visualizations. For example, if you select “any” as the solvent and then select a solute, the form creates a table of solubilities of that solute in all the solvents it was measured in. A natural view of the data is to look at a bar chart of the solubilities across the various solvents.
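var data = new google.visualization.DataTable();
// First column holds the solvent name (label assumed), second the concentration.
data.addColumn('string', 'Solvent');
data.addColumn('number', 'Conc (M)');
// Allocate two empty rows before filling them in cell by cell.
data.addRows(2);
data.setValue(0, 0, 'thf');
data.setValue(0, 1, 1.23);
data.setValue(1, 0, 'acetonitrile');
data.setValue(1, 1, 2.34);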
Once you have the data all stored, some more boilerplate code allows us to easily insert the chart into the final web page. Very neat!
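For the curious, the boilerplate might look roughly like the sketch below. It is not the code behind the form: it assumes the current `google.charts` loader (the script at www.gstatic.com/charts/loader.js) has been included in the page, that the page contains an element with the id `chart_div`, and the `toRows` helper and chart options are made up for illustration. The values are the same two data points as in the snippet above.

```javascript
// Hypothetical helper: turn [solvent, concentration] pairs into
// rows suitable for DataTable.addRows().
function toRows(pairs) {
  return pairs.map(([solvent, conc]) => [String(solvent), Number(conc)]);
}

// The same illustrative values used in the snippet above.
const rows = toRows([['thf', 1.23], ['acetonitrile', 2.34]]);

// Build the table and draw a bar chart into the (assumed) element
// with id "chart_div".
function drawChart() {
  const data = new google.visualization.DataTable();
  data.addColumn('string', 'Solvent');
  data.addColumn('number', 'Conc (M)');
  data.addRows(rows);

  const chart = new google.visualization.BarChart(
    document.getElementById('chart_div')
  );
  chart.draw(data, { title: 'Solubility by solvent', width: 500, height: 300 });
}

// Only run in a browser page where Google's loader script is present.
if (typeof google !== 'undefined' && google.charts) {
  google.charts.load('current', { packages: ['corechart'] });
  google.charts.setOnLoadCallback(drawChart);
}
```

Passing all the rows to `addRows()` in one go avoids the cell-by-cell `setValue()` calls, which gets tedious once there are more than a handful of solvents.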
I came across an interesting site called the World Names Profiler, which, given a surname, colors a map of the world based on the frequency of occurrence of the name in different countries. They have a dataset of 300 million names across 26 countries.
While it’s a nice visualization, it was very interesting for me to see the spread of Indian surnames, as the Indian diaspora is spread out all over the globe. Obviously Indian surnames have a maximum frequency in India, but it’s quite interesting to note that Guha has a high frequency in North America and Central Europe and a very low frequency in Australia. I was also surprised to see that it had a non-zero occurrence in Argentina. On the other hand, Ghosh has a higher frequency in Canada compared to the US and a higher frequency in Argentina than Guha. However, Patel has a much higher frequency in Australia than either Guha or Ghosh. Singh, on the other hand, appears to have similar frequencies in Canada and Australia, which are both higher than in the US.
I chose these surnames because they’re pretty common Indian surnames. One could correlate frequencies of occurrence to the background represented by the surnames, but that would be easily confounded by stereotypes. However, for me, it’s a nice visualization of how Indians have spread over the globe.