This post was originally published on 03/05/13. It has since been updated several times, largely to improve readability on small-screen devices.
[Interactive display omitted: scatter plot with an accompanying table (columns: Administrative Region, x Value, y Value)]
The data in the above display come from the 2011 census for England & Wales. It was published by the Office for National Statistics, and I retrieved it from the Guardian website. It might look familiar if you've read my demonstration article on Showing Marginal Distributions. That article was concerned only with displaying two variables and ignored the other 87 variables measured for each administrative region. Here, however, we are interested in all of the variables, and that is what the "x axis" and "y axis" form elements are for: any of the 89 variables can be encoded along either axis, and the values in the table on the right change accordingly. That's 89 × 89 = 7921 different combinations (although I doubt anyone would find it useful to plot the same variable on both axes).
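To make the axis switching concrete, here is a minimal sketch of how the side table could be rebuilt whenever a new pair of variables is chosen. It assumes the data are held as a mapping from region name to a dictionary of variable values. The function names `axis_table` and `axis_combinations`, and the toy data, are hypothetical and are not taken from the code behind the chart above.

```python
# Hypothetical layout: region name -> {variable name: value}.
# Any names or numbers used with this are illustrative placeholders, not census figures.
CensusData = dict[str, dict[str, float]]


def axis_table(data: CensusData, x_var: str, y_var: str) -> list[tuple[str, float, float]]:
    """Build the (region, x value, y value) rows shown for the chosen axes."""
    return [(region, values[x_var], values[y_var]) for region, values in data.items()]


def axis_combinations(variables: list[str]) -> int:
    """Count ordered (x, y) choices, including a variable plotted against itself."""
    return len(variables) ** 2


if __name__ == "__main__":
    toy = {
        "Region A": {"var1": 1.0, "var2": 10.0},
        "Region B": {"var1": 2.0, "var2": 20.0},
    }
    print(axis_table(toy, "var1", "var2"))
    print(axis_combinations([f"v{i}" for i in range(89)]))  # 89 variables -> 7921
```

Excluding the pairs where a variable is plotted against itself leaves 89 × 88 = 7832 combinations.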
We can get more from this plot. Hovering over a data point brings up a tooltip naming the location that the (topmost) data point corresponds to. Hovering over a location in the table highlights the relevant data point in the scatter plot (useful for finding out how your home town fared, if nothing else). Moreover, clicking on either one highlights both in yellow and brings the point to the front if it wasn't there already. This makes it easy to follow one or more points when the variables encoded along the axes are changed, which is especially useful for tracking outliers.
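The linked highlighting boils down to one piece of shared state that the scatter plot and the table both read. Below is a minimal sketch of that idea, assuming a fixed base drawing order and the usual convention that points drawn later appear on top. The class `LinkedSelection` and its method names are hypothetical. It models only the state and does not handle any actual rendering.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkedSelection:
    """Hypothetical state shared by the scatter plot and the table."""
    regions: list[str]                       # base drawing order of the points
    selected: set[str] = field(default_factory=set)
    hovered: Optional[str] = None

    def hover(self, region: Optional[str]) -> None:
        # Hovering over a table row (or a point) marks that region as hovered.
        self.hovered = region

    def click(self, region: str) -> None:
        # Clicking a point or its table row toggles the yellow highlight on both.
        if region in self.selected:
            self.selected.discard(region)
        else:
            self.selected.add(region)

    def draw_order(self) -> list[str]:
        # Later points are drawn on top, so selected points go last ("to the front").
        unselected = [r for r in self.regions if r not in self.selected]
        chosen = [r for r in self.regions if r in self.selected]
        return unselected + chosen

    def tooltip_target(self, overlapping: list[str]) -> Optional[str]:
        # Of several points under the cursor, report the one drawn on top.
        if not overlapping:
            return None
        order = self.draw_order()
        return max(overlapping, key=order.index)

    def style(self, region: str) -> str:
        # Style applied to both the point and its table row.
        if region in self.selected:
            return "yellow"
        if region == self.hovered:
            return "hover"
        return "default"
```

Because the plot and the table both query the same object, a click in either place updates both views. That is what lets a highlighted point stay easy to follow as the axes change.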
Reasons to be cheerful
Hope for the future
I don't mean to sound like I'm proposing something new or revolutionary. The New York Times in particular has been doing things like this for a number of years. Coming from a background in physics, though, I'd love to see more scientific results communicated this way. My impression is that people generally publish papers highlighting the results they consider interesting and meaningful. In the interests of brevity, data that isn't thought to be significant is left out, and graphics are designed to highlight the results of interest and create a better narrative. But what if the omitted data would be significant to the conclusions of later research by a different group? Being able to cross-check other people's results, even in part, would have been invaluable while I was writing my doctoral thesis. Perhaps someone will look at my thesis one day and feel the same.
The chart and its accompaniments at the top of the page are something of a work in progress. I'd like to add some simple analytical tools so that readers can get more from the display, so I'm keen to know which tools you think would be useful. The data were chosen because they were handy and had many variables. The tools I have in mind at the minute are rather generic, such as calculating the linear correlation coefficient for the plotted pair of variables or sorting the table by any variable in the dataset, but ideas for tools specific to this dataset are welcome too.
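For the two generic tools mentioned above, here is a rough sketch of what the calculations might look like. It assumes the plotted values arrive as two equal-length lists and the table rows as dictionaries keyed by variable name. The names `pearson_r` and `sort_rows` are hypothetical. The linear correlation coefficient is computed here as Pearson's r. On Python 3.10 or later, `statistics.correlation` from the standard library computes the same quantity.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson (linear) correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def sort_rows(rows: list[dict], column: str, descending: bool = False) -> list[dict]:
    """Return table rows sorted by one variable (hypothetical row layout)."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)
```

For example, `pearson_r([1, 2, 3], [2, 4, 6])` gives 1.0 because the two variables are perfectly linearly related. Swapping in `[3, 2, 1]` for the second list gives -1.0.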