Relaying data through web browsers

This post was originally published on 03/05/13. It has since been updated several times, largely to improve readability on small-screen devices.

Consider the following scatter plot, table and form elements that were created with the help of the d3 and jQuery JavaScript libraries:

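[Interactive display: a scatter plot, a table with the columns Administrative Region, x Value and y Value, and form controls for the x axis, the y axis, auto scrolling and animation.]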

The data in the above display come from the 2011 census for England & Wales. They were published by the Office for National Statistics, and I retrieved them from the Guardian website. They might look familiar if you’ve read my demonstration article on Showing Marginal Distributions. That article was concerned only with the display of two variables and ignored the other 87 variables measured for each administrative region. Here, however, we are interested in all of the variables, which is what the “x axis” and “y axis” form elements are for: any of the 89 variables can be encoded along either axis, with the values given in the table on the right changing accordingly. That’s 89 × 89 = 7921 different possibilities (although I doubt anyone would find it useful to plot the same variable on both axes).
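As a quick check on that arithmetic, here is a minimal sketch in JavaScript. The function name countAxisPairings is my own invention for illustration and is not part of the display's code. It counts every pairing, then the pairings left once the same-variable plots are dropped, and finally the number left if swapping the axes is treated as the same plot.

```javascript
// Hypothetical helper: counts the ways of assigning variables to two axes.
function countAxisPairings(variableCount) {
  // Any variable on the x axis combined with any variable on the y axis.
  const ordered = variableCount * variableCount;
  // Drop the pairings that plot a variable against itself.
  const distinct = ordered - variableCount;
  // Treat (x, y) and (y, x) as the same plot with the axes swapped.
  const unordered = distinct / 2;
  return { ordered, distinct, unordered };
}

const pairings = countAxisPairings(89);
console.log(pairings); // { ordered: 7921, distinct: 7832, unordered: 3916 }
```

So even after discarding the redundant combinations there are still several thousand plots to explore.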

We can get more from this plot. Hovering over a data point brings up a tooltip telling you which location that (topmost) data point corresponds to. Hovering over a location in the table highlights the relevant data point in the scatter plot (useful for finding out how your home town fared, if nothing else). Moreover, clicking on either will highlight both in yellow (and bring the point to the front if it wasn’t already). This makes it easy to follow a point, or points, when the variables encoded along the axes are changed, which is especially useful for tracking outliers.
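One way to get this behaviour is to store the highlight against the region's name rather than against its position on screen, so that it survives a change of axes. The sketch below is not the code behind the display. It is a plain JavaScript model with no d3 or DOM calls, and createSelection, projectRows, the var1 to var3 keys and the sample rows are all hypothetical. It also assumes that a second click removes the highlight, which the display may or may not do.

```javascript
// Hypothetical selection model; none of these names come from the original page.
function createSelection() {
  const selected = new Set(); // region names, not screen coordinates
  return {
    // Assumption: a second click on the same region clears the highlight.
    toggle(region) {
      if (selected.has(region)) {
        selected.delete(region);
      } else {
        selected.add(region);
      }
      return selected.has(region);
    },
    has(region) {
      return selected.has(region);
    },
  };
}

// Re-project the rows onto the chosen axes, carrying the highlight flag along.
function projectRows(rows, xKey, yKey, selection) {
  return rows.map((row) => ({
    region: row.region,
    x: row[xKey],
    y: row[yKey],
    highlighted: selection.has(row.region),
  }));
}

// Placeholder rows with made-up values, not census figures.
const rows = [
  { region: "Region A", var1: 1, var2: 10, var3: 7 },
  { region: "Region B", var1: 2, var2: 20, var3: 5 },
];

const selection = createSelection();
selection.toggle("Region B"); // a click on either the point or the table row

const before = projectRows(rows, "var1", "var2", selection);
const after = projectRows(rows, "var1", "var3", selection); // y axis changed
console.log(before[1].highlighted, after[1].highlighted); // true true
```

Because both the plot and the table would read from the same selection, a click in one is reflected in the other, and the point stays highlighted when a different variable is chosen for an axis.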

Reasons to be cheerful

All you should need for the interactive display to work correctly on your computer is an internet connection and a (free) modern web browser (not Internet Explorer 8 or earlier). It’s constructed using a mix of HTML, CSS and JavaScript – there is no requirement that you even have Flash or Java plugins installed. It should work on your Windows desktop at work and on your MacBook at home. Moreover, it should work on your colleague’s or customer’s Linux computer on the other side of the world too. Neither of you needs to have shelled out a four-figure sum for a MATLAB or SAS licence (for example).

If, at this point, you’re wondering why you should be excited that your customers can interact with my data visualisation then I haven’t explained myself very well. The point I’m really trying to make is that modern web browsers combined with the use of freely available JavaScript libraries can make the process of relaying data – whether that be financial forecasts, scientific results or census statistics – more immersive, interactive and engaging. And more transparent.

Hope for the future

I don’t mean to sound like I’m positing something new or revolutionary. The New York Times in particular has been doing things like this for a number of years. But coming from a background in physics, I’d love to see more scientific results communicated in such a manner. Generally (I perceive), people publish papers highlighting the results they think are interesting and meaningful from their data. In the interests of brevity, some data will be left out if it isn’t thought to be significant, and graphics will be designed to highlight the results of interest and create a better narrative. But what if that omitted data would be significant to the conclusions of subsequent research by a different group? The ability to cross-check, at least in part, other people’s results while writing my doctoral thesis would have been invaluable. Perhaps someone will look at my thesis one day and feel the same.

Feedback

The chart and accompaniments at the top of the page are something of a work in progress. I’d like to add some simple analytical tools so that the end-user can get more from the display, so I’m keen to know which tools you think would be useful. The data were chosen because they were handy and had many variables. The tools I have in mind at the minute are rather generic (calculating the linear correlation coefficient, sorting the table by any variable in the dataset, etc.), but ideas for tools specific to the dataset are welcome too.
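To give a flavour of what those two generic tools might look like, here is a rough sketch in plain JavaScript. The names pearson and sortRowsBy, the row layout and the sample values are assumptions made for illustration, not code from the display. The correlation is the standard Pearson coefficient computed over the values plotted on the two axes.

```javascript
// Pearson linear correlation coefficient of two equal-length numeric arrays.
// Returns NaN if either array has zero variance.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson needs two arrays of equal length (at least 2)");
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, v) => sum + v, 0) / n;
  const meanY = ys.reduce((sum, v) => sum + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i += 1) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Return a sorted copy of the table rows, ordered by one numeric variable.
function sortRowsBy(rows, key, descending = false) {
  const sign = descending ? -1 : 1;
  return rows.slice().sort((a, b) => sign * (a[key] - b[key]));
}

// Placeholder rows with made-up values, not census figures.
const sampleRows = [
  { region: "Region A", xValue: 3, yValue: 6 },
  { region: "Region B", xValue: 1, yValue: 2 },
  { region: "Region C", xValue: 2, yValue: 4 },
];

const r = pearson(
  sampleRows.map((row) => row.xValue),
  sampleRows.map((row) => row.yValue)
);
const sorted = sortRowsBy(sampleRows, "xValue");
console.log(r, sorted.map((row) => row.region));
```

Sorting a copy rather than the original array means the table can be reordered without disturbing the order in which the points were drawn.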