When exploring a dataset on population compiled by the UN I ended up producing a number of small multiples charts. Each of the individual graphics had identical horizontal scales (the year, running from 1950 to 2100). The vertical scales (population) all began at zero but had varying maxima depending on the maximum value for the population (estimated or projected) of the country in question. This wasn’t, initially, an explicit design decision. It came about because the script used to construct the individual charts did so as a serial process and then stitched the files together. The question I had to consider, therefore, was whether this kind of independent scaling is suitable for small multiples.
Below is one example, showing the past estimates and future projections for population for the world’s twelve largest countries (in terms of population) as of 2010 (source: UN).
In this case, with independent scaling of the vertical axes, we can easily identify comparable trends between a number of the countries even when the countries differ greatly in initial population. The similarity between neighbouring India and Bangladesh might not be particularly surprising, but the fact that the Latin American countries in the same column – Brazil and Mexico – also show similar trends is perhaps more interesting.
By contrast, the curves for China and Japan look very different from each other and from all the other countries. In the case of the former I presume this is closely related to the one-child policy, in force since 1979. As far as I’m aware (but I’m not an expert) in Japan, where population growth has already stopped, the drop-off in fertility rates is more a matter of choice, but the flattening (and future decline) of the population curve is exacerbated by an ageing population.
Where this form of scaling of the vertical axis falls short is in trying to compare absolute values between countries. This has to be done verbally – by actually mentally storing values read from the charts – rather than direct visual comparison. For example, comparing the curves of China and India alone does not tell the reader which is projected to have the greater population in 2100. In fact, the scales used are quite different and India’s population could exceed China’s by as much as half a billion by 2100, despite trailing at the present time.
A single scale
The obvious alternative to the above is to use a single scale for all vertical axes as shown below.
From a quick browse of the books sitting on my bookshelf, this would seem to be the usual approach. It certainly makes a comparison of the past and future populations of China and India more visual. It also makes clear something I didn’t touch upon previously: Nigeria is likely to be the third most populous country by the end of the twenty-first century, whereas at present it is only the seventh most populous.
Unfortunately, most of the other detail is lost because most of the curves are flattened so severely.
A scale for each row
Another possibility is to use a common vertical scale for each row of the small multiple chart. In principle this allows the reader to compare absolute values easily for charts in a single row whilst keeping the curve structures for the less populous countries visible.
This version does allow for the direct comparison of China and India as in the previous example, while the curve structures for Indonesia, Brazil, Pakistan, Nigeria, Japan, Mexico and the Philippines are still well-defined.
As is, this version still has perceptual problems. The division into rows of three is done for reasons of practicality; in terms of the data it’s arbitrary. It makes the comparison of curves between rows awkward.
In addition, for each row the scale is “normalised” to the maximum population value in the row. Since the future population of Nigeria is projected to exceed those of Indonesia, Brazil and Pakistan, the third row’s scale exceeds that of the second row, which may be confusing.
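To make the difference between the three linear schemes described so far a little more concrete, here is a minimal sketch in Python of how the upper limit of each panel's vertical axis could be chosen. It is not the script I used; the function name `panel_maxima`, its parameters and the numbers are all hypothetical, and the peak values are arbitrary placeholders rather than the UN figures.

```python
def panel_maxima(peaks, ncols, mode):
    """Return the upper y-axis limit for each panel, in grid order.

    peaks: largest value plotted in each panel, listed row by row
    ncols: number of panels per row
    mode:  "independent", "shared" or "row"
    """
    if mode == "independent":
        # Every panel is scaled to its own maximum.
        return list(peaks)
    if mode == "shared":
        # One limit for the whole grid: the largest peak anywhere.
        return [max(peaks)] * len(peaks)
    if mode == "row":
        # One limit per row: the largest peak within that row.
        limits = []
        for start in range(0, len(peaks), ncols):
            row = peaks[start:start + ncols]
            limits.extend([max(row)] * len(row))
        return limits
    raise ValueError(f"unknown mode: {mode!r}")


# Placeholder peaks (arbitrary units, NOT the UN figures) for a 2 x 3 grid.
peaks = [1500, 1400, 450, 300, 250, 900]

for mode in ("independent", "shared", "row"):
    limits = panel_maxima(peaks, ncols=3, mode=mode)
    # Fraction of the panel height that each curve's peak reaches.
    fill = [round(p / m, 2) for p, m in zip(peaks, limits)]
    print(mode, limits, fill)
```

With these placeholder numbers the second row gets a lower limit than the first, so its curves fill more of their panels than they would on a single shared scale. The price is that heights can no longer be compared between rows.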
A logarithmic scale
Of course, a common way to show data that varies over a large range is to replace a linear scale with a logarithmic one. But is this approach appropriate for a small multiples chart?
The answer to the question above is, perhaps, that it depends what you’re looking for. The level of detail seen in the first figure is not present but the trends are more obvious for the countries with smaller populations than in the second example. The scaling is consistent between rows but, for some, will not be intuitive. It’s possible to compare the absolute values for China and India directly but the comparison will not be all that accurate – the fact that the population of India could exceed that of China by as much as half a billion by 2100 is not at all obvious.
Each form of vertical scaling has its upsides and downsides. For this dataset, with no specific goal in mind other than exploration, I find the independent scaling of the original example to be the most revealing. However, the ‘views’ of the data created by the other scaling methods do make more obvious certain aspects, such as the projected swapping in ranking of China and India and the massive increase in population expected in Nigeria over the remainder of the century.
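A rough way to see why absolute differences are hard to read off a logarithmic axis is to work out where two values land as a fraction of the panel height. The sketch below is illustrative only: the helper names `linear_position` and `log_position`, the axis ranges and the values are my own assumptions, and the numbers are placeholders rather than the UN figures.

```python
import math


def linear_position(value, top):
    """Fractional height of value on a linear axis running from 0 to top."""
    return value / top


def log_position(value, bottom, top):
    """Fractional height of value on a log axis running from bottom to top."""
    return math.log10(value / bottom) / math.log10(top / bottom)


# Placeholder values in millions (NOT the UN figures): two series that end
# 500 apart. The log axis is assumed to span three decades, 10 to 10,000.
a, b = 1000, 1500

lin_gap = linear_position(b, 1500) - linear_position(a, 1500)
log_gap = log_position(b, 10, 10000) - log_position(a, 10, 10000)

print(f"linear gap: {lin_gap:.2f} of the panel height")
print(f"log gap:    {log_gap:.2f} of the panel height")
```

Under these assumptions the same gap of 500 occupies about a third of the panel on the linear axis but only around 6% of it on the logarithmic one. Equal distances on a log axis correspond to equal ratios, not equal differences.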
Have I neglected a better alternative?