I first learnt how to make and read pie charts when I was in junior school and then used them once or twice in geography projects in secondary school. After leaving school I didn’t really have any need for them. But it recently came to my attention that they are very popular with some and loathed by others.
As Stephen Few explains at length in his article Save the Pies for Dessert, scientific studies show there are very few instances in which information could not be conveyed better using either a table or bar chart. A primary reason for preferring a bar chart to a pie chart is that we are better at judging lengths (as we do when decoding bar charts) than we are at judging angles, areas or arc lengths (options for decoding a pie chart). As Edward Tufte explains in The Visual Display of Quantitative Information, things get worse for the pie chart when more than one is required; “for then the viewer is asked to compare quantities located in spatial disarray both within and between pies”.
Take the following example – showing carbon dioxide emissions from different economic regions of the world in two years – from the 2012 edition of “Key World Energy Statistics”, a document published each year by the International Energy Agency:
In the introductory section of this edition of KWES it is explained that the document contains “timely, clearly-presented data on the supply, transformation and consumption of all major energy sources”. However, the use of pairs of pie charts in the example above (and on many other pages besides) seems to contradict the principle of “clearly-presented data”. The fact that the pie charts are rendered as three-dimensional only makes matters worse, since it hinders judgments of the area, arc length and angle covered by the pie sectors. The only decoding that will give reliable answers is the verbal decoding of the text labels. This means the charts display no more reliable information than an ordinary table and, because the labels are not arranged in a line, they are likely to be slower to scan for the desired numbers.
The examples that follow are alternatives that I designed. Given the original use of pie charts, I assumed that the primary goal was to communicate proportions and part-to-whole relationships rather than absolute values.
Alternatives: bar charts
The obvious alternative, as hinted at, is a simple pair of bar charts:
As described above, the bar chart format allows for accurate decoding of values by encoding the data with bar length rather than area/angle/arc length. However, it also offers more:
- Comparison between the proportions for the two years for a given region is straightforward as the bars lie in a straight line rather than as (generally) mis-aligned sectors of two separated circles. There is no “spatial disarray”;
- The data can be ordered by ascending or descending values for one or other of the years, allowing for a clear visual hierarchy which is not present in a pie chart unless there are only a few, disparate, values;
- The separation between data labels and data elements that encode them does not depend on the magnitude of the data element. In the 1973 pie chart of the original figure the labels for the small sectors of Africa, Non-OECD Americas, Asia and China are cramped together;
- As a bonus, it is easy to include a second set of axes for illustrating absolute values.
Some might feel that this bar chart lacks one key feature of a pie chart: a direct visual indication of what part of the whole a value represents. One could extend the percentage axes to 100% which may satisfy some dissenters. A more explicit solution is to add frames around the bars (this idea was inspired by the “framed rectangles” suggested as an alternative to the choropleth map by Cleveland and McGill):
The framed rectangles in the figure above facilitate a direct comparison between each value and the whole. By making the bars more saturated in hue (than in the simple bar chart example) and the frames light and subtle, the sense that these are still bar charts and not a collection of individual boxes can be maintained. The small (I think) downside of this design is that more of the space of the figure is white space and less is bar. This may be more of a problem in situations in which the whole is made up of many small categories, but it is in these situations that comprehension of a pie chart is also likely to get particularly difficult.
The addition of frames seems like a really simple enhancement but I’m unaware of any software that implements this directly (the above was conjured up using low-level graphics programming in R). Perhaps I’m missing an obvious problem with these?
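For anyone who wants to experiment with the idea, here is a minimal sketch of framed bars written in Python and emitting plain SVG. It is not the R code used for the figure above. The function name `framed_bars_svg`, its parameters, the colours and the three example regions are all hypothetical, and the shares are made-up values rather than the IEA figures.

```python
from xml.sax.saxutils import escape


def framed_bars_svg(shares, frame_width=300, bar_height=14, gap=8, label_width=120):
    """Return an SVG string with one framed horizontal bar per category.

    `shares` maps a label to its fraction of the whole (0 to 1). Every frame
    spans `frame_width` (the whole); each bar spans share * frame_width.
    Rows are ordered by descending share.
    """
    rows = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    height = gap + len(rows) * (bar_height + gap)
    width = label_width + frame_width + 10
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, share) in enumerate(rows):
        if not 0 <= share <= 1:
            raise ValueError(f"share for {label!r} must be between 0 and 1")
        y = gap + i * (bar_height + gap)
        # Right-aligned label, always the same distance from the bar origin.
        parts.append(
            f'<text x="{label_width - 6}" y="{y + bar_height - 3}" '
            f'text-anchor="end" font-size="11">{escape(label)}</text>'
        )
        # Saturated bar: its length encodes the share.
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{share * frame_width:.1f}" '
            f'height="{bar_height}" fill="#1f4e79"/>'
        )
        # Light frame drawn on top: its length encodes the whole (100%).
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{frame_width}" '
            f'height="{bar_height}" fill="none" stroke="#bbbbbb"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical shares, for illustration only (not the IEA data).
example_shares = {"Region A": 0.5, "Region B": 0.3, "Region C": 0.2}
example_svg = framed_bars_svg(example_shares)
```

Saving `example_svg` to a file with an `.svg` extension and opening it in a browser should show three dark bars, each inside a pale frame of identical length.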
In some instances (though not above), pairs of related pie charts are scaled so that the area of each pie chart is proportional to the sum of its parts. Unfortunately, since we’re fairly poor at judging areas, this is largely a qualitative rather than a quantitative enhancement. With bars, however, we can make accurate length judgments. Rescaling the framed rectangle figure so that equal horizontal distance represents equal absolute carbon dioxide emissions between the sets of bars gives us the following:
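The rescaling amounts to making each frame's length proportional to that year's total and each bar's length equal to its share of that frame. A rough sketch of the arithmetic in Python follows; the function name `scaled_frames`, the year labels and all the numbers are hypothetical and chosen only to keep the sums easy to check.

```python
def scaled_frames(totals, shares_by_year, max_width=300.0):
    """Return {year: (frame_width, {label: bar_width})}.

    The year with the largest total gets a frame of `max_width`; every other
    frame is shortened in proportion to its total, so a given horizontal
    distance stands for the same absolute amount in every panel.
    """
    largest = max(totals.values())
    layout = {}
    for year, total in totals.items():
        frame = max_width * total / largest
        bars = {label: share * frame for label, share in shares_by_year[year].items()}
        layout[year] = (frame, bars)
    return layout


# Hypothetical totals (arbitrary units) and shares, for illustration only.
totals = {"earlier year": 100.0, "later year": 200.0}
shares_by_year = {
    "earlier year": {"Region A": 0.60, "Region B": 0.40},
    "later year": {"Region A": 0.45, "Region B": 0.55},
}
layout = scaled_frames(totals, shares_by_year)
```

With these made-up numbers, Region A's share falls from 60% to 45%, yet its bar grows from about 90 to about 135 units because the later frame is twice as long. A share can shrink while the absolute value rises, and this layout makes both visible.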
This redesign maintains the easy perceptual comparison between different parts of each whole but emphasises the fact that total emissions have gone up – comparing the lengths of the frames themselves gives a direct indication of the roughly twofold increase in total annual emissions. It is also easier to see, for example, that total emissions increased for OECD countries even though the percentage for which they were responsible went down.
The most direct way for comparing absolute values between the two years is to overlay them:
With this design, however, it seems that we have moved away from presenting the data in terms of part-to-whole relationships and moved on to presenting the data for comparison between years. The use of two different percentage scales (when really we’re comparing absolute values) on the axes might also be confusing.
A completely different alternative to a pair of pie charts is the slopegraph, popularised (invented?) by Edward Tufte. This method plots a list of nouns (the different economic regions in this case) in each of two vertical columns – one for each year – based on the value (percentage) associated with that noun for that year, with lines linking the two instances of the noun. It’s probably easier to understand this by example:
In principle the slopegraph should work nicely for this kind of data. Parts can be compared to other parts for the same year by comparing height up the column, and the relative change of a single part can be seen through the magnitude of the slope connecting its two instances. The problem with this dataset is that many of the values lie in close proximity at the bottom of the chart (especially in the 1973 column). This creates a rather squashed look, and labels have to be shifted so that they remain readable. As a result, it is the heights of the line ends and not of the labels that accurately encode the data, yet it is the labels that are more prominent. The scales could be extended up to 100% to give a better sense of the part-to-whole relationships for the regions, but this would involve squashing the data at the bottom even more (or making the figure impractically tall).
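The label shifting can be automated in a simple way. The sketch below, in Python, is one possible greedy approach and not necessarily how the figure above was made: it walks up the column from the smallest value and pushes a label upwards only when it would sit closer than `min_gap` to the label below it. The line ends stay at the true values; only the label positions move. The function name `spread_labels` and the example numbers are hypothetical.

```python
def spread_labels(values, min_gap):
    """Return one label position per value, in the same order as `values`.

    Each label sits at its true value unless that would place it within
    `min_gap` of the label below, in which case it is pushed up just enough.
    Labels are only ever moved upwards (a deliberately simple greedy rule).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0.0] * len(values)
    previous = None
    for i in order:
        if previous is None:
            position = values[i]
        else:
            position = max(values[i], previous + min_gap)
        positions[i] = position
        previous = position
    return positions


# Hypothetical percentages crowded near the bottom of a column.
example_positions = spread_labels([1.0, 1.5, 2.0, 10.0], min_gap=1.0)
```

With these inputs the three crowded labels are spaced one unit apart while the isolated label at 10.0 stays where it is. The greedy rule is easy to follow but can push a long run of labels well above their line ends, which is the mismatch between label and data position described above.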
So which chart is the most effective? I’m quite fond of my second framed rectangles design but I haven’t done any kind of scientific test of its effectiveness. I like that it directly shows how each value compares to the sum of all the values for that year and that it allows for easy comparison of absolute values between years. But is it easy to understand without a detailed explanation? Is there a perceptual problem with the design that I’ve missed? You tell me.