The development and promotion of potential screening tests for Alzheimer’s disease has been all over the news in 2014. David Colquhoun, who knows a lot more about this than I do, has been very critical of much of the coverage.
Screening tests are typically described in terms of their “sensitivity” and their “specificity”. Sensitivity is the proportion of patients who have the condition being tested for that the test correctly identifies as having it. Specificity, meanwhile, is the proportion of patients who don’t have the condition that the test correctly identifies as not having it. Values of 80% or higher, say, may sound impressive, but they are largely meaningless without reference to the prevalence of the condition – the proportion of the population that have it. Here “population” may refer literally to the entire population, if anyone might be tested, or to some relevant subset (e.g. women, or men over 65).
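To make the two definitions concrete, here is a small sketch in Python. The counts are made up purely for illustration – they don’t describe any real test:

```python
# Illustrative confusion-matrix counts for a hypothetical screening test.
true_positives = 80    # have the condition, test says positive
false_negatives = 20   # have the condition, test says negative
true_negatives = 900   # don't have the condition, test says negative
false_positives = 100  # don't have the condition, test says positive

# Sensitivity: fraction of those WITH the condition correctly flagged.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: fraction of those WITHOUT the condition correctly cleared.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.0%}")  # 80%
print(f"specificity = {specificity:.0%}")  # 90%
```

Note that neither number depends on how common the condition is – which is exactly why they can mislead on their own.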
The problem is that when a condition being tested for is relatively rare – the prevalence is low – the number of false positives can greatly outweigh the true positives unless the specificity is very high. For serious conditions like Alzheimer’s or cancer this can lead to a large amount of unnecessary worry and possibly unwarranted operations.
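The arithmetic behind this is straightforward to sketch. Using hypothetical figures – a test with 80% sensitivity and 80% specificity applied to a population where only 1% have the condition – the false positives dwarf the true positives:

```python
# How low prevalence lets false positives swamp true positives.
# All figures are hypothetical, chosen only for illustration.
population = 10_000
prevalence = 0.01     # 1% of people have the condition
sensitivity = 0.80
specificity = 0.80

have = population * prevalence   # 100 people with the condition
dont = population - have         # 9,900 people without it

true_positives = sensitivity * have          # 80 correct alarms
false_positives = (1 - specificity) * dont   # 1,980 false alarms

# Positive predictive value: the chance that a positive result
# actually means you have the condition.
ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.1%}")  # roughly 3.9%
```

So despite the seemingly respectable 80% figures, a positive result here would be wrong more than 96% of the time.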
To many (most, I imagine) this is probably not obvious. But it is actually a classic problem in the teaching of Bayesian statistics and a classic illustration of the base rate fallacy. And it isn’t (or at least wasn’t) just a problem for those unfamiliar with the field – medical professionals have struggled too. Rather than try to convince you of the issue – aside from the Colquhoun article there are nice explainers here, here and here – I thought I’d build a little interactive tool so that people can find the answers for themselves next time a ‘ground-breaking’ screening test gets hyped up in the media or by a university press release. It can also be used to experiment with different scenarios, or to get some idea of how changing each variable one at a time changes the overall results.
The tool (below) is just a set of sliders and a bar chart drawn based on the slider settings. The sliders come in pairs, one pair for each of the three variables – prevalence, sensitivity and specificity. The upper slider of each pair is a coarse control, so that you can quickly move the prevalence from 0 to 99%, or the sensitivity or specificity from 50% to 99%. The lower slider in each pair allows finer control. These are mostly useful at the extremes. For example, one can specify a prevalence down to 0.01%, and there is a huge difference between a specificity of 99% and 99.99% (though it’s questionable whether even the latter test would be useful, were it ever feasible). Clicking on the image toggles between two complementary forms of horizontal-axis labelling.
I find the sliders work better with a mouse than a touchscreen; there’s a certain knack to getting them to work on the latter. Let me know if you have any other issues or comments.
Within the last couple of days David Colquhoun has added a paper to the arXiv preprint repository which covers much of the material in the blog post mentioned at the top of the article and a follow-up post.
In addition to the above tool, I have also created an interactive tree diagram covering the same issue. This can be found here.