A short(ish) introduction to the binomial distribution

Averages make for nice headline figures but are generally of limited use. Probability distributions can tell us far more. For example, if we think a coin is fair and we toss it 30 times and get only 12 tails, should we start to question our assumption that the coin is unbiased? What are the chances of that happening? Knowing only that we should expect to see about 15 tails doesn’t tell us very much. Knowing the complete distribution of probabilities (if the coin were known to be fair), on the other hand, will help us to answer that question. In this case – and for many similar problems – we can use the binomial distribution.

The binomial distribution is (I think) one of the more intuitive distributions to get one’s head around. It’s also one of the most important for discrete data. The two parameters needed to describe an instance of the distribution are simple: the number of independent trials (n) and the probability of “success” (p) in each trial (which must be constant from trial to trial). A trial can be tossing a coin and recording whether it lands on tails, throwing a die and recording whether it is a 1, or any number of other “experiments” with a binary outcome (i.e. “success” or “failure”). (The “success” terminology can be a little confusing since on occasion a success can represent something undesirable happening.)

We can get to the final formula for a binomially distributed discrete random variable (such as the number of tosses out of 30 that come up tails) using some basics of probability. Since the probability of success in any given trial is p and all trials are independent, the probability that the first r trials will lead to r successes is

$p^{r}.$

If we continue to perform trials thereafter until we have completed n trials in total, we have to do another n − r trials. The probability that any given one of these subsequent trials is a failure is 1 − p, so the probability they are all failures is

$(1-p)^{n-r}.$

Again, due to independence, the probability of both these things occurring – r successes followed by n − r failures – is the product of the two:

$p^{r}(1-p)^{n-r}.$

But this is just one way of getting r successes from n trials. n − r failures followed by r successes is just as (un)likely, as is any other permutation of r successes and n − r failures. So to find the total probability of getting r successes from n trials we need to count the number of possible ways of arranging r successes and n − r failures among the n trials and multiply by the above probability. The number of ways is given by the binomial coefficient

$\frac{n!}{r!(n-r)!},$

where “!” represents the factorial of the preceding number. If you want to know exactly where this came from then this article explains in detail. The complete formula, the probability that the number of successes X is exactly r, is then

$\textup{Pr}(X=r) = \frac{n!}{r!(n-r)!}p^{r}(1-p)^{n-r}$

for all integer values of r between 0 and n (and 0 otherwise).
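
As a sanity check on the derivation, here is a minimal Python sketch that evaluates the formula with `math.comb` and compares it against explicitly adding up p^r(1 − p)^(n − r) over every ordering of successes and failures. The function names `binomial_pmf` and `brute_force_pmf` are my own, not from any library, and the enumeration is only practical for small n since there are 2^n sequences.

```python
from itertools import product
from math import comb, isclose

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Pr(X = r) for X ~ Binomial(n, p), i.e. n!/(r!(n-r)!) p^r (1-p)^(n-r)."""
    if not 0 <= r <= n:
        return 0.0  # outside 0..n the probability is zero
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def brute_force_pmf(r: int, n: int, p: float) -> float:
    """Add p^r (1-p)^(n-r) once for every ordering of r successes and n-r failures."""
    total = 0.0
    for seq in product((1, 0), repeat=n):  # all 2^n success/failure sequences
        if sum(seq) == r:
            total += p**r * (1 - p) ** (n - r)
    return total

# The closed-form formula agrees with explicit enumeration for a small case.
for r in range(11):
    assert isclose(binomial_pmf(r, 10, 0.3), brute_force_pmf(r, 10, 0.3))

print(round(binomial_pmf(12, 30, 0.5), 4))  # 0.0806
```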

The mean, μ, of the binomial distribution is simply

$\mu =np$

while the standard deviation, σ, is given by

$\sigma =\sqrt{np(1-p)}.$

The former may be obvious; the latter probably less so. Detailed derivations can be found here.
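
The shortcut formulas can also be checked numerically. This sketch (the helper name `binomial_moments` is hypothetical) computes the mean and standard deviation directly from the probabilities, as Σ r Pr(X = r) and the square root of Σ (r − μ)² Pr(X = r), and prints them next to np and √(np(1 − p)).

```python
from math import comb, sqrt

def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation computed from the pmf, not from np and sqrt(np(1-p))."""
    probs = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    return mean, sqrt(variance)

n, p = 30, 0.5
mean, sd = binomial_moments(n, p)
print(mean, n * p)                # the two values should agree (about 15)
print(sd, sqrt(n * p * (1 - p)))  # the two values should agree (about 2.74)
```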

These calculations can be tedious, and if you’re not careful you can end up with intermediate steps that your calculator cannot cope with, because the factorials quickly grow into very large numbers. The interactive chart below can hopefully save you some effort (as long as you limit yourself to a thousand trials or fewer and p isn’t much less than 1 in 100). Drag the sliders to adjust n and p or click/tap on the associated arrows to change these values incrementally. You can also change the value of interest (for the table below the chart) using the slider (or associated buttons) or by clicking inside the rectangular chart region. The blue bars display the binomial distribution for the given parameters while the yellow vertical line marks the value of interest. The red curve will be discussed shortly. The middle column of the table provides details about the binomial distribution and the value of interest.
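
If you’d rather not rely on the interactive tool, the binomial column of its table can be reproduced in a few lines of Python. This is only a sketch: the function `binomial_summary` and its dictionary keys are my own naming, and multiplying `math.comb(n, k)` by the raw powers works for n up to about 1,000 but can overflow a float for much larger n. The usage example is the die from earlier: 20 throws, where a “success” is rolling a 1, so p = 1/6.

```python
from math import comb, sqrt

def binomial_summary(n: int, p: float, r: int) -> dict[str, float]:
    """Binomial measures for a value of interest r, mirroring the table's rows."""
    if not 0 <= r <= n:
        raise ValueError("r must lie between 0 and n")
    # Full pmf; this direct product is fine for n up to about 1,000.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    at = pmf[r]
    below = sum(pmf[:r])      # Pr(X < r)
    above = sum(pmf[r + 1:])  # Pr(X > r)
    return {
        "mean": n * p,
        "sd": sqrt(n * p * (1 - p)),
        "Pr(X = r)": at,
        "Pr(X < r)": below,
        "Pr(X <= r)": below + at,
        "Pr(X > r)": above,
        "Pr(X >= r)": above + at,
    }

# Die example: 20 throws, "success" means rolling a 1.
for name, value in binomial_summary(20, 1 / 6, 3).items():
    print(f"{name:>10}: {value:.4f}")
```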

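[Interactive chart: binomial distribution (blue bars) and its Poisson approximation (red curve) for adjustable n and p, with a value of interest r. Accompanying table columns: Measure, Binomial, Poisson; rows: Mean (μ), Std. dev. (σ), Pr(X = r), Pr(X < r), Pr(X ≤ r), Pr(X > r), Pr(X ≥ r).]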

Often the trickier part of using the binomial distribution (and distributions in general) is making sure to ask the right question of it. Going back to the coin question – should we worry the coin is biased if we only observe 12 tails from 30 throws? – we might naively look for the probability of getting exactly 12 successes from 30 trials with a fair coin. Using the chart and table we find this has a probability of about 8%. That’s small but clearly non-negligible. However, we’d be even more suspicious if there were fewer tails, so a better question is how likely it is to get 12 or fewer tails from 30 tosses: about 18%. And wouldn’t we be just as suspicious if we’d got 18 tails and only 12 heads? By symmetry, 18 or more tails also has an 18% chance, so there’s a 36% chance that the number of heads differs from the number of tails by at least 6. That’s more than 1 in 3. But we wouldn’t know that if all we looked at was the average.
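
The coin figures above can be reproduced directly. This sketch treats tails as a success with p = 0.5 and sums the relevant probabilities; the rounded output should match the 8%, 18% and 36% quoted.

```python
from math import comb

n, p = 30, 0.5  # a fair coin tossed 30 times; "success" = tails

def pmf(r: int) -> float:
    """Pr(X = r) for the coin example."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

exactly_12 = pmf(12)                                 # Pr(X = 12)
at_most_12 = sum(pmf(r) for r in range(0, 13))       # Pr(X <= 12)
at_least_18 = sum(pmf(r) for r in range(18, n + 1))  # Pr(X >= 18), equal by symmetry
two_sided = at_most_12 + at_least_18                 # |tails - heads| >= 6

print(f"{exactly_12:.3f} {at_most_12:.3f} {two_sided:.3f}")  # 0.081 0.181 0.362
```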

As already noted, if you’re using a calculator to get values of the binomial distribution you may have a problem due to the inclusion of large factorials. You may be able to work around this using some clever maths. Alternatively, you may be able to use an approximation. One such approximation is the Poisson distribution.
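
One such piece of “clever maths”, chosen here purely as an illustration, is to work with logarithms. Python’s `math.lgamma(x)` returns ln Γ(x), and since ln(k!) = ln Γ(k + 1), the log of the binomial coefficient can be built from three `lgamma` calls without ever forming a huge factorial. The function name `binomial_pmf_log` is hypothetical, and the sketch assumes 0 < p < 1.

```python
from math import exp, lgamma, log

def binomial_pmf_log(r: int, n: int, p: float) -> float:
    """Pr(X = r) computed via logarithms, so no large factorial is ever formed.

    Uses ln(k!) = lgamma(k + 1); assumes 0 < p < 1.
    """
    if not 0 <= r <= n:
        return 0.0
    log_coeff = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    log_prob = log_coeff + r * log(p) + (n - r) * log(1 - p)
    return exp(log_prob)

# Computing 5000! directly as a float would overflow; the log form copes.
print(binomial_pmf_log(2500, 5000, 0.5))
```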

The Poisson distribution concerns the probability of a certain number of independent events happening in a fixed interval when the average rate at which they happen is a known constant. With an average rate per interval of λ, the probability distribution is given by:

$\textup{Pr}(X=r) = \frac{\lambda^{r}e^{-\lambda }}{r!}$

for all integer values of r greater than or equal to 0 (and 0 otherwise). The mean and standard deviation of this distribution are λ and √λ, respectively.
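
A minimal sketch of the Poisson formula (the helper `poisson_pmf` is hypothetical), with a numerical check that the mean and standard deviation come out as λ and √λ. The sums are truncated at r = 60, which for λ = 4 leaves out only a negligible tail. Note that `lam**r / factorial(r)` raises an overflow error once r! exceeds the float range (r above about 170), so this direct form suits modest r only.

```python
from math import exp, factorial, sqrt

def poisson_pmf(r: int, lam: float) -> float:
    """Pr(X = r) = lam^r e^(-lam) / r! for integer r >= 0, and 0 otherwise."""
    if r < 0:
        return 0.0
    return lam**r * exp(-lam) / factorial(r)

lam = 4.0
probs = [poisson_pmf(r, lam) for r in range(60)]  # tail beyond r = 59 is negligible for lam = 4
mean = sum(r * pr for r, pr in enumerate(probs))
sd = sqrt(sum((r - mean) ** 2 * pr for r, pr in enumerate(probs)))
print(round(mean, 6), round(sd, 6))  # 4.0 2.0
```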

The Poisson distribution deserves a whole article to itself but I’ll try to keep it short here. (You can find out more here and here, for example.) If n is large and p is small then substituting np for λ in the equation above will create a distribution that is a “good” approximation to the binomial distribution. How good an approximation depends on how large n is and how small p is. The Poisson formula contains only one factorial, r!, and because p is small the values of r that matter cluster around np, well below n, so r! stays much smaller than the n! in the binomial coefficient. This may ease use of a calculator.

I’ve added the corresponding Poisson approximation to the chart above (red line) and a third column to the table showing parameters calculated using the approximation. For the case of a coin toss, p is large and therefore the Poisson approximation is poor. On the other hand, if n = 400 and p = 0.04, the approximation is very good. One should still be aware that the Poisson approximation will give a non-zero probability that the number of successes exceeds the number of trials!
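
To put a number on “good” and “poor”, this sketch compares the two distributions for the two cases mentioned, using the largest absolute difference between the Pr(X = r) values over r = 0 to n as the yardstick. That metric is my own choice for illustration, not something the chart displays. The Poisson probabilities are computed via `math.lgamma` to avoid overflow, and the last lines measure how much Poisson probability lies above r = n, which the binomial distribution never allows.

```python
from math import comb, exp, lgamma, log

def binomial_pmf(r: int, n: int, p: float) -> float:
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(r: int, lam: float) -> float:
    # Computed in log space to avoid overflow in lam**r and r!.
    return exp(r * log(lam) - lam - lgamma(r + 1))

def max_abs_difference(n: int, p: float) -> float:
    lam = n * p  # the Poisson approximation uses lambda = np
    return max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam)) for r in range(n + 1))

for n, p in [(30, 0.5), (400, 0.04)]:
    print(n, p, max_abs_difference(n, p))

# Poisson probability of more successes than trials for the coin case (impossible for the binomial).
lam = 30 * 0.5
beyond_n = 1 - sum(poisson_pmf(r, lam) for r in range(31))
print(beyond_n)
```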

With the interactive tool above, the binomial distribution can be calculated for all values of n up to 1,000, so there is no particular reason to use the Poisson approximation there. However, the chart does show graphically when the Poisson approximation may be appropriate, which is worth knowing next time you get stuck with only a calculator in your hand.