## Introduction

I recently wrote a couple of blog posts (part 1, part 2) about the potential cost of collecting Panini’s 2014 World Cup stickers. Subsequently these stickers have been back in the news after former Manchester City striker Mario Balotelli filled the Italy page of the Panini album with stickers of himself. Naturally a question arose in my head (and not just mine) about how much that would cost assuming Mario collected them all in the conventional manner (and without swapping).

## Simulation

It’s not entirely clear just how many stickers of himself Mario collected. The Guardian suggests 14 but there are 19 Italy players in the album. In line with James Offer, I’ll go with the latter. (The end results for the former can be found here.)

Just one small change was needed to the code from my original Panini stickers post to be able to simulate this most serious of problems. (For those interested, this basically amounted to changing the condition in the while loop to be TRUE so long as fewer than 19 copies of Mario’s sticker had been collected.) This change took considerably less time than actually running the 10,000 trials of the simulation (the latter taking about a quarter of an hour). As previously, a histogram neatly summarises these results:

From the simulations, the mean cost of collecting nineteen Mario Balotelli stickers is £1,218 (assuming the stickers were being bought in the UK). As always, the mean doesn’t tell the whole story. A handful of the 10,000 trials came in at under £500 while over 60 cost in excess of £2,000. More notably, the standard deviation of the data is £277.
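For readers who want to try this themselves, here is a minimal Python sketch of the kind of simulation described above. It is not the code from my original post: the function names, the seed and the assumption that every sticker in a pack is an independent, uniformly random draw from the 640 are all mine, chosen for illustration.

```python
import random
import statistics

N_STICKERS = 640      # distinct stickers in the album
TARGET_COPIES = 19    # copies of the one sticker we want
PACK_SIZE = 5         # stickers per pack
PACK_PRICE = 0.50     # pounds per pack (UK price)


def simulate_cost(rng, target=TARGET_COPIES):
    """Buy packs until `target` copies of sticker 0 have turned up; return the cost in pounds."""
    copies = 0
    packs = 0
    # Keep buying while fewer than `target` copies have been collected.
    while copies < target:
        packs += 1
        # Assumption: each sticker in a pack is an independent uniform draw.
        for _ in range(PACK_SIZE):
            if rng.randrange(N_STICKERS) == 0:
                copies += 1
    return packs * PACK_PRICE


def run_trials(n_trials, seed=2014):
    """Run `n_trials` independent simulations and return the list of costs."""
    rng = random.Random(seed)
    return [simulate_cost(rng) for _ in range(n_trials)]


if __name__ == "__main__":
    costs = run_trials(200)
    print(f"mean £{statistics.mean(costs):.0f}, sd £{statistics.stdev(costs):.0f}")
```

A couple of hundred trials run in a second or two; pushing `run_trials` up to 10,000 takes correspondingly longer in pure Python.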

## Calculation

After running the simulation I realised that this sticker collection problem is actually mathematically much simpler than the problem of collecting at least one of every sticker. If we ignore the fact that the stickers come in packs of 5 at 50p and instead pretend they come in packs of 1 at 10p, then the number of stickers we have to buy before the 19th copy turns up should follow a negative binomial distribution. The total cost then follows the same distribution but with everything scaled down by a factor of ten (since you get ten stickers to the pound).

According to the negative binomial distribution, finding the mean number of stickers needed to reach our goal is easy: it’s 19 times the mean number needed for one sticker. The mean number of stickers needed to collect one Mario Balotelli is, in turn, equal to the total number of different stickers, 640. That gives 12,160 stickers at a total cost of £1,216, in excellent agreement (as one would hope) with the £1,218 from the simulations.

The standard deviation is a little less intuitive but straightforward to calculate and gives a result of 2,788 stickers, corresponding to £279. Again this is in very nice agreement with the standard deviation in cost from the simulation (£277).
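As a quick check on the arithmetic, the sketch below uses the standard results for the number of draws needed to get r successes when each draw succeeds with probability p: the mean is r/p and the variance is r(1 − p)/p². The variable and function names are my own, and the 10p-per-sticker price is the simplification described above.

```python
import math

N_STICKERS = 640          # distinct stickers in the album
COPIES = 19               # copies of the one sticker we are after
PRICE_PER_STICKER = 0.10  # pounds, pretending stickers are sold singly


def negbin_mean_sd(r, p):
    """Mean and sd of the number of draws needed for r successes at success probability p."""
    mean = r / p
    sd = math.sqrt(r * (1 - p)) / p
    return mean, sd


mean_n, sd_n = negbin_mean_sd(COPIES, 1 / N_STICKERS)
print(f"mean: {mean_n:.0f} stickers, £{mean_n * PRICE_PER_STICKER:.0f}")
print(f"sd:   {sd_n:.0f} stickers, £{sd_n * PRICE_PER_STICKER:.0f}")
```

With p = 1/640 the variance simplifies to 19 × 640 × 639 = 7,770,240, whose square root is roughly 2,787.5 stickers.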

In fact we can use the probability mass function of the negative binomial distribution to do a broader visual comparison between simulation and mathematical expectation:
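The theoretical side of that comparison can be sketched in a few lines of Python. The probability that the 19th copy arrives on exactly the n-th sticker is C(n − 1, 18) p¹⁹ (1 − p)ⁿ⁻¹⁹ with p = 1/640. The helper names and the £100 bin width below are illustrative assumptions of mine, not what was used to draw the plot.

```python
import math

N_STICKERS = 640          # distinct stickers in the album
COPIES = 19               # copies of the one sticker we are after
STICKERS_PER_POUND = 10   # ten single stickers to the pound


def negbin_pmf(n, r=COPIES, p=1 / N_STICKERS):
    """P(the r-th copy arrives on exactly the n-th sticker bought)."""
    if n < r:
        return 0.0
    return math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


def prob_cost_between(low, high):
    """Probability that the total cost lies in [low, high) pounds."""
    first = math.ceil(low * STICKERS_PER_POUND)
    last = math.ceil(high * STICKERS_PER_POUND)
    return sum(negbin_pmf(n) for n in range(first, last))


if __name__ == "__main__":
    n_trials = 10_000
    # Expected number of trials falling in each £100-wide bin (assumed bin width).
    for low in range(0, 3000, 100):
        expected = n_trials * prob_cost_between(low, low + 100)
        print(f"£{low:>4} to £{low + 100:>4}: {expected:7.1f}")
```

Multiplying each bin probability by the number of trials gives expected counts that can be laid over the simulated histogram.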

## How many trials is enough?

Clearly the agreement between simulation and mathematical expectation is very good. You may think 10,000 trials is excessive, and you’d probably be right, so it’s interesting to see how good the simulation is with fewer trials. We can do this with relevant numbers like mean and standard deviation and/or we can do this visually. The animated GIF below combines the two, showing the results of the first ten trials, the first twenty trials and so on. Clearly the mean and standard deviation are liable to change more quickly when there are only a few trials, so the step-size between images is increased systematically, shifting from 10 to 100 after 100 trials and from 100 to 1,000 after 1,000 trials. Click the image to (re)start the animation:

With only a couple of hundred trials the simulations reproduce the expected mean and standard deviation well. Qualitatively, to get a close match between the two distributions requires a further order of magnitude increase in the number of trials.
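The numerical half of that comparison is easy to reproduce. The sketch below tracks the running mean and standard deviation at the same checkpoints as the animation (every 10 trials up to 100, every 100 up to 1,000, every 1,000 after that). To keep it quick, it does not open packs one at a time: instead it draws the wait for each copy from a geometric distribution by inverse-transform sampling, a shortcut of mine that relies on the single-sticker-at-10p simplification. All names here are hypothetical.

```python
import math
import random
import statistics

N_STICKERS = 640          # distinct stickers in the album
COPIES = 19               # copies of the one sticker we are after
PRICE_PER_STICKER = 0.10  # pounds, pretending stickers are sold singly


def sample_cost(rng):
    """One trial: cost of buying single stickers until COPIES copies turn up."""
    p = 1 / N_STICKERS
    stickers = 0
    for _ in range(COPIES):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Geometric wait for the next copy, via inverse-transform sampling.
        stickers += math.floor(math.log(u) / math.log1p(-p)) + 1
    return stickers * PRICE_PER_STICKER


def checkpoints(n_total):
    """Yield 10, 20, ..., 100, 200, ..., 1000, 2000, ... up to n_total."""
    n, step = 10, 10
    while n <= n_total:
        yield n
        if n >= 1000:
            step = 1000
        elif n >= 100:
            step = 100
        n += step


def running_summary(costs):
    """Return (n, mean, sd) for the first n trials at each checkpoint."""
    return [
        (n, statistics.mean(costs[:n]), statistics.stdev(costs[:n]))
        for n in checkpoints(len(costs))
    ]


if __name__ == "__main__":
    rng = random.Random(2014)
    costs = [sample_cost(rng) for _ in range(10_000)]
    for n, mean, sd in running_summary(costs):
        print(f"{n:>6} trials: mean £{mean:7.0f}, sd £{sd:5.0f}")
```

Comparing each row against the expected £1,216 and £279 shows how quickly the summary statistics settle down as trials accumulate.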