Stick with me on this one

Introduction

I was a Panini Merlin English Premiership sticker collector back in the mid-nineties. Though I never managed to fill an album, I did enjoy picking up some of the greats of the era: Eric Cantona, Ryan Giggs, Alan Shearer and… um… Terry Hurlock.

Anyway, collecting football stickers is back in fashion. To be honest I’m not sure if it ever went away, but it’s definitely newsworthy this year. Some 300,000 of Panini’s World Cup stickers were stolen in Brazil, while one Colombian teacher is alleged to have used stickers confiscated from students to enhance their own personal collection.

Ampp3d recently published an article on the cost of filling the 2014 album if (and that’s a big if) you don’t swap spares with your mates (or complete strangers). They quote an article by Simon Whitehouse from 2010 suggesting the average cost is about £410. Reading the comments from that article, it’s actually more like £450. Either way it’s a staggering amount. But, as always, the mean doesn’t tell the whole story. Rather than attempt to cover the (quite complex) maths involved I thought I would (Monte Carlo) simulate the process – 10,000 times – and see what can be learnt from the results.

The simulation

The assumptions for the simulations are basically that:

1. There are 640 unique stickers to collect;
2. Each packet has five stickers and costs 50p;
3. All stickers are distributed randomly;
4. The complete pool of stickers is infinite: collecting a sticker once does not reduce the possibility of collecting it again;
5. The same sticker can even occur twice in one pack (it’s not clear whether this assumption is true but it makes little difference to the odds anyway);
6. Stickers are collected until the collection includes at least one of every sticker.
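For readers who would rather see the assumptions as code straight away, here is a minimal Python sketch of a single run and a small batch of runs. It is not the R function from the bonus material, just an illustration of assumptions 1 to 6; the names `packs_to_complete` and `simulate_costs`, the seed and the default of 200 trials are my own choices for the example.

```python
import random

N_STICKERS = 640    # assumption 1: unique stickers in the album
PACK_SIZE = 5       # assumption 2: stickers per packet
PACK_PRICE = 0.50   # assumption 2: price per packet in pounds


def packs_to_complete(rng):
    """Buy packets until every sticker has been seen at least once."""
    seen = set()
    packs = 0
    while len(seen) < N_STICKERS:              # assumption 6: stop when complete
        packs += 1
        for _ in range(PACK_SIZE):
            # Assumptions 3-5: uniform draws with replacement, so a
            # duplicate can turn up even within the same packet.
            seen.add(rng.randrange(N_STICKERS))
    return packs


def simulate_costs(trials=200, seed=2014):
    """Return the cost in pounds of completing the album in each trial."""
    rng = random.Random(seed)
    return [packs_to_complete(rng) * PACK_PRICE for _ in range(trials)]
```

Calling `simulate_costs(trials=10_000)` would match the number of runs used in this post, though in pure Python that takes noticeably longer than the default.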

The R code for the simulation can be found in the bonus material at the bottom if you’re interested.

The bottom line

The chart below shows the distribution of costs for collecting the entire album.

The mean of this distribution is £452 but we can see that there is a pronounced high-expenditure tail; while it’s more likely you’ll pay less than £450 to collect the album this way, there’s a real possibility you could spend a lot more. It’s probably clearer if we plot the probability that the album remains incomplete at a given expenditure:
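As a sanity check on the simulated mean, the textbook coupon-collector result gives the expected number of stickers needed under assumptions 1 to 5. This is a back-of-envelope sketch rather than part of the simulation; the symbols n (number of unique stickers) and T (stickers bought) are introduced here only for the formula.

```latex
\mathbb{E}[T] \;=\; n \sum_{k=1}^{n} \frac{1}{k} \;=\; n H_n ,
\qquad
n = 640 \;\Rightarrow\; \mathbb{E}[T] \approx 640 \times 7.0395 \approx 4505 \text{ stickers}
\;\approx\; 901 \text{ packets} \;\approx\; \pounds 450.50 .
```

That ignores the unused stickers in the final packet, which add a few pence on average, so it sits comfortably close to the simulated figure.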

From this chart we can see there’s nearly a 25% chance you’ll end up spending more than £500, a 5% chance it’ll cost you £600 or more and a 1% chance that £700 isn’t quite enough!
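These percentages can also be cross-checked without simulating anything. The sketch below is my own illustration, not the code behind the chart: it uses inclusion-exclusion over the stickers that might still be missing, under the same assumptions as the simulation, and does the sum in exact integer arithmetic to sidestep floating-point cancellation. The function name `prob_incomplete` is hypothetical.

```python
from math import comb

N_STICKERS = 640        # unique stickers in the album
PACK_SIZE = 5           # stickers per packet
PACK_PRICE_PENCE = 50   # price per packet


def prob_incomplete(spend_pounds):
    """Probability the album is still incomplete after spending this much."""
    packs = int(round(spend_pounds * 100)) // PACK_PRICE_PENCE
    t = packs * PACK_SIZE   # total stickers drawn
    n = N_STICKERS
    # Count the sequences of t draws that cover all n stickers:
    # sum over k of (-1)^k * C(n, k) * (n - k)^t, using exact integers.
    covering = sum((-1) ** k * comb(n, k) * (n - k) ** t for k in range(n + 1))
    # n ** t is the total number of equally likely sequences.
    return 1 - covering / n ** t
```

For example, `prob_incomplete(500)` gives the chance that £500 worth of packets still leaves a gap in the album.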

Familiar faces

I mentioned Terry Hurlock in the introduction because I can still remember the frustration of getting a third and fourth copy of his sticker before I’d managed to collect a single copy of a significant fraction of the others. It’s this kind of occurrence, as also referenced in the Ampp3d article, that gives the impression that – excuse the pun – the cards are stacked against you. The chart below shows the distribution of the number of copies collected of the most-collected sticker across the 10,000 simulations:

From this it’s clear that, in order to collect at least one of each of the 640 stickers, we should expect to end up with a double-figure number of copies of at least one sticker. This fact wasn’t obvious to me before I ran the simulations. Perhaps we should go easy on Panini next time we think they’ve rigged things against us?
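To make the "most-collected sticker" statistic concrete, here is a small Python sketch that tallies copies of every sticker during one run and reports the largest tally once the album is complete. It is an illustration under the same assumptions as the simulation, not the author's R code; `most_collected_count` and `max_copy_distribution` are names made up for this example.

```python
import random
from collections import Counter

N_STICKERS = 640   # unique stickers in the album
PACK_SIZE = 5      # stickers per packet


def most_collected_count(rng):
    """Complete one album and return the copy count of the commonest sticker."""
    counts = Counter()
    while len(counts) < N_STICKERS:   # keys are the stickers seen so far
        counts.update(rng.randrange(N_STICKERS) for _ in range(PACK_SIZE))
    return max(counts.values())


def max_copy_distribution(trials=100, seed=1994):
    """Tally how often each 'most copies' value occurs over several albums."""
    rng = random.Random(seed)
    return Counter(most_collected_count(rng) for _ in range(trials))
```

Sorting the result with `sorted(max_copy_distribution().items())` gives a rough, small-sample version of the histogram described above.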

Finishing the job

It should be obvious that we get diminishing returns on our investment as we approach completion of the sticker album. It might not be obvious just how much it might cost to get that final sticker. This chart makes it clear:

The maths behind this is relatively simple but, in short, we have an exponential decay and a mean expenditure of £64 for the final sticker. However, there’s still a nearly 25% chance that the expenditure on the final sticker will reach three figures and a 1% chance it could cost as much as £300!
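For the curious, the "relatively simple" maths can be sketched as follows. With one sticker missing, each new sticker is the right one with probability 1/n under assumptions 3 and 4, so the number of stickers S bought before it turns up follows a geometric distribution. The symbols S, s and c are introduced here only for illustration, and pack boundaries are ignored, so treat this as an approximation (£c buys 10c stickers).

```latex
\Pr(S > s) = \left(1 - \frac{1}{n}\right)^{s},
\qquad
\mathbb{E}[S] = n = 640 \text{ stickers} = 128 \text{ packets} = \pounds 64,
\qquad
\Pr(\text{cost} > \pounds c) \approx \left(\frac{639}{640}\right)^{10c}.
```

Plugging in c = 100 gives about 0.21 and c = 300 gives about 0.009, so the closed form puts the three-figure chance nearer one in five, a little below the figure read off the chart, and the £300 chance at just under 1%.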

You want more?

After a suggestion on Twitter, I wrote a follow-up post to this one, showing how costs can be reduced if you don’t mind “cheating” a little.

Bonus material: the code

Below is the R function used to generate the data that was then analysed above. Each trial of the simulation adds two vectors of data: the first stores the number of times each card was collected, and the second stores the number of unique cards collected after each pack is “purchased”. With the default arguments the code takes ~10 minutes to run on my laptop and the result can be saved as a binary file around 11 MB in size.