Background
Earlier in the year I wrote several posts (here, here and here) about the cost of collecting football stickers. These posts used the results of Monte Carlo simulations run in R. However, I thought it would be interesting to use a fairly recent addition to web browsers to let readers run similar simulations for themselves. This post is more about showcasing a cool feature of modern web browsers that can assist with statistical simulation than it is about football stickers (though hopefully you will find that topic interesting anyway).
The simulator
As noted above, the basic premise for the simulations comes from collecting football stickers (hence the labels on the controls). More broadly, however, it is a simulator of the coupon collector’s problem, with limits set by the ranges of the controls.
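Because this is the coupon collector’s problem, there is a known expectation to check the simulations against. The sketch below computes the expected number of single stickers drawn to collect m distinct stickers out of n. It assumes every sticker is equally likely on every draw. `expectedDraws` is a hypothetical name and is not part of the simulator. When k stickers are already held, each draw is new with probability (n − k)/n, so the wait for the next new sticker averages n/(n − k) draws. With m = n this gives the familiar n·H_n.

```javascript
// Expected number of single-sticker draws needed to collect m distinct
// stickers out of n, assuming each draw is uniform over all n stickers.
// With k distinct stickers held, a draw is new with probability (n - k) / n,
// so on average n / (n - k) draws are needed to get the next new one.
function expectedDraws(n, m) {
  if (!Number.isInteger(n) || !Number.isInteger(m) || m < 0 || m > n) {
    throw new RangeError("need integers with 0 <= m <= n");
  }
  let total = 0;
  for (let k = 0; k < m; k++) {
    total += n / (n - k);
  }
  return total;
}

// Full collection (m = n) gives the classic n * H_n result.
console.log(expectedDraws(3, 3)); // 5.5  (3/3 + 3/2 + 3/1)
```

This is only an approximate check for packets of several stickers. In the worker code further down, the stickers in a packet are drawn independently, so the expected number of packets is slightly more than this figure divided by the packet size.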
The four sliders above the Simulate button set four parameters before a simulation is run. The two sliders below the graphic adjust a further two parameters without the simulation having to be re-run.
The pre-simulation sliders set four things:

- the total number of stickers to collect;
- the number of stickers in a single packet;
- the number of stickers that can be purchased directly to complete a collection;
- the number of trials to run in the simulation (i.e. how many times a collection is simulated from scratch).

The penultimate slider mimics the option Panini offer to buy the final few stickers in a collection directly. As noted previously, this can save the collector a lot of money.
The post-simulation sliders control costing. The “Price of pack” slider should be self-explanatory. The “Price of individual stickers” slider applies to the directly purchased stickers described in the previous paragraph, so it has no effect when the number of stickers that can be bought is 0. (No attempt is made to find the most cost-effective strategy when specific stickers can be purchased. It is simply assumed that those stickers are bought once the number still to collect is less than or equal to the number that can be bought directly.)
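Each trial records only two numbers: the packets bought and the stickers still missing. Because of this, re-pricing does not need a new simulation. The minimal sketch below assumes the `{p, u}` result objects produced by the worker code later in this post and the buy-the-rest rule described above. `trialCost` and `allCosts` are hypothetical names, not the page’s own functions.

```javascript
// Cost of one simulated trial, given a worker result object
// {p: packets bought, u: stickers still uncollected when buying stopped}.
// Assumes every uncollected sticker is then bought at the single-sticker price.
function trialCost(result, packPrice, stickerPrice) {
  return result.p * packPrice + result.u * stickerPrice;
}

// Re-pricing just maps over the stored results; no new simulation is needed.
function allCosts(results, packPrice, stickerPrice) {
  return results.map(r => trialCost(r, packPrice, stickerPrice));
}

const results = [{p: 10, u: 2}, {p: 12, u: 0}];
console.log(allCosts(results, 0.5, 0.25)); // [ 5.5, 6 ]
```

When no stickers can be bought directly, the worker keeps buying packets until everything is collected. Every trial then ends with `u` equal to 0, which is why the individual sticker price has no effect in that case.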
So adjust the sliders, click “Simulate” and wait. The histogram tallies the total cost of each trial, while the table gives some basic summary statistics for the (unbinned) distribution as a whole.
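The chart itself is drawn with d3.js, but the tallying step is simple enough to sketch by hand. The sketch assumes a fixed bin width, because the page’s actual binning rule isn’t described here. `binCosts` is a hypothetical helper. Each cost is counted in the bin [x0, x0 + width) that contains it.

```javascript
// Tally costs into fixed-width bins: bin i covers [start + i*width, start + (i+1)*width).
function binCosts(costs, width) {
  if (!(width > 0)) throw new RangeError("width must be positive");
  if (costs.length === 0) return [];
  // Align the first bin to a multiple of width at or below the smallest cost.
  const min = costs.reduce((a, b) => Math.min(a, b), Infinity);
  const start = Math.floor(min / width) * width;
  const counts = [];
  for (const c of costs) {
    // Math.max guards against floating-point rounding just below start.
    const i = Math.max(0, Math.floor((c - start) / width));
    while (counts.length <= i) counts.push(0); // create empty bins as needed
    counts[i]++;
  }
  return counts.map((n, i) => ({x0: start + i * width, count: n}));
}

console.log(binCosts([3, 3.5, 7], 2));
```

The minimum is found with `reduce` rather than `Math.min(...costs)`. Spreading a very large array into function arguments can exceed the engine’s argument limit.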
[Summary table (populated by the simulation): median, mean, standard deviation, skewness and excess kurtosis of the total cost]
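The measures in the table can be computed directly from the unbinned trial costs. The sketch below is one way to do it. It assumes population (divide-by-n) moments, since the post doesn’t say which estimators the page uses. `summarise` is a hypothetical name.

```javascript
// Median, mean, standard deviation, skewness and excess kurtosis of an array
// of numbers, using population (divide-by-n) central moments.
function summarise(values) {
  const n = values.length;
  if (n === 0) throw new RangeError("no values");
  const sorted = [...values].sort((a, b) => a - b); // copy; leave input untouched
  const mid = Math.floor(n / 2);
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  let m2 = 0, m3 = 0, m4 = 0;
  for (const v of values) {
    const d = v - mean;
    m2 += d * d;
    m3 += d * d * d;
    m4 += d * d * d * d;
  }
  m2 /= n; m3 /= n; m4 /= n;
  return {
    median,
    mean,
    sd: Math.sqrt(m2),
    // Shape measures are undefined when every value is identical.
    skewness: m2 > 0 ? m3 / Math.pow(m2, 1.5) : NaN,
    excessKurtosis: m2 > 0 ? m4 / (m2 * m2) - 3 : NaN,
  };
}

console.log(summarise([1, 2, 3, 4]));
```

Sample-adjusted estimators differ slightly for small samples. The difference becomes negligible for the number of trials typically run here.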
What’s going on?
The JavaScript code that runs the simulations is both simple and slow (suggestions for speeding it up are welcome; the code is below). A simulation can take minutes. How long depends on the number of stickers to collect, the number of trials, your computer and your browser (Internet Explorer seems particularly slow). Running that code “normally”, on the page’s main thread, could leave the browser unresponsive for the whole time (and you may get warnings about an unresponsive script).
Instead, the heavy-duty work is delegated to a web worker, which has its own execution thread. The “main” thread sets up a worker thread, passes a message to it and listens for response messages, but otherwise carries on as normal. The worker thread does the grunt work and passes messages back containing the results so far. It does this at regular intervals and again on completing its task, so the user gets frequent reassurance that things are still running smoothly and nothing has crashed. When the main thread receives a data message from the worker, it updates the chart and table accordingly. The worker sends an update on completing every 500 trials. The corresponding code on the main thread typically takes just a few tens of milliseconds to run.
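One thing worth noting about this design is that the worker code later in this post posts the whole results array at every update. Each message therefore gets larger to copy than the last. A variation is to send only the trials completed since the previous update and let the main thread accumulate them. The sketch below uses hypothetical names (`runInBatches`, `makeReceiver`) and a made-up message shape `{batch, done, total}`. `post` stands in for the worker’s `postMessage`, so the logic can be run outside a browser.

```javascript
// Worker side: run nTrials of runTrial(), posting only the new results every
// updateFreq trials (and at the end), together with a running count.
function runInBatches(nTrials, updateFreq, runTrial, post) {
  if (nTrials === 0) {
    post({batch: [], done: 0, total: 0}); // still signal completion
    return;
  }
  let batch = [];
  for (let i = 0; i < nTrials; i++) {
    batch.push(runTrial());
    if (batch.length === updateFreq || i === nTrials - 1) {
      post({batch: batch, done: i + 1, total: nTrials});
      batch = []; // start a fresh array; the posted one is left untouched
    }
  }
}

// Main-thread side: append each batch and refresh the display, flagging completion.
function makeReceiver(onUpdate) {
  const all = [];
  return function receive(msg) {
    for (const r of msg.batch) all.push(r); // avoids push(...batch) argument limits
    onUpdate(all, msg.done === msg.total);
  };
}
```

Inside a worker, this might be wired up as `onmessage = function(e){ runInBatches(e.data.nTrials, 500, oneTrial, function(m){ postMessage(m); }); };`, where `oneTrial` is a hypothetical function that runs a single trial. On the main thread, `worker.onmessage = function(e){ receive(e.data); };` would then replace the length check on the full array.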
If you’re curious, the full code for the main thread can be found here. The snippet below shows how a web worker is set up when the Simulate button is pressed and how messages passed back from the worker are dealt with.
```javascript
//jQuery selection
var $simulate = $("#simulate");

$simulate.on("click", function(event){
  event.preventDefault();
  //Disable button
  $simulate.attr("disabled", "disabled").attr("value", "Simulating");
  //Set values of sim from sliders (function defined elsewhere)
  prepSim();
  //Instantiate a worker
  var worker = new Worker("/blog_extra/sim/worker.js");
  //Post a message
  worker.postMessage(sim);
  //Process data when message received
  worker.onmessage = function(event){
    data = event.data;
    if(data.length === sim.nTrials){
      //Simulation is complete
      $simulate.removeAttr("disabled").attr("value", "Simulate");
      process(data); //update chart and table
      worker.terminate(); //Stop the worker; no further messages are expected
    }
    else{
      //Simulation is ongoing
      process(data);
    }
  };
  //In case there is an error for some reason let the user know
  worker.onerror = function(){
    alert("Unfortunately the simulation could not be run");
    $simulate.removeAttr("disabled").attr("value", "Simulate");
  };
});
```
The worker thread itself is shown in the code below.
```javascript
//Run simulation when message is posted from main thread
onmessage = function(event){postMessage(simulate(event.data));};

var simulate = function(props){
  "use strict";
  //Get properties or defaults from props object passed by main thread
  var nStickers = props.nStickers || 100;
  var nInPack = props.nInPack || 1;
  var nTrials = props.nTrials || 1000;
  var cutOff = props.cutOff || 0;
  var nCollect = nStickers - cutOff;
  var res = [];
  //Number of trials to loop before posting each update
  var updateFreq = 500;
  //Make sure we actually have cards to collect
  if(nCollect>0){
    var counter; //counts packs
    var cardNum; //current simulated card
    var collected; //holds number of every card collected (in order collected)
    for(var i=0; i<nTrials; i++){
      //Reset variables at start of trial
      collected = [];
      counter = 0;
      while(collected.length < nCollect){
        counter++;
        for(var j=0; j<nInPack; j++){
          cardNum = Math.floor(Math.random()*(nStickers));
          if(collected.indexOf(cardNum)<0){
            collected.push(cardNum);
          }
        }
      }
      //p for packets, u for uncollected (stickers)
      res.push({p:counter, u:nStickers-collected.length});
      if(i%updateFreq===updateFreq-1 && res.length<nTrials){postMessage(res);}
    }
  }
  else{
    //If there was no card to collect then just return 0 and nStickers for each trial
    var pair = {p:0, u:nStickers};
    for(var i=0; i<nTrials; i++){
      res.push(pair);
    }
  }
  return res;
};
```
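Since suggestions for speeding things up were invited, here is one. Much of the time in the loop above probably goes on `collected.indexOf(cardNum)`, which scans the list of stickers collected so far on every draw. The sketch below keeps a `Uint8Array` of flags plus a running count instead, so each lookup takes constant time. `simulateFast` is a hypothetical name. The sketch follows the same model as the worker: stickers are drawn uniformly at random with replacement, and packets are bought until at least `nCollect` distinct stickers are held. It leaves out the progress messages for brevity.

```javascript
// Same model as the worker's simulate(), but with O(1) "already collected?" checks.
function simulateFast(props) {
  "use strict";
  const nStickers = props.nStickers || 100;
  const nInPack = props.nInPack || 1;
  const nTrials = props.nTrials || 1000;
  const cutOff = props.cutOff || 0;
  const nCollect = nStickers - cutOff;
  const res = [];
  const have = new Uint8Array(nStickers); // have[k] is 1 once sticker k is collected
  for (let i = 0; i < nTrials; i++) {
    if (nCollect <= 0) { res.push({p: 0, u: nStickers}); continue; }
    have.fill(0); // reset at the start of each trial
    let distinct = 0;
    let packs = 0;
    while (distinct < nCollect) {
      packs++;
      for (let j = 0; j < nInPack; j++) {
        const card = Math.floor(Math.random() * nStickers);
        if (!have[card]) { have[card] = 1; distinct++; }
      }
    }
    // p for packets, u for uncollected stickers, as in the original
    res.push({p: packs, u: nStickers - distinct});
  }
  return res;
}
```

The progress updates could be restored with the same `i % updateFreq` check used in the original loop.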
Chart design
The chart above is, as you may be able to tell, based on one created previously (for the original stickers blog post) using Hadley Wickham’s ggplot2 package in R. I used the gridSVG package to export that in SVG format and then rebuilt the SVG using d3.js, generalising the construction and reconstruction of grid lines, axes and bars to cope with changes in the data.
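Coping with changes in the data mostly comes down to recomputing scales and tick positions whenever new results arrive. d3.js provides this through its scales and axes. The sketch below is not d3’s code. It is a simplified illustration of the idea using hypothetical names: a linear map from data values to pixels, and a “nice” tick step of 1, 2 or 5 times a power of ten.

```javascript
// Map the data domain [d0, d1] linearly onto the pixel range [r0, r1].
function linearScale(d0, d1, r0, r1) {
  return v => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Ticks over [lo, hi] with a step of 1, 2, 5 or 10 times a power of ten,
// rounded up so there are at most roughly `count` intervals.
function niceTicks(lo, hi, count) {
  if (!(hi > lo) || !(count >= 1)) return [lo];
  const raw = (hi - lo) / count;
  const power = Math.pow(10, Math.floor(Math.log10(raw)));
  const err = raw / power; // roughly in [1, 10)
  const factor = err <= 1 ? 1 : err <= 2 ? 2 : err <= 5 ? 5 : 10;
  const step = factor * power;
  const ticks = [];
  for (let k = Math.ceil(lo / step); k <= Math.floor(hi / step); k++) {
    ticks.push(Number((k * step).toPrecision(12))); // trim noise like 0.6000000000000001
  }
  return ticks;
}

console.log(niceTicks(40, 95, 5));
```

Passing a reversed pixel range, as in `linearScale(0, maxCount, height, 0)`, gives the flipped y axis that SVG needs, because its y coordinates increase downwards.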