Putting web workers to work

Background

Earlier in the year I wrote several posts (here, here and here) about the cost of collecting football stickers. These posts used the results of Monte Carlo simulations run using R. However, I thought it would be interesting to use a fairly recent addition to web browsers to let readers run similar simulations for themselves. This post is more about showcasing a feature of modern web browsers that can assist with statistical simulation than it is about football stickers (though hopefully you will find that topic interesting anyway).

The simulator

As noted above, the basic premise for the simulations comes from collecting football stickers (and hence the controls are labelled accordingly). However, it is more broadly a simulator of the coupon collector’s problem, with limits enforced by the ranges of the controls.

The four sliders above the Simulate button below allow you to set four parameters prior to simulation. The two sliders below the graphic allow a further two parameters to be adjusted without re-running the simulation.

The prior-to-simulation sliders set the total number of stickers to collect, the number of stickers in a single packet, the number of stickers that can be purchased directly to complete a collection and the number of trials to run in the simulation (i.e. how many times we run through a complete collection). The penultimate slider is meant to mimic the option Panini offer to directly purchase the final few stickers in a collection. As noted previously, this can save the collector a lot of money.
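To get a feel for why buying the last few stickers matters, it helps to look at the simplified case of one sticker per packet (P = 1) with every sticker equally likely. The expression below is the textbook coupon collector result, not something taken from the simulator's code; H_k is the k-th harmonic number, and N and B are the slider values described above.

```latex
\mathbb{E}[\text{packets}]
  = \sum_{i=0}^{N-B-1} \frac{N}{N-i}
  = N\,(H_N - H_B),
\qquad
H_k = \sum_{j=1}^{k} \frac{1}{j}, \quad H_0 = 0 .
```

For a toy album with N = 10, this gives roughly 29.3 packets when nothing can be bought (B = 0) but only about 14.3 when the last two stickers can be bought (B = 2), because the final few stickers are by far the slowest to turn up.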

The post-simulation sliders allow you to control costing. The “Price of pack” slider should be fairly self-explanatory, while the “Price of individual stickers” slider relates to the directly purchased stickers noted in the previous paragraph and thus has no effect when the number of stickers that can be bought is 0. (No attempt is made to find the most cost-effective strategy when specific stickers can be purchased; it is simply assumed that those stickers will be purchased as soon as the number of stickers remaining to collect is less than or equal to the number that can be bought directly.)
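One way to see why the price sliders need no re-run is to treat each trial as a small record of what was bought, and apply prices afterwards. The following is a minimal sketch under that assumption; the record shape (`packs`, `bought`) and the function names are hypothetical, not taken from the post's code.

```javascript
// Assumed per-trial record: number of packs opened, and number of stickers
// then bought directly to finish the collection. Field names are hypothetical.
function trialCost(trial, packPrice, stickerPrice) {
  return trial.packs * packPrice + trial.bought * stickerPrice;
}

// Re-price every stored trial; the simulation itself is not touched.
function allCosts(trials, packPrice, stickerPrice) {
  return trials.map(function (trial) {
    return trialCost(trial, packPrice, stickerPrice);
  });
}

// Made-up trial results, priced at 0.50 per pack and 0.25 per single sticker.
const exampleTrials = [
  { packs: 100, bought: 5 },
  { packs: 120, bought: 3 }
];
const exampleCosts = allCosts(exampleTrials, 0.5, 0.25);
console.log(exampleCosts);
```

Moving a price slider would then only mean calling `allCosts` again with the new prices and redrawing.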

So adjust the sliders, click “Simulate” and wait. The histogram tallies the total cost over each trial while the table provides some basic details about the (unbinned) distribution as a whole.
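The measures in the table can all be computed directly from the list of per-trial costs. The sketch below is one way to do it, not necessarily how this page does it: it assumes population (divide-by-n) moments rather than sample-corrected ones, and the function name `summarise` is made up.

```javascript
// Summary statistics for an array of numbers, using population moments.
// Assumes a non-empty array with some spread (variance > 0); otherwise the
// skewness and kurtosis ratios are undefined and come out as NaN.
function summarise(values) {
  const n = values.length;
  const sorted = values.slice().sort(function (a, b) { return a - b; });
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  const mean = values.reduce(function (sum, v) { return sum + v; }, 0) / n;

  // k-th central moment: the average of (value - mean)^k.
  function centralMoment(k) {
    return values.reduce(function (sum, v) {
      return sum + Math.pow(v - mean, k);
    }, 0) / n;
  }
  const m2 = centralMoment(2);
  const m3 = centralMoment(3);
  const m4 = centralMoment(4);

  return {
    median: median,
    mean: mean,
    sd: Math.sqrt(m2),
    skewness: m3 / Math.pow(m2, 1.5),
    excessKurtosis: m4 / (m2 * m2) - 3
  };
}
```

A long right tail in the cost distribution (a few very unlucky collectors) would show up here as a positive skewness.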


Number of unique stickers (N):

Number of stickers in pack (P):

Stickers that can be bought (B):

Number of trials (T):

Price of pack:

Price of individual stickers:

Measure Value
Median
Mean
Standard deviation
Skewness
Excess kurtosis


What’s going on?

The JavaScript code required to run the simulations is both simple and slow. (Feel free to suggest ways of improving its speed; the code is below.) Depending on the number of stickers to collect, the number of trials run, your computer and the browser you’re using (Internet Explorer seems to be particularly slow), it can take minutes. That means executing the code “normally” could make the browser unresponsive for that length of time (and you may get warnings about an unresponsive script).

Instead, the heavy-duty work is delegated to a web worker, which has its own execution thread. The “main” thread sets up a worker thread, passes a message to it and listens for response messages… but continues to act normally otherwise. The worker thread does the grunt work and passes messages back (the results of calculations) at regular intervals and again on completion of the task it has been set, so the user gets frequent reassurance that things are still running smoothly and nothing has crashed. When the main thread receives the data messages from the worker, it updates the chart and table accordingly. These worker updates are programmed to arrive on the completion of every 500 trials of the simulation, and the corresponding code run by the main thread typically takes just a few tens of milliseconds.

If you’re curious, the full code for the main thread can be found here. The snippet below shows how a web worker is set up when the Simulate button is pressed and how messages passed back from the worker are dealt with.
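As a rough illustration of the pattern (not the page's actual code), a main-thread setup might look something like the sketch below. The worker file name `simulate-worker.js`, the message shape (`type`, `results`) and the two callbacks are all assumptions made for the example; only `Worker`, `postMessage`, `onmessage` and `terminate` are standard browser APIs.

```javascript
// Start a simulation in a worker and route its messages to two callbacks.
// WorkerCtor is optional and exists only so a stand-in can be supplied where
// the browser's Worker is unavailable; in a page it would be left out.
function startSimulation(params, onProgress, onDone, WorkerCtor) {
  const Ctor = WorkerCtor || (typeof Worker !== 'undefined' ? Worker : null);
  if (!Ctor) {
    return null; // no web worker support: the caller can show a fallback
  }

  const worker = new Ctor('simulate-worker.js'); // hypothetical file name

  // Runs on the main thread each time the worker posts a message back.
  worker.onmessage = function (event) {
    const msg = event.data;
    if (msg.type === 'progress') {
      onProgress(msg.results); // e.g. redraw the chart and table so far
    } else if (msg.type === 'done') {
      onDone(msg.results);
      worker.terminate(); // the worker is no longer needed
    }
  };

  // Hand the slider values to the worker; this call returns immediately,
  // so the page stays responsive while the worker runs.
  worker.postMessage(params);
  return worker;
}
```

In a page, this would be called from the Simulate button's click handler with the current slider values and a function that redraws the chart and table.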

The worker thread itself is shown in the code below.
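For readers who just want the shape of it, here is a hedged sketch of what such a worker script could contain; it is not the original code. It assumes each pack holds distinct stickers drawn uniformly at random, and the parameter names (`total`, `packSize`, `buyable`, `trials`) and message shape are invented for the example.

```javascript
// One trial: open packs until at most `buyable` stickers are still missing.
// Assumption: a pack contains `packSize` distinct, equally likely stickers.
function simulateTrial(total, packSize, buyable, rng) {
  if (packSize < 1) {
    throw new Error('packSize must be at least 1');
  }
  const size = Math.min(packSize, total); // a pack cannot exceed the album
  const owned = new Array(total).fill(false);
  let missing = total;
  let packs = 0;
  while (missing > buyable) {
    const inPack = new Set();
    while (inPack.size < size) {
      inPack.add(Math.floor(rng() * total)); // redraw until all are distinct
    }
    inPack.forEach(function (sticker) {
      if (!owned[sticker]) {
        owned[sticker] = true;
        missing -= 1;
      }
    });
    packs += 1;
  }
  return { packs: packs, bought: missing };
}

// Run all trials, reporting back after every 500 and once more at the end.
function runTrials(params, post, rng) {
  const random = rng || Math.random;
  const results = [];
  for (let t = 1; t <= params.trials; t++) {
    results.push(simulateTrial(params.total, params.packSize, params.buyable, random));
    if (t % 500 === 0 && t < params.trials) {
      post({ type: 'progress', results: results.slice() });
    }
  }
  post({ type: 'done', results: results });
}

// Wire up messaging only when running inside a classic dedicated worker.
if (typeof importScripts === 'function') {
  self.onmessage = function (event) {
    runTrials(event.data, function (msg) { self.postMessage(msg); });
  };
}
```

Keeping the simulation in plain functions and doing the message wiring in one small block at the end makes the logic easy to try outside a worker as well.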

Chart design

The chart above is, as you may be able to tell, based on one created previously (for the original stickers blog post) using Hadley Wickham’s ggplot2 package in R. I used the gridSVG package to export that in SVG format and then rebuilt the SVG using d3.js, generalising the construction and reconstruction of grid lines, axes and bars to cope with changes in the data.
