Primetime: using advanced features to create fast R code

Introduction

I’ll begin by stating that I am by no means an expert in program optimisation. If you are and you cringe at the things I write in this post then feel free to let me know; I won’t be offended. Also, if speed is absolutely of the essence then R is probably not the language for you. Rather, the purpose of this post is to show, using a case study, how one might use some of R’s more advanced features to get a slow task done much more quickly. This post was partly inspired by reading Hadley Wickham’s excellent Advanced R site and partly by attempting some of the challenges on the Project Euler website. Of course, any bugs in the code that follows are entirely my own doing.

The basic goal is this: write an R function that enables the user to quickly determine whether each of a potentially large collection of potentially large numbers is prime or composite. We’ll begin with the obvious and then gradually move on to more advanced code (edge-case bugs may come and go along the way). Hopefully, however, things should remain relatively intuitive. And although this is illustration by example, some of the improvements will be applicable to other problems one might face using R.

The base: divide and conquer

The most obvious method to test if a number, N, is a prime is to test for divisors – an approach known, for obvious reasons, as trial division. Naively one might test for divisors between 2 and N-1; slightly less naively we could test numbers up to N/2. But for every factor a number has above its square root, it has a compatriot below the square root. For example, the factors of 24 (square root ~4.9) are 1 and 24 (obviously), 2 and 12, 3 and 8, and 4 and 6. So to show that a number, N, is a prime we only have to test whether any integer between 2 and the root of N divides exactly into it. This makes things a lot quicker. Of course if N is even and bigger than 2 we need not go to any effort to determine it isn’t a prime. Similarly, there’s little point dividing an odd number by an even one and looking for an integer result. So, if N is odd, we just need to do trial division with the other odd numbers above 1 and up to the root of N.

It’s simple enough to write some R code that implements this:
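Something along the following lines should do the job. (This is a sketch of isPrime0 rather than a guaranteed copy of the code used for the timings below, but the logic is the same.)

isPrime0 <- function(n) {
  if (n < 2) return(FALSE)           # 0, 1 and negative numbers are not prime
  if (n == 2) return(TRUE)           # the only even prime
  if (n %% 2 == 0) return(FALSE)     # all other even numbers are composite
  i <- 3
  while (i <= floor(sqrt(n))) {      # only need to test divisors up to sqrt(n)
    if (n %% i == 0) return(FALSE)
    i <- i + 2                       # skip even candidates
  }
  TRUE
}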

This code works, but is it fast? For repeated testing purposes, here are four datasets of random data:
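(The exact sizes and ranges used for the timings below aren’t reproduced here; something along these lines, with da/db containing small numbers and dc/dd larger ones, gives comparable behaviour.)

set.seed(1)
da <- sample(1e5, 1e4, replace = TRUE)   # small set of small numbers
db <- sample(1e5, 1e6, replace = TRUE)   # large set of small numbers
dc <- sample(1e9, 1e4, replace = TRUE)   # small set of large numbers
dd <- sample(1e9, 1e6, replace = TRUE)   # large set of large numbers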

These sets contain, in order, what could loosely be described as a small set of small numbers, a large set of small numbers, a small set of large numbers and a large set of large numbers (though the last two sets also contain small numbers).

We can use the microbenchmark package (though it’s really meant for small chunks of code) to repeatedly test how quickly the function can check these arrays, for example:
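Assuming the datasets above, a call like this times 30 repeats of the check over the whole of da:

library(microbenchmark)
microbenchmark(sapply(da, isPrime0), times = 30)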

The use of sapply (or similar) is required because the code isn’t vectorised (more on that shortly). For the da dataset the median total time (from 30 trials) to compute whether each value in the vector is a prime is just over 350 ms on my 2011 MacBook Pro. This is bearable, but for dataset db we’re looking at a time of 35-40 s while the smaller dc set takes about 3.4 s. Clearly we want to improve on this.

Step 1: use a bit of maths and cache results

The fundamental theorem of arithmetic tells us that every integer greater than 1 is either a prime or can be represented as a product of primes. Because of this, to determine that a number is prime it is sufficient to establish that it is not divisible by any prime less than or equal to its square root.

To show how far we’ve come from our earliest considerations above, let’s consider the reasonably large number 999,983 – it’s the largest prime below 1 million. To check this by taking the most naive of approaches to trial division we’d try all 999,981 integers from 2 to 999,982. A more reasonable approach – trying all odd integers from 3 up to the (floor of the) square root of 999,983, i.e. 999 – would reduce this to 499 trial divisions. But if we do trial division using only the prime numbers below 1,000 this reduces further, to just 168.

Of course this would require knowledge of the primes below 1,000. We can get them by using trial division. But then have we really progressed? Well, actually, yes. If we calculate and store a list of small primes once, using trial division with odd numbers, we can then use those stored primes to do much more efficient, primes-only trial division on larger numbers. It doesn’t take much effort to take the code for isPrime0 and create a different function that calculates all primes up to some specified number n. It should look something like this:
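A sketch of such a doTD function (again, not necessarily the exact original) simply reuses isPrime0 on each odd candidate; growing the result with c() is kept deliberately simple rather than fast:

doTD <- function(n) {
  if (n < 2) return(integer(0))
  primes <- 2
  i <- 3
  while (i <= n) {
    if (isPrime0(i)) primes <- c(primes, i)   # trial division with odd numbers
    i <- i + 2
  }
  primes
}

doTD(20)   # 2 3 5 7 11 13 17 19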

If this is actually going to be used effectively and efficiently then we need some “safe” place to store the results of a call to this function.

Closure time

Wikipedia has a detailed article describing exactly what a closure is in computer programming. It’s a bit verbose and unnecessarily complicated for what we need here. For our purposes, all we need to understand is that R can return a new function from inside a function, and that all the variables created inside the outer function (including, potentially, more functions) can be seen by the inner, returned function. Those additional variables can be used again and again and even changed (using the double-arrow assignment operator, <<-) by the returned function.
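As a quick illustration (nothing to do with primes yet), here is a toy closure: makeCounter returns a function that remembers, and updates via <<-, a count stored in the enclosing environment.

makeCounter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # modify the variable in the enclosing environment
    count
  }
}
counter <- makeCounter()
counter()   # 1
counter()   # 2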

We can utilise a closure both to store a list (well, actually a vector in R-speak) of prime numbers that our prime-checking function can consult and to hold a function, such as the doTD function above, for finding new prime numbers to add to the list. This new environment is self-contained – we don’t have to worry about some external function deleting the list of primes because it won’t have direct access to it.

Putting it together

The function below utilises a closure to store the doTD function above, a vector of prime numbers and a single number, maxPrime, recording the maximum value used as the argument to doTD when that vector was created. For convenience, all primes below 1,000 are found when the function is initiated.
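Here is one way it might look – a sketch with the structure just described, though not necessarily identical to the isPrime1 used for the timings:

isPrime1 <- (function() {
  doTD <- function(n) {                       # the trial-division helper, stored
    primes <- 2                               # inside the closure environment
    i <- 3
    while (i <= n) {
      if (isPrime0(i)) primes <- c(primes, i)
      i <- i + 2
    }
    primes
  }
  maxPrime <- 1000
  primes <- doTD(maxPrime)                    # all primes below 1,000, found at initiation

  function(val) {
    if (val < 2 || val != floor(val)) return(FALSE)
    if (val <= maxPrime) return(val %in% primes)   # small values: just look them up
    lim <- floor(sqrt(val))
    if (lim > maxPrime) {
      # trial division with the cached primes is only conclusive if it finds a
      # divisor; otherwise extend the cache and store it for future calls
      if (any(val %% primes == 0)) return(FALSE)
      maxPrime <<- lim
      primes <<- doTD(maxPrime)
    }
    !any(val %% primes[primes <= lim] == 0)        # trial division with primes only
  }
})()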

The code in the returned function is more complex than that of isPrime0. Now that we have a list of primes up to a certain number, maxPrime, we no longer need to perform any trial division on numbers less than or equal to it: we can just return TRUE or FALSE depending on whether the value is or is not in the vector of primes. For remaining numbers whose square root (rounded down) is no greater than maxPrime we can return the result of trial division with those primes. Finally, for larger numbers we can still try trial division with the primes we have. However, a result of TRUE does not necessarily mean the number is a prime, since it could have a prime factor outside the vector of primes; in that case we have to recompute our list of primes and try again. The new vector of primes is stored for future reference.

We can test this new function out with datasets da and db and it won’t ever have to calculate a new set of primes. Unfortunately it’s still slow: around 190 ms for the former case and a little over 20 seconds for the latter. So only a modest improvement. For the dc set, with larger numbers, and a freshly sourced isPrime1 function we’re looking at a time of around 4.5 s, over a second slower than the original function. However, once the list of primes has been added to in the initial call to sapply, repeating the calculations is quicker, only taking around 1.7 s.

There are two obvious ways to move forward: replace the doTD function with something better, and call that replacement less often. The next section tackles the first of these; after that we’ll move on to the second.

Step 2: use the work of an ancient Greek

Enter Eratosthenes

The Sieve of Eratosthenes is an algorithm for finding all primes up to a certain number by systematically marking off the multiples of smaller primes. In essence, start at 2 then mark – as not prime – all the multiples of 2. Then move on to the next unmarked number – 3 – and repeat. Then the next – 5 – and so on. The numbers left unmarked after this process has been completed are the primes.

Here’s some R code that implements this:
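Something like the following captures the idea (a sketch of doSieve rather than the exact code benchmarked below):

doSieve <- function(n) {
  if (n < 2) return(integer(0))
  isP <- rep(TRUE, n)
  isP[1] <- FALSE                          # 1 is not prime
  i <- 2
  while (i <= floor(sqrt(n))) {
    if (isP[i]) isP[seq(i * i, n, by = i)] <- FALSE   # mark multiples of i as not prime
    i <- i + 1
  }
  which(isP)                               # the positions still TRUE are the primes
}

doSieve(20)   # 2 3 5 7 11 13 17 19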

Using doSieve to find prime numbers up to a specified number is much faster, as this chart shows (note the log-log scale):

doSieve is about an order of magnitude faster than doTD when calculating primes up to 100, and more than two orders of magnitude faster when calculating primes up to 1 million

We can swap the doTD function for the doSieve function in our prime-checking function (with a couple of minor changes):
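Structurally it’s the same closure as isPrime1; in this sketch the only real change is that doSieve now generates and extends the cached primes:

isPrime2 <- (function() {
  maxPrime <- 1000
  primes <- doSieve(maxPrime)                # sieve, rather than trial-divide, at initiation

  function(val) {
    if (val < 2 || val != floor(val)) return(FALSE)
    if (val <= maxPrime) return(val %in% primes)
    lim <- floor(sqrt(val))
    if (lim > maxPrime) {
      if (any(val %% primes == 0)) return(FALSE)
      maxPrime <<- lim
      primes <<- doSieve(maxPrime)           # re-sieve up to the new limit and store
    }
    !any(val %% primes[primes <= lim] == 0)
  }
})()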

Testing the freshly sourced isPrime2 code with the dc dataset drops the calculation time to about 1.9 s.

If you never need to check large numbers then isPrime1 and isPrime2 should be indistinguishable in speed (since the inner functions are essentially identical). However, if the vector of primes needs frequent updates then clearly isPrime2 has the edge.

Step 3: vectorisation

At the moment, if you want to determine whether many numbers are prime you have to use isPrime2 inside a call to sapply (or a for loop). This is unnecessarily slow because:

  1. Loops and function calls in R are slow.
  2. The function has no way of knowing if a bigger value will shortly be checked. This leads to unnecessary calls to doSieve and is especially problematic if the numbers you want to check are large and in ascending order.

Let’s back statement 2 up with some data (again, your results will vary). From above we know the median time to find all primes in the dc data set is about 1.9 s when freshly sourced. If we source the function again and sort the data into descending order this drops to about 1.8 s (little different from when the function is executed a second time with the same data). But if we sort it into ascending order instead then running the code takes about 2.4 s as we have to call the doSieve function repeatedly.

We can do better by vectorising the code so that multiple values can be passed in at once. This is also as good a time as any to include an explicit isWhole function to test whether a number is or is not whole to within an acceptable tolerance. (As is often the case, this came from a stackoverflow question which is well worth a read if you’re confused about why this might be deemed necessary.) My initial solution looked like this:
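In outline it looked something like this (a sketch of isPrime3; the original may differ in detail, but the shape – an isWhole helper, at most one sieve per call and an R-level loop over the values – is as described):

isWhole <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol                  # whole to within a tolerance
}

isPrime3 <- (function() {
  maxPrime <- 1000
  primes <- doSieve(maxPrime)

  function(vals) {
    res <- rep(FALSE, length(vals))
    ok <- !is.na(vals) & isWhole(vals) & vals >= 2
    if (!any(ok)) return(res)
    lim <- floor(sqrt(max(vals[ok])))
    if (lim > maxPrime) {                  # sieve at most once per call
      maxPrime <<- lim
      primes <<- doSieve(maxPrime)
    }
    for (i in which(ok)) {                 # the slow part: an R-level loop
      v <- vals[i]
      if (v <= maxPrime) {
        res[i] <- v %in% primes
      } else {
        res[i] <- !any(v %% primes[primes <= floor(sqrt(v))] == 0)
      }
    }
    res
  }
})()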

We can now pass a single vector with – say – a million numbers to check for primality and the doSieve function will be called at most once. (Of course if you pass a vector with even bigger numbers in a subsequent call you’ll have to re-sieve.)

To my disappointment this code was still pretty slow. This isn’t helped by the presence of for loops. I was unable to vectorise these out of the code (I’m not saying that’s not possible, I just couldn’t work out how to do it).

And another result I didn’t expect: when freshly sourced the dc set would be computed in about 200 ms, but on repeated calculations this went up to about a second! This must be because the function is carrying around and subsetting a larger set of primes after the first use. For most tested numbers that isn’t really necessary.

We could of course not save the larger set of primes. But that is making a big assumption about how the function will be utilised. If, for example, it’s called frequently with small vectors of large numbers then things could get very slow. Before I try to tackle that problem, however, we’ll dispose of the for loops – or rather delegate them to another programming language altogether.

Step 4: “cheat”

As just mentioned, I was unable to get rid of the slow for loops using R code. So at this point, you may say, I cheated.

Rcpp

The Rcpp package lets you write and compile functions in C++ and then use them from R as if they were R functions. Using loops in R is notoriously slow; this isn’t the case in C++.

I first learnt to program in C++, though things may have got a bit rusty in the last year or two. Thankfully, Rcpp makes the whole experience quite pleasant. I’ll defer to Hadley Wickham for a more detailed explanation, but a few little points for those unfamiliar:

  1. Each variable needs to have an explicitly declared type (int, double, bool etc plus some really useful types defined by the Rcpp package that mimic R types);
  2. Every statement ends with a semi-colon (90% of the time this is why my C++ code fails to compile);
  3. Arrays start at 0;
  4. You must explicitly return a value at the end of a function. (I nearly always do this in R anyway since I think it makes the code clearer.)

After that, most of the other bits I needed I got by modifying examples from Wickham’s web page and answers on stackoverflow.

Anyway, this C++ function does the work of the loops in isPrime3 if you pass it your values and your list of prime numbers:
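Something along these lines – compiled straight from R with Rcpp::cppFunction(), and a sketch rather than the exact function profiled below – moves the looping into C++:

library(Rcpp)

cppFunction('
LogicalVector checkPrimality(IntegerVector vals, IntegerVector primes) {
  int n = vals.size();
  LogicalVector res(n);
  for (int i = 0; i < n; ++i) {
    int v = vals[i];
    bool isP = (v >= 2);                   // 0, 1 and negatives are not prime
    for (int j = 0; j < primes.size(); ++j) {
      double p = primes[j];
      if (p * p > v) break;                // only need primes up to sqrt(v)
      if (v % primes[j] == 0) { isP = false; break; }
    }
    res[i] = isP;
  }
  return res;
}
')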

Implementing it in a new version of isPrime looks like this:
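Again as a sketch, the wrapper keeps the closure but hands the per-value work to the compiled function:

isPrime4 <- (function() {
  maxPrime <- 1000
  primes <- doSieve(maxPrime)

  function(vals) {
    res <- rep(FALSE, length(vals))
    ok <- !is.na(vals) & isWhole(vals) & vals >= 2
    if (!any(ok)) return(res)
    lim <- floor(sqrt(max(vals[ok])))
    if (lim > maxPrime) {
      maxPrime <<- lim
      primes <<- doSieve(maxPrime)
    }
    # the R-level loop is replaced by a single call into C++; values above
    # .Machine$integer.max will produce a coercion warning here
    res[ok] <- checkPrimality(as.integer(vals[ok]), as.integer(primes))
    res
  }
})()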

Now the factory-fresh isPrime4 will calculate the results for the dc dataset in about 22 ms, while a repeated calculation will take about 26 ms. For the bigger dd data set we’re looking at around 2.2 s and 2.7 s, respectively (nicely in line with the factor of 100 increase in vector size). Generally, things are much improved: we’ve gained roughly an order of magnitude in speed over isPrime3.

Profiling the execution of isPrime4 with the profr package and the dd dataset showed that more than 50% of the execution time was spent outside the checkPrimality function. The looping is no longer dominating execution time now that we’ve introduced Rcpp.

Step 5: cheat some more

Using C++ brought back to mind one thing that I never really worry about when using R: numbers need to be stored in some format or other. Most pertinently, integer types have a limit to the maximum number they can store. R actually has a specific integer type, a 32-bit signed integer, whose maximum value can be found via .Machine$integer.max. That’s 2,147,483,647. That’s also the limit to the size of the integers in the Rcpp-defined IntegerVector. So if we test isPrime4 with an odd number larger than that (which R stores as a floating-point number) we get a warning and an NA value.

We can swap the “IntegerVector vals” entry in the argument list of checkPrimality with “std::vector<int64_t> vals” and then swap the “int32_t” type declarations for the val variable with an “int64_t” type declaration. Much to my surprise this actually seems to work, though I haven’t rigorously tested it. It is, however, 50% slower for the dc and dd datasets.

If we don’t care about testing really big numbers there’s an alternative that makes the code much simpler and much faster. The floor of the square root of .Machine$integer.max is “just” 46,340, and doSieve can calculate all primes up to that on my computer in ~5 ms. It also creates a vector of just 4,792 prime numbers – not problematic for storing in memory.

This means that we can sieve just once and be guaranteed to have all the primes we need to test all numbers below our 32-bit integer ceiling. And to make things really quick we can do pre-flight checks using C++ too:
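One possible shape for that helper is sketched below, again via cppFunction. The NA handling and the whole-number tolerance (used by the extra arguments described next) are passed in as parameters; the name checkPrimality2 and the parameter names are my own, not necessarily those in the original. It assumes the primes vector covers everything up to the square root of the 32-bit ceiling, so it never needs to re-sieve (and values above that ceiling aren’t handled).

cppFunction('
LogicalVector checkPrimality2(NumericVector vals, NumericVector primes,
                              bool naAsFalse, double tol) {
  int n = vals.size();
  LogicalVector res(n);
  for (int i = 0; i < n; ++i) {
    double v = vals[i];
    // pre-flight check 1: NA values, either preserved or returned as FALSE
    if (NumericVector::is_na(v)) {
      res[i] = naAsFalse ? 0 : NA_LOGICAL;
      continue;
    }
    // pre-flight check 2: anything that is not a whole number >= 2
    if (std::fabs(v - std::floor(v + 0.5)) > tol || v < 2) {
      res[i] = false;
      continue;
    }
    // trial division with the (complete) list of small primes
    bool isP = true;
    double lim = std::floor(std::sqrt(v));
    for (int j = 0; j < primes.size() && primes[j] <= lim; ++j) {
      if (std::fmod(v, primes[j]) == 0) { isP = false; break; }
    }
    res[i] = isP;
  }
  return res;
}
')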

This is then used in the final (for now at least) isPrime function, isPrime5. For this I’ve added another couple of arguments. The first additional argument specifies what the user wants to do with NA values (preserve them in the final output or return FALSE in their place). The second argument allows the user to specify their own tolerance level when checking whether a number is an integer. The end result is this:
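A sketch of that wrapper (the argument names, again, are mine rather than necessarily those in the original):

isPrime5 <- (function() {
  # sieve once, up front: every prime we could ever need for 32-bit integers
  primes <- as.numeric(doSieve(floor(sqrt(.Machine$integer.max))))

  function(vals, naAsFalse = FALSE, tol = .Machine$double.eps^0.5) {
    checkPrimality2(as.numeric(vals), primes, naAsFalse, tol)
  }
})()

isPrime5(c(7, 8, NA, 999983))   # TRUE FALSE NA TRUE with the default arguments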

Testing with dataset db produces a median calculation time of 105 ms. Compared with the original isPrime0 function we’ve gained a factor of about 350. Testing with the dd dataset gives a median time of ~0.8 s, roughly a factor of three improvement on isPrime4. Just as importantly, the code is somewhat simpler (I think) and there are added checks and options.

At this point I ran out of ideas for making any more noticeable changes, so now we’ll move on to comparing with pre-existing functions from several R packages I found.

Comparisons with existing packages

Sieving

The sfsmisc package provides a primes function and the numbers package provides a similarly named Primes function. Both essentially implement the Sieve of Eratosthenes. These and doSieve are comparable in speed (see the chart below) but (to my slight disappointment) primes is marginally quicker. Because I don’t like losing I created doSieve2: like doSieve but with the while loop written as a C++ function. For larger input arguments, doSieve2 is about a factor of 2.5 quicker than the other functions:

doSieve, primes and Primes are very similar in execution time; doSieve2 is about a factor of 2.5 quicker

Prime-checking

I found a couple of R packages with functions that allow for prime-number checking in a similar manner to isPrime5. The numbers package (see above) provides the isPrime function while the gmp package provides the isprime function. The former calls the Primes function just discussed and then uses trial division with the resulting set of primes. It’s about a factor of 2 slower than isPrime3 (but the code is somewhat neater).

The isprime function is a more complex prime-number checker. I tried to follow through the source code to understand exactly how it worked but got lost in a maze of C. The function help file, however, tells us that trial division is tried first and then followed by some number of Miller-Rabin probabilistic tests. Because the tests are only probabilistic it’s not always certain whether a candidate is or isn’t prime. Hence the function returns 0 for not prime, 1 for probably prime and 2 for definitely prime.

Rather than compare isPrime5 and isprime using the usual data sets, I instead created six new ones. Each new data set contained 1 million integers between 0 and some upper limit (with repetitions, of course). The upper limits went up in powers of 10 from 100,000 to 1 billion, plus a final set that ranged up to .Machine$integer.max (i.e. around 2.1 billion). Using these data I compared isPrime5 with isprime (using three different values for the number of Miller-Rabin tests to perform). The results are plotted below (note the vertical axis is not logarithmic here):

For numbers below a billion isPrime5 is fastest. Above that, isprime starts to compete when the number of tests is low.

Below ~1 billion, isPrime5 is faster than isprime for any number of Miller-Rabin tests (including 0), and this is generally the case with smaller vectors of numbers too. isPrime5 also gives definite answers. Having said that, with both 5 and 40 Miller-Rabin tests, all primes identified as probably prime were actually prime. The definite advantage that isprime has over isPrime5 is that it can deal with 64-bit integers.

Summary

Admittedly we were starting from a fairly low base, but by using some “advanced” features of R (including the ability to use another programming language entirely) and a bit of maths, we’ve managed to speed up a (I guess) fairly common task by about two orders of magnitude. The main resulting function is now, for vectors of numbers below ~1 billion, faster than the equivalent functions I found in existing packages. For larger numbers you may be better off with the isprime function in the gmp package.

There are still some open questions: Have I missed any other notable functions in existing packages? Are there any edge cases not properly dealt with? Are there any obvious (or obscure) means to significantly speed up the code while keeping the mathematics relatively intuitive? Can isPrime5 be extended in some way to cope with 64-bit integers without suffering a huge speed penalty?
