<<- <- Assignments in R -> ->>

As the title suggests, this tutorial covers assignments in R. That is, it covers how we can assign (or bind) a value (or an object) to a name to create or update a variable.

The <- and = operators

If you come to R from almost any other language, one of the first things you’ll probably notice is the prevalence of the pair of characters <-. Since you’re here you probably already know that, in combination, this is R’s slightly quirky assignment operator. Or rather, it’s one of R’s assignment operators: R has five of them. There’s a fair chance you are also familiar with the = assignment operator. In the following example the two are interchangeable:

x <- 5
y = 5
identical(x, y)
## [1] TRUE

Clearly, since the two results are the same, it’s a matter of personal preference which you choose to use in this case. If you’ve come to R from a language like C or JavaScript (or, to be honest, most other programming languages I’m aware of) the equals sign will seem more intuitive. And, of course, it’s one less character to type. But the <- combination does at least make it more explicit which direction the assignment works in.

The function definitions below are also all equivalent. Each gives the same numerical solutions to the quadratic equation A*x^2 + B*x + C = 0, specified by the constants/parameters A, B and C. When A is 0, each instead solves the linear equation B*x + C = 0.

quadSolve <- function(A, B, C){
   if(A == 0){return(c(-C/B))}
   surd <- sqrt(B^2 - 4*A*C)
   denom <- 2*A
   return((-B + c(surd,-surd))/denom)
}

quadSolve = function(A, B, C){
   if(A == 0){return(c(-C/B))}
   surd = sqrt(B^2 - 4*A*C)
   denom = 2*A
   return((-B + c(surd,-surd))/denom)
}

quadSolve = function(A, B, C){
   if(A == 0){return(c(-C/B))}
   surd <- sqrt(B^2 - 4*A*C)
   denom <- 2*A
   return((-B + c(surd,-surd))/denom)
}
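
As a quick sanity check (not part of the original walkthrough), we can plug the returned roots back into A*x^2 + B*x + C and confirm that the result is zero. The coefficients 1, -3 and 2 are chosen purely for illustration. They give x^2 - 3x + 2 = (x - 1)(x - 2), whose roots are 2 and 1.

```r
quadSolve <- function(A, B, C){
   if(A == 0){return(c(-C/B))}          # linear case: B*x + C = 0
   surd <- sqrt(B^2 - 4*A*C)            # square root of the discriminant
   denom <- 2*A
   return((-B + c(surd,-surd))/denom)   # both roots, the "+" root first
}

roots <- quadSolve(1, -3, 2)
print(roots)
## [1] 2 1
# The residuals A*x^2 + B*x + C should be 0 for each root
print(1*roots^2 - 3*roots + 2)
## [1] 0 0
```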

At this point you’d be forgiven for thinking that there’s no difference between <- and =. But there is; ?assignOps tells us so:

The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.

I’ve read this a number of times and still don’t find it particularly helpful. So I’m going to highlight differences by example instead.

<- and = for default-value associations

One of the other places we see the = operator is when “assigning” (sometimes referred to as “associating” in this context) a default value to an argument of a function. For example:

quadSolve = function(A, B, C=0){
   if(A == 0){return(c(-C/B))}
   surd <- sqrt(B^2 - 4*A*C)
   denom <- 2*A
   return((-B + c(surd,-surd))/denom)
}

Now we can omit the third argument when calling quadSolve if we want the constant term to be 0. That is:

identical(quadSolve(2,1,0), quadSolve(2,1))
## [1] TRUE

If, however, you try to use <- to associate a default value with a parameter, you’ll get an error.

quadSolve = function(A, B, C<-0){
   if(A == 0){return(c(-C/B))}
   surd <- sqrt(B^2 - 4*A*C)
   denom <- 2*A
   return((-B + c(surd,-surd))/denom)
}
## Error: <text>:1:29: unexpected assignment
## 1: quadSolve = function(A, B, C<-
##                                 ^

<- and = inside function calls

When we called the working quadSolve function above with two or three arguments, the values passed in were assigned to the function parameters by position: the first argument is assigned to the first parameter (A), the second to the second (B) and the third to the third (C). If we wanted to though, we could explicitly associate an argument with a parameter by name:

identical(quadSolve(2,1,0), quadSolve(C = 0, B = 1, A = 2))
## [1] TRUE

This is different to, for example, C++ or (modern) JavaScript where default values are permissible but we cannot vary the order. It allows for greater flexibility, especially when a function has a large number of arguments, at least some of which have default values.

What if we use <- when we call the function? If we supply the arguments in order then at a glance it may appear that it’s equivalent to using =:

identical(quadSolve(A = 2, B = 1, C = 0), quadSolve(A <- 2, B <- 1, C <- 0))
## [1] TRUE

It isn’t, although in this case the result of the function call is unchanged. Not so if we shuffle the ordering around:

identical(quadSolve(C = 0, B = 1, A = 2), quadSolve(C <- 0, B <- 1, A <- 2))
## [1] FALSE

In both cases we’ve actually assigned (or perhaps reassigned) variables called A, B and C in the calling environment, as a side effect of evaluating the second argument of identical. The call quadSolve(C <- 0, B <- 1, A <- 2) is functionally equivalent to

C <- 0
B <- 1
A <- 2
quadSolve(A = C, B = B, C = A)
## [1] -2

which is, to say the least, confusing.
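
You can see this side effect for yourself. First remove any existing A, B and C, then make each call and check whether A exists afterwards. Using exists() with inherits = FALSE is just one illustrative way to look only in the current environment, which at the prompt is the global environment.

```r
quadSolve <- function(A, B, C = 0){
   if(A == 0){return(c(-C/B))}
   surd <- sqrt(B^2 - 4*A*C)
   denom <- 2*A
   return((-B + c(surd,-surd))/denom)
}

# Start with no A, B or C in the current (here: global) environment
suppressWarnings(rm(A, B, C))
named <- quadSolve(C = 0, B = 1, A = 2)          # named arguments: no assignment
exists("A", inherits = FALSE)
## [1] FALSE
shuffled <- quadSolve(C <- 0, B <- 1, A <- 2)    # three assignments, matched by position
exists("A", inherits = FALSE)
## [1] TRUE
print(shuffled)
## [1] -2
```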

Assignment using <- in a function call is occasionally useful. The function outer takes a pair of vectors and creates a matrix by applying a function to each possible pair of elements from the two vectors. By default, the function applied to each pair is multiplication:

outer(1:10, 1:10)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]    2    4    6    8   10   12   14   16   18    20
##  [3,]    3    6    9   12   15   18   21   24   27    30
##  [4,]    4    8   12   16   20   24   28   32   36    40
##  [5,]    5   10   15   20   25   30   35   40   45    50
##  [6,]    6   12   18   24   30   36   42   48   54    60
##  [7,]    7   14   21   28   35   42   49   56   63    70
##  [8,]    8   16   24   32   40   48   56   64   72    80
##  [9,]    9   18   27   36   45   54   63   72   81    90
## [10,]   10   20   30   40   50   60   70   80   90   100

In this case, because the first and second arguments in our call are the same, if we want to be succinct we can use <- assignment inside the function call and only declare one vector.

outer(x <- 1:10, x)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]    2    4    6    8   10   12   14   16   18    20
##  [3,]    3    6    9   12   15   18   21   24   27    30
##  [4,]    4    8   12   16   20   24   28   32   36    40
##  [5,]    5   10   15   20   25   30   35   40   45    50
##  [6,]    6   12   18   24   30   36   42   48   54    60
##  [7,]    7   14   21   28   35   42   49   56   63    70
##  [8,]    8   16   24   32   40   48   56   64   72    80
##  [9,]    9   18   27   36   45   54   63   72   81    90
## [10,]   10   20   30   40   50   60   70   80   90   100

This makes the code DRYer and it’s now a little easier and less error prone if we want to expand our calculation (for example, changing 1:10 to 1:100).

Be warned: this kind of shortcut has (at least) 2 major problems.

Firstly, we are polluting the global environment here. If x didn’t exist before, it does now. Worse, if x already exists in the global environment its value will be changed:

x <- 4
invisible(outer(x <- 1:10, x))
print(x)
##  [1]  1  2  3  4  5  6  7  8  9 10

The second catch is that, to save on unnecessary computations, function arguments are evaluated lazily. This means that, if an argument is not used by the function, no assignment will take place.

contrived <- function(x, y){
   if(x > 5){
      print("Too big")
   } else{
      print(y)
   }
} 

contrived(3, a <- 42)
## [1] 42
contrived(17, a <- 1066)
## [1] "Too big"
print(a)
## [1] 42
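
A minimal way to convince yourself that unused arguments are never evaluated is to pass an argument that would fail if it were. The functions ignoreY and useY below are hypothetical examples. The first never touches its second parameter. The second calls force() on it, which evaluates the argument immediately.

```r
ignoreY <- function(x, y){
   x * 2                      # y is never used, so its promise is never evaluated
}
ignoreY(21, stop("y was evaluated!"))
## [1] 42

useY <- function(x, y){
   force(y)                   # evaluates the promise for y straight away
   x * 2
}
res <- tryCatch(useY(21, stop("y was evaluated!")),
                error = function(e) conditionMessage(e))
print(res)
## [1] "y was evaluated!"
```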

Keeping it local

We can get around the first of these two problems, polluting the environment, by enclosing the call to our function in a call to the function local:

x <- 4
y <- local(outer(x <- 1:10, x))
print(x)
## [1] 4

Now the assignment x <- 1:10 takes place in a new environment created by local and doesn’t reassign to the global x variable. outer is executed with the local value of x and local returns the value returned by outer.

print(y)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]    2    4    6    8   10   12   14   16   18    20
##  [3,]    3    6    9   12   15   18   21   24   27    30
##  [4,]    4    8   12   16   20   24   28   32   36    40
##  [5,]    5   10   15   20   25   30   35   40   45    50
##  [6,]    6   12   18   24   30   36   42   48   54    60
##  [7,]    7   14   21   28   35   42   49   56   63    70
##  [8,]    8   16   24   32   40   48   56   64   72    80
##  [9,]    9   18   27   36   45   54   63   72   81    90
## [10,]   10   20   30   40   50   60   70   80   90   100

Using local can be convenient when writing code, because changing a value in one place is quicker and less error-prone than changing it in two. Execution time may increase a little, however, and local does nothing to alleviate the second problem with assignments in function calls.

Lazy-evaluation issues

Earlier I noted that lazy evaluation of function arguments can cause problems with assignments because it’s not guaranteed that an assignment will take place. The problem is actually worse than that. Consider the following contrived function:

contrived <- function(x, y, z){
   if(x > 5){
      print(y)
   }
   print(2*z)
}

Suppose we want to use an assignment so that the second and third arguments match. If the argument associated with parameter x is bigger than 5 we shouldn’t have a problem (if we’re not worried about polluting the global environment).

contrived(10, a <- 42, a)
## [1] 42
## [1] 84

If, however, the value we associate with the x parameter is 5 or less, then we have a problem. If the variable name isn’t in scope we’ll get an error:

contrived(3, newVar <- 42, newVar)
## Error in print(2 * z): object 'newVar' not found

If it is in scope we may get an error…

definedVar <- "Hi"
contrived(3, definedVar <- 42, definedVar)
## Error in 2 * z: non-numeric argument to binary operator

…or we could see a “wrong” answer printed out:

definedVar <- 10
contrived(3, definedVar <- 42, definedVar)
## [1] 20

Probably worst of all, we could get the right answer for the wrong reason:

getFavouriteNumber <- function(){return(336)}
anotherVar <- getFavouriteNumber()
definedVar <- anotherVar/8
contrived(3, definedVar <- 42, definedVar)
## [1] 84

In this (highly contrived) case, the problem could go unnoticed until I decide my favourite number is something other than 336.

getFavouriteNumber <- function(){return(7)}
anotherVar <- getFavouriteNumber()
definedVar <- anotherVar/8
contrived(3, definedVar <- 42, definedVar)
## [1] 1.75

This erratic and brittle behaviour persists even if we use local, since the definedVar in the global environment remains in scope:

definedVar <- 10
local(contrived(3, definedVar <- 42, definedVar))
## [1] 20

If you “own” the function in question you can prevent this issue from happening using force (and lose any speed gains that may have been available thanks to lazy evaluation):

contrived <- function(x, y, z){
   if(x > 5){
      print(y)
   } else{
      force(y)
   }
   print(2*z)
}
definedVar <- 10
contrived(3, definedVar <- 42, definedVar)
## [1] 84
contrived(3, newVar <- 42, newVar)
## [1] 84

If the function isn’t yours then assignment in a call to it is inherently risky unless you understand the function’s internals and know they won’t change.

<- and = in if conditions

Let’s see what happens if we use the <- operator to make an assignment in the condition part of an if statement.

x <- 0
if(x <- 7){
   print("Yes!")
} else{
   print("Nope!")
}
## [1] "Yes!"
print(x)
## [1] 7

The (re)assignment happens first, and the value of the assignment expression (7) is then used as the condition. Any non-zero number counts as true (or “truthy”), so the associated block is evaluated and the else clause is ignored.
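
This works because an assignment is itself an expression whose value is the value assigned. That value is returned invisibly, which is why nothing is printed at the prompt. Wrapping the assignment in parentheses makes the value visible, and the assignment can also be used directly inside a larger expression:

```r
(x <- 7)          # parentheses print the (otherwise invisible) value
## [1] 7
y <- (x <- 3) * 2 # x is assigned 3, and that value is then doubled
print(c(x, y))
## [1] 3 6
# 0 counts as FALSE in an if condition, any other number as TRUE
if(x <- 0) "Yes!" else "Nope!"
## [1] "Nope!"
```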

What happens if we use the = assignment operator?

x <- 0
if(x = 7){
   print("Yes!")
} else{
   print("Nope!")
}
## Error: <text>:2:6: unexpected '='
## 1: x <- 0
## 2: if(x =
##         ^

I’m not sure if this was the intent or a happy coincidence, but the error in this case prevents the common problem seen in other languages where the assignment operator, =, is used when the equality operator, ==, was intended.

One last complication. What kind of output do you expect to see from the following code snippet?

x <- 0
if(x < - 7){
   print("Yes!")
} else{
   print("Nope!")
}
print(x)

If you answered “the same as the example before last” you’d be wrong:

## [1] "Nope!"
## [1] 0

I sneakily added a space between the < and - inside the if condition. Rather than assigning to x and checking whether x is truthy, it now just checks whether the value of x, 0, is less than -7. Since it isn’t, the else block is executed.
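
If you’re ever unsure how R has read an expression like this, quote() returns the unevaluated call, and the first element of that call is the operator being applied. This is a quick diagnostic, not something you’d normally leave in code:

```r
as.character(quote(x <- 7)[[1]])    # an assignment
## [1] "<-"
as.character(quote(x < - 7)[[1]])   # a comparison with -7
## [1] "<"
deparse(quote(x < - 7))             # how R prints the parsed expression back
## [1] "x < -7"
```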

The <<- operator

Now we’ve seen the difference between = and <-, what about <<-? Consider the following:

x <- 10
y <<- 10
identical(x, y)
## [1] TRUE

As with the first example for <- and =, there appears to be no difference in the simplest of cases. However, <- always makes an assignment in the current environment. <<- instead starts looking in the parent (enclosing) environment of the current one and works outwards through the enclosing environments until it finds an existing variable with that name, which it then redefines. If it finds no such variable, the assignment takes place in the global environment. So in this case <- and <<- do essentially the same thing, because the assignment is made from the global environment to begin with. Differences only appear when we work in other environments, either explicitly (using local, for example) or, more commonly, inside functions.
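
To see <<- work outwards through more than one environment, here’s a small sketch with three levels of nesting. The function names are made up for illustration. The innermost function has no x of its own, and neither does the middle one, so <<- keeps looking until it finds the x defined in the outermost function:

```r
outerFunc <- function(){
   x <- "outer"
   middleFunc <- function(){
      innerFunc <- function(){
         x <<- "changed by inner"   # middleFunc has no x, so outerFunc's x is updated
      }
      innerFunc()
   }
   middleFunc()
   x                                # return outerFunc's own x
}
x <- "global"
outerFunc()
## [1] "changed by inner"
print(x)                            # the global x is untouched
## [1] "global"
```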

Here’s an illustrative (but fairly useless) example of how <- and <<- differ.

daftFunc <- function(){
   x <- 7
   x <<- 42
   y <<- 36 
   print(paste("Inside func, x =", x))
   print(paste("Inside func, y =", y)) #y is actually stored outside the function
}

x <- 0
daftFunc()
## [1] "Inside func, x = 7"
## [1] "Inside func, y = 36"
print(paste("Outside func, x =", x))
## [1] "Outside func, x = 42"
print(paste("Outside func, y =", y))
## [1] "Outside func, y = 36"

While the code above is legal, that doesn’t make it “good”. It’s brittle, not very portable and hard to reason about.

Counting the number of times a function has been called

Let’s suppose we want to keep track of how many times some arbitrary function is called. In R this is fairly straightforward because the language makes it easy to pass functions to other functions and return functions from functions. Here’s a wrapper function that doesn’t quite fit the bill:

counter <- function(func){
   funcName <- as.list(sys.call())[[2]] #inelegant way of getting name of func
   count <- 0
   return(
      function(...){
         count <- count + 1 #Here!
         print(paste(
            "You have called",
            funcName,
            count,
            ifelse(count>1, "times", "time"),
            "through this wrapper."
         ))
         func(...)
      }
   )
}

If we use counter to wrap sqrt, for example, we’ll get the right value returned, but the value of count never gets past 1.

sqrtCount <- counter(sqrt)
for(i in 1:10){
   print(sqrtCount(i))
}
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 1
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 1.414214
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 1.732051
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 2
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 2.236068
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 2.44949
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 2.645751
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 2.828427
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 3
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 3.162278

The problem is that each time sqrtCount is called, we go to the line marked with the comment #Here!. At this point in the code, the variable count from the surrounding function is still visible even though that function returned earlier (in programming terms we have a closure). However, the assignment that takes place creates a new variable called count in the environment of the inner function. What is actually happening is much clearer if we use different names for the two count variables.

counter <- function(func){
   funcName <- as.list(sys.call())[[2]] #inelegant way of getting name of func
   countOuter <- 0
   return(
      function(...){
         countInner <- countOuter + 1
         print(paste(
            "You have called",
            funcName,
            countInner,
            ifelse(countInner>1, "times", "time"),
            "through this wrapper."
         ))
         func(...)
      }
   )
}

countOuter never changes from 0. countInner is created and assigned the value of countOuter + 1, i.e. 1, at the start of each call to the inner function and destroyed at the end of that call. Thus 1 gets printed every time.

What we really want to do is update the value of count (or countOuter) in the outer function every time we call the inner function (which has been assigned to sqrtCount in our example above). To do this we just need to use the <<- operator when we increment count:

counter <- function(func){
   funcName <- as.list(sys.call())[[2]] #inelegant way of getting name of func
   count <- 0
   return(
      function(...){
         count <<- count + 1 #count from outer scope is updated.
         print(paste(
            "You have called",
            funcName,
            count,
            ifelse(count > 1, "times", "time"),
            "through this wrapper."
         ))
         func(...)
      }
   )
}

Now sqrtCount does correctly count how many times sqrt is called through the wrapper:

sqrtCount <- counter(sqrt)
for(i in 1:10){
   print(sqrtCount(i))
}
## [1] "You have called sqrt 1 time through this wrapper."
## [1] 1
## [1] "You have called sqrt 2 times through this wrapper."
## [1] 1.414214
## [1] "You have called sqrt 3 times through this wrapper."
## [1] 1.732051
## [1] "You have called sqrt 4 times through this wrapper."
## [1] 2
## [1] "You have called sqrt 5 times through this wrapper."
## [1] 2.236068
## [1] "You have called sqrt 6 times through this wrapper."
## [1] 2.44949
## [1] "You have called sqrt 7 times through this wrapper."
## [1] 2.645751
## [1] "You have called sqrt 8 times through this wrapper."
## [1] 2.828427
## [1] "You have called sqrt 9 times through this wrapper."
## [1] 3
## [1] "You have called sqrt 10 times through this wrapper."
## [1] 3.162278
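
Because count lives in the environment in which the inner function was created, we can inspect it from outside with environment(). The sketch below uses a stripped-down, hypothetical makeCounter with no printing and no function name, so the closure itself is easy to see:

```r
makeCounter <- function(func){
   count <- 0
   function(...){
      count <<- count + 1            # update count in the enclosing environment
      func(...)
   }
}
sqrtCount <- makeCounter(sqrt)
for(i in 1:10){
   sqrtCount(i)
}
environment(sqrtCount)$count         # the closure's private state
## [1] 10
sqrtCount2 <- makeCounter(sqrt)      # each wrapper gets its own count
environment(sqrtCount2)$count
## [1] 0
```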

Memoization

Another great use for <<- is memoization. Consider a function for producing the nth number in a Fibonacci sequence whose first two terms are t1 and t2. That is, any term after the second is the sum of the previous two terms. We could do something like this:

fib <- function(n, t1 = 1, t2 = 1){
   if(n <= 0){
      stop("n must be bigger than 0!")
   }
   if(n == 1){return(t1)}
   if(n == 2){return(t2)}
   last <- t1
   current <- t2
   nCurrent <- 2
   while(nCurrent < n){
      nCurrent <- nCurrent + 1
      tmp <- current
      current <- last + current
      last <- tmp
   }
   return(current)
}

myseq <- seq(from = 5, to = 25, by = 5)
for(i in seq_along(myseq)){
   print(fib(myseq[i]))
}
## [1] 5
## [1] 55
## [1] 610
## [1] 6765
## [1] 75025

In the above code, each call to fib requires that all terms in the sequence up to n are recalculated, even though when we call fib(10) we’ve actually calculated the values of fib(1), fib(2), fib(3), fib(4), fib(5) previously. This is a tedious waste that is avoidable:

fibMemo <- function(t1 = 1, t2 = 1){
   store <- c(t1, t2) #save values for future use
   
   return(
      function(n){
         
         if(n <= 0){
            stop("n must be bigger than 0!")
         }
         
         len <- length(store)
         
         if(n > len){ #calculations only required if we haven't previously calculated up to at least n
            tStore <- c(store, rep(0, n-len)) #create a local temp store, padding with 0s
            for(i in (len+1):n){
               tStore[i] <- tStore[i-2] + tStore[i-1]
            }
            store <<- tStore
         }
         
         return(store[n])
      }
   )
}

myseq <- seq(from = 5, to = 25, by = 5)
fib11 <- fibMemo() #you can pass values for t1 and t2 if you don't like the defaults
for(i in seq_along(myseq)){
   print(fib11(myseq[i])) #fib11 only takes one argument, n
}
## [1] 5
## [1] 55
## [1] 610
## [1] 6765
## [1] 75025

Using fib11, each value of the series only gets calculated once, at most. The store in the outer function initially only holds the first two values assigned when fibMemo is called. When the n passed in to the function returned by fibMemo is larger than the current length of store, all the missing numbers up to and including the nth are calculated and added to the store. When the value of n is smaller than the current length of the store, the return value can just be looked up. This makes executing the for loop at the bottom (without the printing step) about five times quicker than with fib. Subsequent fib11(25) calls (which only require a value lookup) are about 12 times quicker than fib(25) calls on my laptop.

As with the counter function above, none of this would be possible without the <<- operator that updates the store vector in the environment of the surrounding function.
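
The same pattern generalises to other functions. Below is a minimal sketch of a hypothetical memoize wrapper for any single-argument function. It assumes the argument can be turned into a character key with as.character, so it is only suitable for simple scalar inputs. The cache is a named list in the enclosing environment, updated with <<-. The hypothetical slowSquare function counts how often it actually does any work:

```r
memoize <- function(func){
   cache <- list()                     # results computed so far, keyed by argument
   function(n){
      key <- as.character(n)
      if(is.null(cache[[key]])){
         cache[[key]] <<- func(n)      # update cache in the enclosing environment
      }
      cache[[key]]
   }
}

calls <- 0
slowSquare <- function(n){
   calls <<- calls + 1                 # count how often the real work is done
   n^2
}

fastSquare <- memoize(slowSquare)
fastSquare(12)
## [1] 144
fastSquare(12)
## [1] 144
print(calls)                           # slowSquare only ran once
## [1] 1
```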

<<- in silly places

You can, if you are so inclined, use the <<- assignment operator when passing arguments to a function, as in this completely stupid example:

x <- 4
y <- local(outer(x <<- 1:10, x))
print(x)
##  [1]  1  2  3  4  5  6  7  8  9 10

I can’t think of a good reason for actually doing this.

You can also use <<- as part of an if condition. What output do you expect to see from the following code?

daftFunc <- function(){
   x <- 0
   if(x <<- 7){
      print("Yes!")
   } else{
      print("Nope!")
   }
   print(paste("x at the end of daftFunc =", x))
}

x <- 0
print(paste("x before daftFunc =", x))
daftFunc()
print(paste("x after daftFunc =", x))

If you guessed…

## [1] "x before daftFunc = 0"
## [1] "Yes!"
## [1] "x at the end of daftFunc = 0"
## [1] "x after daftFunc = 7"

then you are particularly clever! (I didn’t.)

On the plus side, if you accidentally insert a space between any of the characters in <<- you’ll just get an error.

The -> and ->> operators

-> and ->> do much the same thing as their left-pointing relatives, except the value is positioned on the left and the name being assigned to on the right.

daftFunc <- function(){
   7 -> x
   42 ->> x
   36 ->> y 
   print(paste("Inside func, x =", x))
   print(paste("Inside func, y =", y)) #y is actually outside the function
}

0 -> x
daftFunc()
## [1] "Inside func, x = 7"
## [1] "Inside func, y = 36"
print(paste("Outside func, x =", x))
## [1] "Outside func, x = 42"
print(paste("Outside func, y =", y))
## [1] "Outside func, y = 36"

You can also assign in both directions at once… should you really want to:

x <- 7 -> y
print(paste("x =", x))
## [1] "x = 7"
print(paste("y =", y))
## [1] "y = 7"

Precedence

The assignment operators don’t all have the same precedence. -> and ->> have higher precedence than <- and <<- which, in turn, have higher precedence than =.

The perhaps not entirely intuitive result of this is that the following is fine…

x = y <- 7 -> z
print(x)
## [1] 7
print(y)
## [1] 7
print(z)
## [1] 7

…but this gives an error:

x <- y = 7
## Error in x <- y = 7: could not find function "<-<-"
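
One way to see how the parser groups these expressions is to parse them without evaluating them and then inspect the resulting call. Note that the parser stores -> as <- with its two sides swapped:

```r
deparse(quote(7 -> z))                  # -> is stored as <-
## [1] "z <- 7"
e <- parse(text = "x <- y = 7")[[1]]   # parse, but don't evaluate
as.character(e[[1]])                    # the outermost operator is =
## [1] "="
deparse(e[[2]])                         # ...so its left-hand side is x <- y
## [1] "x <- y"
```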

The assign function, backticks and non-syntactic names

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as “.2way” are not valid, and neither are the reserved words.

You’re probably already aware of most of the information in the above quote (from R’s documentation for make.names). What might not be obvious is that names don’t have to be syntactically valid! In short, you can create some pretty ridiculous names if you really, really want to by using the assign function. The snippet below is one such example, with get used to retrieve the value:

assign("1", 2)
print(paste('2 + 1 =', 2 + get("1")))
## [1] "2 + 1 = 4"

Instead of assign and get, we can use backticks to both assign a value to and retrieve a value from a non-syntactic name:

`7` <- 42
print(paste('The answer to life, the universe and everything is', `7`))
## [1] "The answer to life, the universe and everything is 42"

We can mix and match as well:

assign("1", 2)
print(paste('2 + 1 =', 2 + `1`))
## [1] "2 + 1 = 4"
`7` <- 42
print(paste('The answer to life, the universe and everything is', get("7")))
## [1] "The answer to life, the universe and everything is 42"
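
If you have a non-syntactic name and would rather have a syntactically valid version of it than work around it with backticks or get, base R’s make.names will do the conversion, following the rules quoted above:

```r
make.names("1")      # names can't start with a digit, so "X" is prepended
## [1] "X1"
make.names("if")     # reserved words get a dot appended
## [1] "if."
make.names("my var") # invalid characters become dots
## [1] "my.var"
```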

assign can actually be useful if you want to create syntactically valid names too. For example we can create variables based on the column names of a data.frame, even when we don’t know which data frame will be used and what its column names are:

createVars <- function(df, func, prefix = "", suffix = ""){
   for(name in names(df)){
      assign(
         paste0(prefix, name, suffix),
         func(df[[name]]),
         envir = .GlobalEnv
      )
   }
}

Names are added to the global environment rather than the function’s own environment using the optional envir parameter in the call to assign. Hopefully you’ll never need to use code as ugly as this, but here it is in action:

createVars(iris[,1:4], median, suffix = ".Median")
for(obj in ls(pattern="\\w+\\.\\w+\\.Median")){
   print(paste(obj, "=", get(obj)))
}
## [1] "Petal.Length.Median = 4.35"
## [1] "Petal.Width.Median = 1.3"
## [1] "Sepal.Length.Median = 5.8"
## [1] "Sepal.Width.Median = 3"
createVars(anscombe, mean, "\u00b5")
for(obj in ls(pattern="\u00b5[xy]\\d")){
   print(paste(obj, "=", get(obj)))
}
## [1] "µx1 = 9"
## [1] "µx2 = 9"
## [1] "µx3 = 9"
## [1] "µx4 = 9"
## [1] "µy1 = 7.50090909090909"
## [1] "µy2 = 7.50090909090909"
## [1] "µy3 = 7.5"
## [1] "µy4 = 7.50090909090909"
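
If you’d rather not write into the global environment at all, one option is to let the caller choose where the variables go. This is a variation on the code above, not the original author’s. createVarsIn is a hypothetical name. It takes the target environment as an argument, defaults to a fresh environment, and returns it:

```r
createVarsIn <- function(df, func, prefix = "", suffix = "", envir = new.env()){
   for(name in names(df)){
      # same naming scheme as createVars, but the variables go into envir
      assign(paste0(prefix, name, suffix), func(df[[name]]), envir = envir)
   }
   invisible(envir)   # hand back the environment holding the new variables
}

medians <- createVarsIn(iris[,1:4], median, suffix = ".Median")
ls(medians)
get("Petal.Length.Median", envir = medians)
## [1] 4.35
```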

Special binary operators

Recall the outer function we’ve used several times already. Here’s an example where the first and second arguments differ:

outer(1:9, 3:5)
##       [,1] [,2] [,3]
##  [1,]    3    4    5
##  [2,]    6    8   10
##  [3,]    9   12   15
##  [4,]   12   16   20
##  [5,]   15   20   25
##  [6,]   18   24   30
##  [7,]   21   28   35
##  [8,]   24   32   40
##  [9,]   27   36   45

As noted earlier, if we don’t provide a third argument then the default for the third parameter, FUN, is "*" (multiplication). The base package also defines a binary operator, %o%, that does exactly the same thing.

(1:9)%o%(3:5)
##       [,1] [,2] [,3]
##  [1,]    3    4    5
##  [2,]    6    8   10
##  [3,]    9   12   15
##  [4,]   12   16   20
##  [5,]   15   20   25
##  [6,]   18   24   30
##  [7,]   21   28   35
##  [8,]   24   32   40
##  [9,]   27   36   45

Now let’s suppose that, instead of a matrix, we want a vector containing only one copy of each unique value, sorted in ascending order. Either of the following will do:

sort(unique(as.vector(outer(1:9, 3:5))))
##  [1]  3  4  5  6  8  9 10 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 45
sort(unique(as.vector((1:9)%o%(3:5))))
##  [1]  3  4  5  6  8  9 10 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 45

If we want to do this sort of calculation often then we should probably create a function:

uProd <- function(a, b){sort(unique(as.vector(a%o%b)))}
uProd(1:9, 3:5) 
##  [1]  3  4  5  6  8  9 10 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 45

Since our function only relies on two arguments, it would be nice if we could use it as a binary operator. Thankfully we can, using backticks and the special binary operator naming syntax that takes the form %characters%:

`%X%` <- function(a, b){sort(unique(as.vector(a%o%b)))}
(1:9) %X% (3:5)
##  [1]  3  4  5  6  8  9 10 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 45

The operand to the left of our new operator maps to the first argument of the associated function, and the operand to the right maps to the second argument.

If you prefer, you can use assign to create the new operator:

assign("%X%", function(a, b){sort(unique(as.vector(a%o%b)))})
(1:9) %X% (3:5)
##  [1]  3  4  5  6  8  9 10 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 45

Curiously, you can use binary operators (ordinary and special, built-in and user-defined) like functions, should you want to. You just need to wrap the names in backticks:

`+`(3, 4)
## [1] 7
`%%`(5, 3) #modulo operation
## [1] 2
`%X%`(1:9, 3:5) 
##  [1]  3  4  5  6  8  9 10 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 45
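
All special %...% operators share one precedence level, whether they are built in (%% and %o%) or user-defined. That level is higher than * and / but lower than :. This is worth knowing before you rely on your own operators, because it can change the meaning of an expression that looks innocuous. It also means the parentheses around 1:9 and 3:5 above are optional:

```r
2 * 3 %% 2      # read as 2 * (3 %% 2)
## [1] 2
(2 * 3) %% 2    # parentheses force the multiplication first
## [1] 0
`%X%` <- function(a, b){sort(unique(as.vector(a %o% b)))}
identical(1:9 %X% 3:5, (1:9) %X% (3:5))   # : binds more tightly than %X%
## [1] TRUE
```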

Reassigning built-in functions and operators

Given we already know…

x <- 10
y <<- 10
print(x == y)
## [1] TRUE

…you’d probably think that if this is fine…

c <- 7
c = 7

…then this should be fine too, right?

c <<- 7

Wrong!

## Error in eval(expr, envir, enclos): cannot change value of locked binding for 'c'

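c is, of course, the base function for combining vectors or lists, and its binding in the base environment is locked. We are working in the global environment, so <<- starts looking in the environments that enclose it, finds that locked base binding, and can’t change it. <- (and =), on the other hand, never touches the base binding at all. It simply creates a new variable called c in the global environment, which masks the base version. You’ll have the same problem with the transpose function t, too.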
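
You can check both parts of this explanation directly. bindingIsLocked confirms that base’s c is locked. exists with inherits = FALSE shows that <- has created a separate c in the current (here: global) environment rather than changing the base one:

```r
bindingIsLocked("c", baseenv())      # the base function's binding is locked
## [1] TRUE
c <- 7                                # creates a separate c in the current environment
exists("c", inherits = FALSE)         # only looks in the current environment
## [1] TRUE
c(1, 2)                               # when calling c(), R skips the non-function c
## [1] 1 2
rm(c)                                 # remove our c; base's c was never changed
exists("c", inherits = FALSE)
## [1] FALSE
```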

Is this sensible behaviour? Reassigning a simple value to c like 7 isn’t a major problem. R can usually tell whether you want the value or the function by how you use it. It’s also an easy “mistake” to make unwittingly. I did it with quadSolve when creating this article. (To avoid (premature) confusion I changed to using capital letters for the parameters though it wasn’t strictly necessary.) But if you reassign c to another function things could be more problematic. You can still retrieve the base function using :: if need be…

c <- function(){42}
c()
## [1] 42
#c(1:4, 6:9) #gives error about unused argument if uncommented
base::c(1:4, 6:9)
## [1] 1 2 3 4 6 7 8 9

but this is an obvious pain. The accidental binding is just an ordinary variable in the global environment that masks the locked base one, so you can remove it using the regular rm command.

return42 <- c
rm(c)
print(c(1:4, 6:9))
## [1] 1 2 3 4 6 7 8 9

This difference between <- and <<- extends to built-in operators. Yep, you can make + subtract!

`+` <- function(a, b){a-b}
3 + 2
## [1] 1

Aaaaarghh!

In case you were wondering (I was), this doesn’t work

3 base::`+` 2
## Error: <text>:1:3: unexpected symbol
## 1: 3 base
##       ^

but this does…

base::`+`(3, 2)
## [1] 5

So that’s alright then. Umm…

rm(`+`)
3 + 2
## [1] 5

Pheww!

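This all seems like pretty weird behaviour, and it certainly has the potential to cause a lot of pain. If you’ve ever used the ggplot2 package, however, you’ve relied on a closely related feature repeatedly, probably without giving it much thought. ggplot2 doesn’t reassign + globally, but it does give + its own meaning when one of the operands is a ggplot object.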
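
For comparison, here’s a minimal sketch of a safer, related approach. Rather than replacing + for everything, we define a + method (an S3 method) for objects of our own class, so ordinary arithmetic is unaffected. This is roughly the idea that lets packages such as ggplot2 give + a special meaning for their own objects. The class name "stack" and its items field are invented for this example:

```r
`+.stack` <- function(e1, e2){
   # append e2 to the items held by the stack object e1
   structure(list(items = c(e1$items, e2)), class = "stack")
}

s <- structure(list(items = character(0)), class = "stack")
s <- s + "a" + "b"        # dispatches to `+.stack` because s has class "stack"
s$items
## [1] "a" "b"
3 + 2                     # ordinary addition is unaffected
## [1] 5
```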

Summary

You’re probably thinking by now that assignments in R can get pretty complicated. Well, I’m thinking that anyway. For the most part, it doesn’t have to be that way though.

For “regular” assignments directly into the global environment, <- and = are interchangeable. Assignments using <- in function calls are fraught with danger, and assignments as part of an if condition seem unnecessary. One possibility, therefore, is to avoid using <- (and ->) altogether. You do, however, need to remember that <- exists so that you can read other people’s code and tell the difference between x <- 10 and x < - 10.

Whether you choose to stop using <- or not, <<- is still useful for logging and memoization. ->> can probably be safely forgotten about.

Backticks and/or assign are useful for creating your own special binary operators.

Further reading

This Stack Overflow discussion provided much of the inspiration for this tutorial. So thanks to everyone involved.

I’ve skimmed over environments in this tutorial. I’ve assumed you don’t need a thorough understanding of what is meant by them here, only a general idea of the concept of scope. Hadley Wickham’s Advanced R book has a whole chapter on them if you feel like you need to learn more. The same book also briefly covers closures.

Paul Teetor’s R Cookbook (the first R book I ever read and a regular reference for my work) helped with the section on special binary operators.