The R future Package

An overview of the R future package

The future package provides a simple and uniform method for asynchronous computing in R. It lets you submit expressions for evaluation without blocking the current R session. It is cross-platform and works with a variety of backends.

A future is an object that represents a promise to return the value of an expression once it has been computed. Under a parallel plan, the call that creates the future returns almost instantly, while the expression itself is evaluated in the background in a separate R process. This delegation of execution allows the main R process to move on to other computations while the background process continues to run.

A future consists of three parts:

  1. The execution environment
  2. The expression to be executed
  3. The status of the execution

The Execution Environment

To start using futures, first specify the execution environment. The execution environment determines where to execute the calculation defined by the future. It is defined using the plan function. Consider this example:

plan(multisession)

Note: The result of a future does not depend on what plan you choose; a future behaves consistently across plans.
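To illustrate the note above, here is a minimal sketch that evaluates the same expression under two different plans and compares the results. The helper run_with_plan is a hypothetical name introduced only for this illustration; it is not part of the future package.

```r
library(future)

# Hypothetical helper (not part of future): evaluate one fixed
# expression under the given plan and return its value.
run_with_plan <- function(strategy) {
  old_plan <- plan(strategy)  # plan() invisibly returns the previous plan
  on.exit(plan(old_plan))     # restore the previous plan on exit
  f <- future(sum(1:10))
  value(f)
}

result_sequential   <- run_with_plan(sequential)
result_multisession <- run_with_plan(multisession)

# The value should not depend on where the future was evaluated
identical(result_sequential, result_multisession)
```

Only the place where the expression runs changes between the two calls; the code that creates the future and reads its value stays the same.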

There are several built-in plans for local and distributed computing:

  • sequential: A non-parallel, blocking execution environment – This is the default plan. Think of it as the ordinary single R process: each future is evaluated in the current session before the code moves on. You won’t see any performance changes by using futures in this environment, but this plan can be useful for debugging and testing.
  • multisession: A parallel, non-blocking execution environment – This plan will allow futures to utilize additional background R sessions to perform computations, which is useful on local machines where you want to utilize more than one core.
  • multicore: A parallel, non-blocking execution environment – Instead of running on background R sessions, like multisession, this plan runs on forked R processes. Because forked processes share memory with the parent process, this approach reduces the overhead of copying data and can be faster, but forking is not supported on Windows and is considered unstable when R runs inside RStudio.
  • cluster: A parallel, non-blocking execution environment – It allows futures to be evaluated on external R sessions. It can be used on both local and remote machines.
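As a small sketch of how switching between these plans looks in practice, the snippet below requests two background sessions and then returns to the sequential plan. The choice of two workers is an assumption made for illustration; pick a number that suits your machine.

```r
library(future)

availableCores()                 # cores that future may use on this machine

plan(multisession, workers = 2)  # request two background R sessions
n_multi <- nbrOfWorkers()        # number of workers in the current plan

plan(sequential)                 # switch back to the default plan
n_seq <- nbrOfWorkers()

n_multi
n_seq
```

Switching back to sequential when the parallel work is done is a simple way to release the background sessions.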

In addition to these built-in plans, futures can also be used with other backends through add-on packages such as future.batchtools and future.callr, or with custom backends.

The Expression to Be Executed

Futures can be created either implicitly or explicitly. Both styles work to the same end but with different syntax: implicit futures are more similar to regular R code, whereas explicit futures can be clearer and easier to read.

To define an implicit future, simply replace <- with %<-%, as in the example below. The value of an implicitly defined future is retrieved automatically the first time the variable is used.

Note: This can conflict with packages like zeallot, which defines its own %<-% operator. If both packages are loaded, the one attached last masks the other, so use with caution.

v %<-% {
  expression
}
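For a concrete version of the template above, here is a minimal sketch with a real expression in place of the placeholder. The variable a and the arithmetic are made up for illustration.

```r
library(future)
plan(sequential)  # any plan works; sequential keeps the sketch simple

a <- 2

# Implicit future: looks like an ordinary assignment
v %<-% {
  a * 21          # global variables such as `a` are picked up automatically
}

v                 # using v retrieves the value, waiting for it if necessary
```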

To define an explicit future, wrap the expression in the future() function. For explicit futures, you need to request the value manually by calling value(). Consider the example below:

f <- future({
  expression
})
v <- value(f)

We will use explicit futures for the remainder of this document.
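One consequence of requesting the value manually is that an error raised inside a future surfaces when value() is called. The following sketch shows this with two made-up expressions; it is an illustration, not a recommended error-handling pattern.

```r
library(future)
plan(sequential)

f_ok  <- future({ sqrt(16) })
f_bad <- future({ stop("something went wrong") })

v_ok <- value(f_ok)

# value() re-signals the error from the future, so it can be
# handled like any other R error
v_bad <- tryCatch(
  value(f_bad),
  error = function(e) conditionMessage(e)
)

v_ok
v_bad
```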

The Status of the Execution

The state of a future can be either resolved or unresolved – either finished with its calculation or still running. Calling value() will automatically block the main R process until the future is resolved and then return the value. If you want to check whether a future is resolved without blocking, you can use the resolved function, shown below:

f <- future({
  expression
}) # create the future

r <- resolved(f) # check if it is done

v <- value(f) # wait and get the result
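The same pattern can be turned into a simple polling loop. This is a sketch that assumes a multisession plan with two workers and uses Sys.sleep() as a stand-in for a slow computation; the counter checks is only there to show that the main session keeps running.

```r
library(future)
plan(multisession, workers = 2)

f <- future({
  Sys.sleep(1)  # stand-in for a slow computation
  42
})

checks <- 0
while (!resolved(f)) {
  checks <- checks + 1  # the main session is free to do other work here
  Sys.sleep(0.1)
}

v <- value(f)           # the future is already resolved, so this does not wait
v

plan(sequential)        # return to the default plan
```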

Let’s Look at Some Examples

Let’s start by defining some simple functions:

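library(future)

# Each function sleeps for a random time of up to one second
# to simulate a slow computation.

increment <- function(x) {
  Sys.sleep(runif(1))
  return(x + 1)
}

# Note: this masks base::double() for the rest of the session
double <- function(x) {
  Sys.sleep(runif(1))
  return(2 * x)
}

add <- function(x, y) {
  Sys.sleep(runif(1))
  return(x + y)
}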

Let’s try out the sequential environment. Because the functions draw random numbers with runif(), we pass seed = TRUE to the future function, which gives the future a parallel-safe random number stream and avoids a warning about unexpected use of random numbers:

plan(sequential)

f <- future(increment(1), seed = TRUE)

resolved(f)
message(value(f))

As you can see, resolved() returned TRUE, showing that the future was evaluated completely before the R process moved on to the resolved function. This is exactly how we would expect the code to behave without the future package.

If we want to have non-blocking code, we need to use one of the other environments. Here we use multisession:

plan(multisession)

f <- future(increment(1), seed = TRUE)

resolved(f)
message(value(f))

This time, the resolved function most likely returned FALSE, because the future was still running in a background session when the check was made. This shows that, instead of waiting until the future completed, the main R process continued on and only blocked at the call to the value function.
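The benefit of non-blocking futures becomes clearer with several of them at once. The sketch below assumes three workers and uses a made-up function, slow_square, that sleeps for one second; it launches three futures first and collects their values afterwards.

```r
library(future)
plan(multisession, workers = 3)

# Made-up slow function for illustration
slow_square <- function(x) {
  Sys.sleep(1)
  x^2
}

start <- Sys.time()

# Launch all three futures before asking for any value
futures <- lapply(1:3, function(i) future(slow_square(i)))

# Collect the results; each value() call blocks until that future is done
results <- vapply(futures, value, numeric(1))

elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))

results
elapsed

plan(sequential)
```

If three workers are actually available, the elapsed time should be closer to one second than to three, although the exact figure depends on the machine and on the startup cost of the background sessions.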

Note that, although futures can be nested, they can’t be chained. Before using the result of one future as an input to another future, you need to retrieve its value with value(), as shown below:

x_future <- future(increment(1), seed = TRUE)
y_future <- future(double(2), seed = TRUE)

# block until both futures are resolved and get their values
x <- value(x_future)
y <- value(y_future)

# pass them to the next future
z <- future(add(x, y), seed = TRUE)

resolved(z)
message(value(z))

Conclusion

The future package is an excellent tool for parallelizing code in R. It allows for the re-use of code on a variety of backends, including on clusters. It also serves as the backend for other parallel packages like furrr or targets.

If you want to learn more about the future package, be sure to read the vignettes published for the package.
