Using the targets Package
To illustrate this package, we use the same data, model, and functions that were used in the furrr example. See that example for a thorough explanation. All these functions are contained in the “functions.R” file in this repository.
We will be using the future backend for parallel processing. Learn more about how the future package works in its package documentation.
The only library we need to import now is the targets library itself. All other libraries are imported during the workflow process.
Use a “_targets.R” file
First, we need a file named “_targets.R”. This file will contain all the information for targets to create a workflow.
The “_targets.R” file does the following:
- Imports the appropriate libraries
- Imports the functions from “functions.R”
- Sets global options for targets, including the packages that each target node needs
- Creates an execution environment for the future package
- Creates a list of target nodes
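The steps above can be sketched as a minimal "_targets.R" file. This is an illustrative skeleton, not the repository's actual file: the helper functions (read_data(), preprocess_data()) and the package list are placeholders standing in for whatever "functions.R" defines.

```r
# Minimal sketch of a "_targets.R" file.
# read_data(), preprocess_data(), and the package names below are
# hypothetical placeholders -- see "functions.R" in the repository.
library(targets)
library(future)

# Import the workflow's helper functions
source("functions.R")

# Packages that each target node needs when it runs
tar_option_set(packages = c("dplyr", "tidymodels"))

# Execution environment for the future backend
plan(multisession)

# The list of target nodes
list(
  tar_target(raw_data, read_data(), deployment = "main"),
  tar_target(preprocessed_data, preprocess_data(raw_data),
             format = "qs", deployment = "main")
)
```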
Target nodes are defined in a list using the tar_target() function. Each target node is a single step in a workflow. A node runs an R command and returns a value.
tar_target() has several arguments, but the important ones here are:
- name: The name of the target node – Targets are referenced by name, so downstream nodes can reference upstream nodes by name.
- command: The R function to run
- format: A storage format for the return value – This can have a considerable positive effect on run time if you are moving large objects around.
- deployment: This dictates where the function is run. It can be either “main” or “worker.” Many functions in this graph will not benefit from parallelization, so they run on the “main” process.
Below is an example target node. Take a look at the “_targets.R” file for more information.
tar_target(
  preprocessed_data,
  preprocess_data(data),
  format = "qs",
  deployment = "main"
)
Show the Target Graph
Once the target workflow is defined by the “_targets.R” file, we can take a look at the resulting directed acyclic graph (DAG). Running tar_visnetwork() will output a DAG that shows the relationships between target nodes. This can be very useful when you are first setting up a workflow.
Below is an example of the DAG for a workflow. As you can see, each node is connected and has a status.
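If the graph gets cluttered by imported helper functions, a variation like the following can help. The targets_only argument is part of tar_visnetwork(); whether it improves your particular graph depends on how many functions you source.

```r
# Render the dependency graph, hiding imported functions so that
# only the target nodes and their edges are shown.
tar_visnetwork(targets_only = TRUE)
```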
Watch the Progress
If you want to see a live view of the graph and various statuses, you can run the tar_watch() command. This command opens a Shiny app that displays a variety of information. It is updated, by default, every 10 seconds.
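Run it from a second R session in the project directory so it can monitor the pipeline while tar_make_future() runs elsewhere. The seconds argument controls the refresh interval; the value below simply restates the default.

```r
# Launch the Shiny monitoring app; refresh the display every 10 seconds
tar_watch(seconds = 10)
```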
You can see an example for a pipeline below:
Run the Targets Workflow
Finally, let’s actually run the workflow. In this case, we use tar_make_future() because we want to use the future parallel backend. Because we defined our plan as multisession in the “_targets.R” file, the code we marked to send to workers will be dispatched to future worker processes.
tar_make_future(workers = 8)
That’s it! This particular workflow takes approximately five minutes to compute all the hyperparameter options. You can find the complete results in the “report.html” document.
If, for instance, you stop the computation in the middle to change a parameter, targets will remember what has already been computed and skip running those steps in the next run. Pretty neat!
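A quick way to see this skipping behavior in action is to ask targets which nodes are stale before rerunning. The call below is a sketch of that check, not a required step:

```r
# List the targets whose code or upstream dependencies have changed
tar_outdated()

# Rerun the workflow: up-to-date targets are skipped automatically
tar_make_future(workers = 8)
```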
The targets package is fantastic for writing pipeline code. It has the added benefit of running on parallel backends like future.
Thanks to Deep Learning With Keras To Predict Customer Churn and the Targets R Package Keras Model Example for the inspiration for this article.