Load Data From S3 Buckets

Load data stored in AWS S3 buckets into Saturn Cloud

Overview

If you use AWS S3 to store your data, connecting it to Saturn Cloud takes just a few steps.

In this example, we use the aws.s3 package to connect to data, but you can also use packages such as paws or botor if you prefer.

Before starting, you should create an RStudio Server resource. See our quickstart if you don’t know how to do this yet.

Process

If your data is in a public S3 bucket, you don’t need credentials: you can connect anonymously from RStudio, just as you would on your laptop. In that case, skip ahead to the Connect to Data Via aws.s3 section.

If your S3 data storage is not public and requires AWS credentials, please read on!

Create AWS Credentials

Credentials for S3 access are created in your AWS account. Visit https://aws.amazon.com/ and sign in to your account.

In the top right corner, click the dropdown under your username and select My Security Credentials.

Screenshot of AWS site My Security Credentials page

Under “My Security Credentials” you’ll see a section titled “Access keys for CLI, SDK, & API access”. If you don’t yet have an access key listed, create one.

Screenshot of Access Keys section of AWS site My Security Credentials page

Save the key information this generates and keep it in a safe place! AWS shows the secret access key only once, when the key is created.

Add AWS Credentials to Saturn Cloud

Sign in to your Saturn Cloud account and select Credentials from the menu on the left.

Saturn Cloud left menu with arrow pointing to Credentials tab

This is where you will add your S3 API key information. This is a secure storage location, and it will not be available to the public or other users without your consent.

Click the New button in the top right corner of this page to open the Credentials Creation form.

Screenshot of Saturn Cloud Create Credentials form

You will add three credential items: your AWS Access Key ID, your AWS Secret Access Key, and your default region. Complete the form once for each item.

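| Credential            | Type                 | Name                  | Variable Name         |
|-----------------------|----------------------|-----------------------|-----------------------|
| AWS Access Key ID     | Environment Variable | aws-access-key-id     | AWS_ACCESS_KEY_ID     |
| AWS Secret Access Key | Environment Variable | aws-secret-access-key | AWS_SECRET_ACCESS_KEY |
| AWS Default Region    | Environment Variable | aws-default-region    | AWS_DEFAULT_REGION    |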

Copy the values from your AWS console into the Value field of the credential creation form. The credential names are only recommendations; feel free to change them to suit your workflow. You must, however, use the Variable Names shown above, because these are the standard environment variable names that aws.s3 looks for when connecting to S3.

With this complete, your S3 credentials will be available to your Saturn Cloud resources! Restart any running RStudio Server resource so the new credentials are loaded into its environment.
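After restarting, you can confirm from the R console that the variables reached your session. The sketch below uses only base R; check_aws_env() is a hypothetical helper written for this illustration, not part of Saturn Cloud or aws.s3. It reports only whether each variable is set, so the secret values are never printed.

```r
# Hypothetical helper (not part of Saturn Cloud or aws.s3): reports whether
# each AWS credential environment variable is set in the current R session.
check_aws_env <- function(vars = c("AWS_ACCESS_KEY_ID",
                                   "AWS_SECRET_ACCESS_KEY",
                                   "AWS_DEFAULT_REGION")) {
  # Sys.getenv() returns "" for variables that are not defined
  values <- Sys.getenv(vars, unset = "")
  is_set <- nzchar(values)
  names(is_set) <- vars
  is_set
}

# Prints TRUE/FALSE for each variable; the values themselves are not shown
check_aws_env()
```

Any FALSE in the output usually means the credential's Variable Name does not match the table above, or the resource has not been restarted since the credential was added.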

Setting Up Your Resource

aws.s3 is not installed by default in Saturn images, so you will need to install it on your resource. This is already done in this example recipe, but if you are using a custom resource you will need to run install.packages("aws.s3"). Check out our page on installing packages to see the various ways to do this!
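If you want one script that works both in this recipe and on a custom resource, one option is to install aws.s3 only when it is missing. ensure_package() below is a hypothetical helper for illustration; it assumes install.packages() can reach whatever package repository your resource is configured to use.

```r
# Hypothetical helper: install a package only if it is not already
# available, then attach it to the session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage on a resource without aws.s3 preinstalled:
# ensure_package("aws.s3")
```

Because requireNamespace() succeeds when the package is already present, calling ensure_package("aws.s3") on a resource that already has it skips the installation entirely.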

Connect to Data Via aws.s3

Set Up the Connection

aws.s3 automatically looks for your AWS credentials in environment variables, so if you added your credentials as described above, no further configuration is needed. The command below lists the S3 buckets in your AWS account.

library(aws.s3)

# List the S3 buckets in your AWS account
bucketlist()
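Beyond listing buckets, aws.s3 can also list the objects inside a single bucket. This sketch wraps get_bucket_df(), which returns a data frame with one row per object; the helper name list_s3_objects() is hypothetical, and whether a particular bucket (such as saturn-public-data) allows listing is an assumption that depends on that bucket's policy.

```r
# Hypothetical helper: list objects under a key prefix in one bucket.
# get_bucket_df() returns one row per object; "Key" and "Size" are
# assumed to be among its columns.
list_s3_objects <- function(bucket, prefix = NULL, max = 100) {
  objects <- aws.s3::get_bucket_df(bucket = bucket, prefix = prefix, max = max)
  objects[, c("Key", "Size")]
}

# Example (requires network access and permission to list the bucket):
# list_s3_objects("saturn-public-data", prefix = "pet-names/")
```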

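The save_object command downloads an S3 object to a local file. By default, the file is saved in your current working directory under the object’s file name.

# Downloads hello_world.txt into the current working directory
save_object("s3://saturn-public-data/hello_world.txt")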
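If you would rather not write into your working directory, save_object() also accepts an explicit destination through its file argument. The helper read_s3_lines() below is hypothetical; it is a sketch that downloads a text object into R’s session temporary directory and returns its lines, relying on save_object() returning the path of the file it wrote.

```r
# Hypothetical helper: download a text object to an explicit local path
# (by default, R's session temp directory) and read its lines.
read_s3_lines <- function(uri, dest = file.path(tempdir(), basename(uri))) {
  # save_object() returns the path of the file it wrote
  path <- aws.s3::save_object(uri, file = dest)
  readLines(path)
}

# Example (requires network access):
# read_s3_lines("s3://saturn-public-data/hello_world.txt")
```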

To read the data directly into an R object instead (here, a data.table), save it to a temporary file and read it from there.

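library(dplyr)       # provides the %>% pipe
library(data.table)  # provides fread()

# Download the CSV to a temporary file, then read it into a data.table
data <-
  save_object("s3://saturn-public-data/pet-names/seattle_pet_licenses.csv",
    file = tempfile(fileext = ".csv")
  ) %>%
  fread()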
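aws.s3 also provides s3read_using(), which wraps the same download-to-a-temporary-file pattern and calls a reader function on the result. The sketch below assumes your installed aws.s3 version accepts an s3:// URI in its object argument, as save_object() does above; read_s3_csv() is a hypothetical name used only for illustration.

```r
# Hypothetical helper: s3read_using() downloads the object to a temporary
# file and passes that file to FUN (here, data.table::fread).
read_s3_csv <- function(uri) {
  aws.s3::s3read_using(FUN = data.table::fread, object = uri)
}

# Example (requires network access):
# pets <- read_s3_csv("s3://saturn-public-data/pet-names/seattle_pet_licenses.csv")
```

This avoids managing the temporary file yourself, at the cost of being tied to aws.s3’s choice of temporary location.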