How to Copy Files from AWS S3 to Your Local Machine and vice versa using aws s3 sync

Data scientists often need to work with large datasets, and Amazon Web Services (AWS) Simple Storage Service (S3) is a popular choice for storing and retrieving these datasets. However, you might find yourself needing to copy files from S3 to your local machine for various reasons. This blog post will guide you through the process, step by step.

Prerequisites

Before we begin, ensure that you have the following:

  1. An AWS account with access to S3.
  2. AWS Command Line Interface (CLI) installed on your local machine.
  3. AWS CLI configured with your credentials.

Step 1: Install AWS CLI

If you haven’t installed AWS CLI on your local machine, you can do so by following the instructions on the official AWS CLI User Guide.

Step 2: Configure AWS CLI

Once you’ve installed AWS CLI, you need to configure it with your AWS credentials. You can do this by running the following command:

aws configure

You’ll be prompted to enter your AWS Access Key ID, Secret Access Key, default region name, and default output format, as shown below:

$ aws configure
AWS Access Key ID [None]: accesskey
AWS Secret Access Key [None]: secretkey
Default region name [None]: us-west-2
Default output format [None]:
  • For the Access Key ID and Secret Access Key, open the account name dropdown menu in the AWS Management Console and choose Security Credentials.
  • For the default region, choose the Settings icon in the console, then More User Settings, where you can view or specify the default region.
  • For the default output format, choose json, text, or table. Pressing Enter without typing a value keeps the default, which is json.

If you’re unsure about these, you can find more information in the AWS documentation.
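Before copying anything, it can help to confirm that the CLI is installed and that the credentials resolve to the account you expect. The following is a minimal sketch, not an official procedure: check_aws_setup is a hypothetical helper name, and the last command assumes network access and valid credentials.

```shell
# Hypothetical helper: prints the CLI version, the active configuration,
# and the identity the credentials resolve to. Stops at the first failure.
check_aws_setup() {
  aws --version || return 1
  aws configure list || return 1
  # Calls AWS STS, so it needs network access and valid credentials.
  aws sts get-caller-identity
}

# Usage:
#   check_aws_setup
```

If the last command returns an account ID and ARN, the credentials entered in aws configure are being picked up.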

There are two ways to copy files: the aws s3 cp command, or the aws s3 sync command, which is more powerful and flexible when working with whole buckets or folders.

Method 1: The cp Command

  • List Your S3 Buckets

Before copying files, you need to know which S3 buckets are available. You can list all your S3 buckets using the following command:

aws s3 ls
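
To find the exact name of a file, you can also list the objects inside a bucket. This is a small sketch, assuming a bucket you have read access to; list_bucket_contents is a hypothetical helper name, and the bucket name in the usage line is a placeholder.

```shell
# Hypothetical helper: lists every object under a bucket or prefix.
# $1 is an S3 URI such as s3://your-bucket-name/some/folder/
list_bucket_contents() {
  # --recursive descends into nested prefixes, --human-readable prints
  # sizes in KiB/MiB, and --summarize adds object count and total size.
  aws s3 ls "$1" --recursive --human-readable --summarize
}

# Usage:
#   list_bucket_contents s3://your-bucket-name/
```
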
  • Copy Files from S3 to Your Local Machine

Now that you’ve listed your S3 buckets, you can copy files from any of these buckets to your local machine. The command to do this is:

aws s3 cp s3://your-bucket-name/your-file-name /path/to/local/directory

Replace your-bucket-name and your-file-name with the name of your S3 bucket and the file you want to copy, respectively. Replace /path/to/local/directory with the path to the directory on your local machine where you want to copy the file.
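
The cp command copies a single file by default. To copy everything under a folder, it needs the --recursive flag. The sketch below wraps that in a hypothetical helper, download_prefix, which passes any extra flags through, so --dryrun can be used to preview the transfer first. The bucket and paths in the usage lines are placeholders.

```shell
# Hypothetical helper: recursively copies everything under an S3 prefix.
# $1 = S3 prefix, $2 = local directory, remaining arguments = extra flags
download_prefix() {
  src=$1; dest=$2; shift 2
  aws s3 cp "$src" "$dest" --recursive "$@"
}

# Usage:
#   download_prefix s3://your-bucket-name/data/ ./data --dryrun   # preview only
#   download_prefix s3://your-bucket-name/data/ ./data            # real copy
```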

  • Verify the Copy

After the copy operation, it’s always a good practice to verify that the file has been copied correctly. You can do this by checking the contents of the local directory where you copied the file.

ls /path/to/local/directory
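
Listing the directory only shows that a file exists. A slightly stronger check is to compare the local file size with the size S3 reports for the object. The following is a rough sketch under that assumption: verify_download_size is a hypothetical helper name, and matching sizes do not prove the contents are identical.

```shell
# Hypothetical helper: succeeds when the local file has the same size in
# bytes as the S3 object.
# $1 = bucket name, $2 = object key, $3 = local file path
verify_download_size() {
  # head-object returns object metadata; ContentLength is the size in bytes.
  remote_size=$(aws s3api head-object --bucket "$1" --key "$2" \
    --query ContentLength --output text) || return 1
  # tr strips the padding some wc implementations add.
  local_size=$(wc -c < "$3" | tr -d ' ')
  [ "$remote_size" = "$local_size" ]
}

# Usage:
#   verify_download_size your-bucket-name your-file-name \
#     /path/to/local/directory/your-file-name && echo "sizes match"
```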

Method 2: AWS S3 Sync

The AWS CLI provides the “aws s3 sync” command, making it simple to transfer files between your local machine and S3 in both directions, or directly between different buckets. It comes with various flags and options to fulfill all your synchronization needs.

This is the general syntax, without any additional flags or options:

aws s3 sync <source> <destination>

Downloading files from a bucket to your local machine

We can download all files from a bucket, or from a specific folder within it, to our local machine. The sync command is recursive by default, so nested folders and their files are downloaded too. It does not accept the --recursive flag, which belongs to aws s3 cp.

aws s3 sync s3://mybucket ~/Downloads

The S3 sync command will skip empty folders in both upload and download. This means that there won’t be a folder creation at the destination if the source folder does not include any files.
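
Two of the sync options that tend to be useful are --dryrun, which prints the planned operations without transferring anything, and the --exclude and --include filters. The sketch below combines them in a hypothetical helper, sync_matching, which syncs only files matching one pattern. The pattern and paths in the usage lines are examples, not values from this guide.

```shell
# Hypothetical helper: syncs only the files that match a pattern.
# $1 = source, $2 = destination, $3 = pattern such as "*.csv",
# remaining arguments = extra flags such as --dryrun
sync_matching() {
  src=$1; dest=$2; pattern=$3; shift 3
  # Filters are applied in order: exclude everything, then re-include
  # only the files that match the pattern.
  aws s3 sync "$src" "$dest" --exclude "*" --include "$pattern" "$@"
}

# Usage:
#   sync_matching s3://mybucket ~/Downloads "*.csv" --dryrun   # preview only
#   sync_matching s3://mybucket ~/Downloads "*.csv"            # real sync
```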

Uploading files to a bucket

This also works in the other direction by switching our parameters.

aws s3 sync ~/Downloads s3://mybucket

By default, aws s3 sync uploads only the files that are new, differ in size, or have a newer last-modified time than the copy in the bucket. Unchanged files are skipped. A changed file overwrites the existing object or, if versioning is enabled on the bucket, is saved as a new version.
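
Note that sync never removes anything from the destination unless asked to. If the bucket should mirror the local directory exactly, the --delete flag removes objects that no longer exist locally. Because that is destructive, the sketch below (mirror_to_s3 is a hypothetical helper name) is meant to be run with --dryrun first.

```shell
# Hypothetical helper: makes the S3 destination mirror a local directory.
# $1 = local directory, $2 = S3 destination, remaining arguments = extra flags
mirror_to_s3() {
  src=$1; dest=$2; shift 2
  # --delete removes destination objects that are missing from the source.
  aws s3 sync "$src" "$dest" --delete "$@"
}

# Usage:
#   mirror_to_s3 ~/Downloads s3://mybucket --dryrun   # shows uploads and deletions
#   mirror_to_s3 ~/Downloads s3://mybucket            # applies them
```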

Syncing files between buckets

You can also copy files between two buckets.

aws s3 sync s3://source-bucket s3://target-bucket

This removes the intermediate step of explicitly downloading the files to your local machine from the source bucket and only then uploading them afterward to your target bucket.
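
When the two buckets live in different regions, the source and destination regions can be given explicitly. This is a sketch under that assumption: sync_across_regions is a hypothetical helper name, and the bucket names and regions in the usage line are placeholders.

```shell
# Hypothetical helper: syncs between buckets in different regions.
# $1 = source bucket URI, $2 = source region,
# $3 = target bucket URI, $4 = target region,
# remaining arguments = extra flags such as --dryrun
sync_across_regions() {
  src=$1; src_region=$2; dest=$3; dest_region=$4; shift 4
  # --source-region names the source bucket's region; --region applies
  # to the destination.
  aws s3 sync "$src" "$dest" \
    --source-region "$src_region" --region "$dest_region" "$@"
}

# Usage:
#   sync_across_regions s3://source-bucket us-west-2 \
#     s3://target-bucket eu-west-1 --dryrun
```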

Conclusion

Copying files from AWS S3 to your local machine is a straightforward process once you’ve installed and configured AWS CLI. This guide has walked you through two methods: the simple cp command and the more flexible aws s3 sync command.

Remember, working with AWS S3 and other cloud storage services can greatly enhance your data science workflows. However, always ensure that you’re following best practices for data security and management.

