How to Move All Objects in Amazon S3 from One Prefix to Another Using the AWS SDK for Node.js

Copying and moving objects are common tasks when managing data in Amazon S3 (Simple Storage Service). In this blog post, we’ll walk through how to move all objects under one prefix to another using the AWS SDK for Node.js. This guide is primarily aimed at data scientists and software engineers who work with AWS S3 and Node.js.

What is AWS S3?

Amazon S3 is a scalable object storage service that lets you store and retrieve any amount of data from anywhere on the web. It is designed for 99.999999999% (11 nines) of durability and provides comprehensive security and compliance capabilities. Data in S3 is organized into buckets and objects. Each object resides in a bucket and is identified by a key; a prefix is simply the leading part of a key. For example, the keys logs/2023/app.log and logs/2023/db.log share the prefix logs/2023/, which allows for logical grouping of objects within a bucket even though S3 has no real folders.

Prerequisites

Before we get started, please ensure that you have the following:

  1. An AWS account.
  2. AWS CLI installed and configured with your access keys.
  3. Node.js and npm installed on your system.
  4. AWS SDK for Node.js installed in your project.

You can install the AWS SDK using npm (this guide uses version 2 of the SDK, which exposes the AWS.S3 client used below):

npm install aws-sdk

Copying/Moving Objects with AWS SDK

The AWS SDK for Node.js provides methods for interacting with S3. In our case, we’ll primarily use listObjectsV2() to list the objects, copyObject() to copy them, and deleteObject() to remove the originals.

Here’s a step-by-step guide:

  1. Create an S3 client
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
  2. Define the parameters for listing the objects

The listObjectsV2 method requires the bucket name and the prefix of the objects you want to copy.

let listParams = {
    Bucket: 'sourceBucket',
    Prefix: 'sourcePrefix'
};
  3. List the objects
s3.listObjectsV2(listParams, function(err, data) {
    if (err) {
        console.log(err, err.stack); // an error occurred
    } else {
        console.log(data);           // successful response; the objects are in data.Contents
    }
});
  4. Copy the objects

For each object under the source prefix, we’ll use the copyObject() method to copy it to the destination prefix. Note that data.Contents is only available inside the listObjectsV2 callback, so this loop (and the delete loop in the next step) belongs inside the success branch of the previous step.

data.Contents.forEach(function(file) {
    let params = {
        Bucket: 'destinationBucket',
        CopySource: `sourceBucket/${file.Key}`, // "bucket/key" of the source object
        Key: file.Key.replace('sourcePrefix', 'destinationPrefix')
    };

    s3.copyObject(params, function(copyErr, copyData){
        if (copyErr) {
            console.log(copyErr, copyErr.stack);
        } else {
            console.log(copyData);
        }
    });
});
  5. Delete the source objects (optional)

S3 has no native ‘move’ operation, so to move the files rather than copy them, delete each source object after its copy has succeeded.

data.Contents.forEach(function(file) {
    let delParams = {
        Bucket: 'sourceBucket',
        Key: file.Key
    };

    s3.deleteObject(delParams, function(delErr, delData) {
        if (delErr) console.log(delErr, delErr.stack);
        else console.log('delete', delData);
    });
});
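Putting the steps above together, here’s a minimal sketch of a complete move operation using the SDK’s .promise() interface, so each delete runs only after its copy has succeeded. The function and helper names (movePrefix, mapKey) and the bucket/prefix arguments are placeholders chosen for this sketch; substitute your own.

```javascript
// A minimal end-to-end sketch of the "move" described above. Each source
// object is deleted only after its copy has succeeded.

// Rewrite an object key, swapping sourcePrefix for destinationPrefix.
function mapKey(key, sourcePrefix, destinationPrefix) {
    return destinationPrefix + key.slice(sourcePrefix.length);
}

async function movePrefix(bucket, sourcePrefix, destinationPrefix) {
    const AWS = require('aws-sdk');     // v2 SDK, as installed above
    const s3 = new AWS.S3();

    const data = await s3.listObjectsV2({ Bucket: bucket, Prefix: sourcePrefix }).promise();

    for (const file of data.Contents) {
        // Copy first...
        await s3.copyObject({
            Bucket: bucket,
            CopySource: `${bucket}/${file.Key}`,   // "bucket/key" of the source object
            Key: mapKey(file.Key, sourcePrefix, destinationPrefix)
        }).promise();

        // ...then delete the source object once the copy has succeeded.
        await s3.deleteObject({ Bucket: bucket, Key: file.Key }).promise();
    }
}

// Example (requires valid AWS credentials):
// movePrefix('myBucket', 'source/', 'destination/').catch(console.error);
```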

And there you have it! That’s how you move or copy objects from one prefix to another in Amazon S3 using the AWS SDK for Node.js.

Remember that when dealing with large numbers of files, listObjectsV2 returns at most 1,000 keys per call, so you may need to handle pagination. Also, make sure your AWS IAM roles and policies allow these operations (s3:ListBucket, s3:GetObject, s3:PutObject, and s3:DeleteObject) to prevent permission issues.
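A paginated listing can be sketched as an async helper that loops until the response is no longer truncated, passing NextContinuationToken on each subsequent call. The helper name listAllKeys is a choice of this sketch; taking the S3 client as a parameter keeps the helper easy to stub in tests.

```javascript
// listObjectsV2 returns at most 1,000 keys per call, so a full listing
// must loop, passing NextContinuationToken until no more pages remain.
async function listAllKeys(s3, bucket, prefix) {
    const keys = [];
    let token;                                   // undefined on the first request
    do {
        const data = await s3.listObjectsV2({
            Bucket: bucket,
            Prefix: prefix,
            ContinuationToken: token
        }).promise();
        data.Contents.forEach(file => keys.push(file.Key));
        token = data.NextContinuationToken;      // undefined once IsTruncated is false
    } while (token);
    return keys;
}
```

You could then feed the returned keys into the copy/delete loops from the steps above instead of a single data.Contents array.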

By leveraging the power of AWS SDK for Node.js, we can automate and simplify the management of our data in S3. Happy coding!


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.