How to Upload Large Files from Web Browser to Amazon S3

As data scientists or software engineers, we often face situations where we need to upload large files from a web browser to Amazon S3. This can seem daunting because a single oversized HTTP request is both fragile and, in S3's case, capped in size. With the right approach, however, it's possible to tackle this challenge efficiently. In this article, we will explore how to upload large files from a web browser to Amazon S3.
What is Amazon S3?
Amazon S3 (Simple Storage Service) is a scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and applications on Amazon Web Services (AWS). S3 is ideal for storing large files due to its durability, scalability, and easy-to-use management features.
Why Large File Uploads Can Be a Problem
Uploading a large file as a single HTTP request is problematic: one dropped connection forces you to start the whole transfer over, and Amazon S3 caps a single PUT request at 5 GB. To overcome this, we can use a process called 'multipart upload', which breaks the file into smaller parts and uploads them separately.
How to Upload Large Files to S3: Multipart Upload
With multipart upload, the large file is divided into smaller parts, each uploaded independently (and, if you like, in parallel). Once all parts are uploaded, Amazon S3 reassembles them into a single object.
Below is a step-by-step guide on how to implement this process:
Step 1: Initiate Multipart Upload
First, we have to initiate the multipart upload and get an upload ID. This ID identifies the upload session and is required when uploading each part and when completing the process.
// AWS SDK for JavaScript v2. In a browser, the SDK is typically loaded as a
// global and credentials come from Amazon Cognito or your backend.
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

var params = {
  Bucket: 'your-bucket-name',
  Key: 'your-file-name',
};
s3.createMultipartUpload(params, function(err, data) {
  if (err) console.log(err, err.stack);
  else console.log(data.UploadId); // save this for the following steps
});
Step 2: Upload Each Part
Next, we upload each part of the file. Every part except the last must be at least 5 MB, each part can be at most 5 GB, and a single upload can contain up to 10,000 parts. Each successfully uploaded part returns an ETag, a unique identifier that we must record along with its part number.
var params = {
  Bucket: 'your-bucket-name',
  Key: 'your-file-name',
  PartNumber: 1, // increment for each part
  UploadId: 'your-upload-id',
  Body: 'part-data', // data of the part
};
s3.uploadPart(params, function(err, data) {
  if (err) console.log(err, err.stack);
  else console.log(data.ETag);
});
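In the browser, the part data typically comes from slicing the user's File object. Below is a minimal sketch of that step (the helper name sliceIntoParts and the constant PART_SIZE are our own, not SDK calls) using the standard Blob.slice API to produce 5 MB chunks ready to pass as Body:

```javascript
var PART_SIZE = 5 * 1024 * 1024; // S3's minimum part size (except the last part)

// Hypothetical helper: split a browser File/Blob into numbered parts.
function sliceIntoParts(file, partSize) {
  partSize = partSize || PART_SIZE;
  var parts = [];
  for (var start = 0; start < file.size; start += partSize) {
    parts.push({
      PartNumber: parts.length + 1, // S3 part numbers start at 1
      chunk: file.slice(start, start + partSize), // Blob.slice copies no data
    });
  }
  return parts;
}
```

Because Blob.slice only records byte offsets, slicing even a multi-gigabyte file is cheap; each chunk is read from disk only when it is actually sent.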
Step 3: Complete Multipart Upload
Finally, after all parts are uploaded, we complete the multipart upload process. We need to provide the upload ID and an array of the uploaded parts, including their part numbers and ETags, listed in ascending part-number order.
var params = {
  Bucket: 'your-bucket-name',
  Key: 'your-file-name',
  MultipartUpload: {
    Parts: [
      {
        ETag: 'etag-from-part-1',
        PartNumber: 1,
      },
      // ...additional parts
    ],
  },
  UploadId: 'your-upload-id',
};
s3.completeMultipartUpload(params, function(err, data) {
  if (err) console.log(err, err.stack);
  else console.log(data.Location);
});
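The three steps above can be tied together with the SDK's .promise() interface. The sketch below is one possible arrangement, not the only one (uploadFileMultipart and PART_SIZE are our own names; file is assumed to be a browser File/Blob and s3 an AWS SDK v2 client):

```javascript
var PART_SIZE = 5 * 1024 * 1024; // minimum part size (except the last part)

async function uploadFileMultipart(s3, bucket, key, file) {
  // Step 1: initiate the upload and remember the upload ID
  var init = await s3
    .createMultipartUpload({ Bucket: bucket, Key: key })
    .promise();

  // Step 2: upload each 5 MB slice, recording part numbers and ETags
  var parts = [];
  for (var start = 0, n = 1; start < file.size; start += PART_SIZE, n += 1) {
    var res = await s3
      .uploadPart({
        Bucket: bucket,
        Key: key,
        UploadId: init.UploadId,
        PartNumber: n,
        Body: file.slice(start, start + PART_SIZE),
      })
      .promise();
    parts.push({ ETag: res.ETag, PartNumber: n });
  }

  // Step 3: ask S3 to assemble the parts into one object
  return s3
    .completeMultipartUpload({
      Bucket: bucket,
      Key: key,
      UploadId: init.UploadId,
      MultipartUpload: { Parts: parts },
    })
    .promise();
}
```

A production version would also retry failed parts and call abortMultipartUpload on unrecoverable errors so that orphaned parts don't accrue storage charges.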
Conclusion
Uploading large files from a web browser to Amazon S3 can seem challenging due to request-size limits. By using multipart upload, we can handle the task efficiently: initiate the upload, upload each part, and then complete the process. Following these steps ensures a reliable, seamless upload of large files to Amazon S3.
Remember, while dealing with large data files, always prioritize security and privacy. Make sure your data is encrypted and that you have the proper permissions set up in AWS. Happy uploading!
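As one example of the encryption point above, S3 will encrypt the object at rest if you add the ServerSideEncryption parameter when initiating the multipart upload (bucket and key names below are placeholders):

```javascript
var params = {
  Bucket: 'your-bucket-name',
  Key: 'your-file-name',
  ServerSideEncryption: 'AES256', // SSE-S3; use 'aws:kms' for KMS-managed keys
};
// pass these params to s3.createMultipartUpload as in Step 1
```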
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.