How to Zip Files in Amazon S3 Bucket and Obtain its URL

Amazon Simple Storage Service (S3) provides secure, durable, and highly scalable cloud storage for data scientists and software engineers. Often, you need to zip files stored in an S3 bucket and retrieve a URL for the archive, whether for sharing or to reduce storage. S3 has no built-in operation for zipping objects in place, so in this post we'll download the files, zip them locally, upload the archive back to the bucket, and build its URL.

For this tutorial, we’ll use Python and Boto3, the Amazon Web Services (AWS) SDK for Python. It allows Python developers to write software that makes use of AWS services like Amazon S3 and others.

Step 1: Setting Up

First, install Boto3 if it isn't already installed:

pip install boto3

Ensure your AWS credentials are set up. You can configure them using the AWS CLI:

aws configure

Input your AWS Access Key ID, AWS Secret Access Key, Default region name, and Default output format when prompted.

Step 2: Importing Necessary Libraries

import os
import zipfile
import boto3
from botocore.exceptions import NoCredentialsError

Step 3: Accessing Your S3 Bucket

To access your S3 bucket, create an S3 client. Boto3 picks up the credentials you configured in Step 1:

s3 = boto3.client('s3')

Step 4: Zipping Files

Define a function to zip the files:

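def zip_files(files, zip_name):
    # ZIP_DEFLATED compresses the entries; the default (ZIP_STORED) only bundles them
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for file in files:
            # Each entry is stored under the path given here
            zipf.write(file)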
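One detail worth checking is the compression mode: `zipfile.ZipFile` defaults to `ZIP_STORED`, which bundles files without shrinking them. The sketch below compares the two modes on throwaway text files. The helper name `compare_archive_sizes` and the sample data are made up for illustration.

```python
import os
import tempfile
import zipfile


def build_archive(files, zip_name, compression):
    """Write `files` into `zip_name` using the given zipfile compression mode."""
    with zipfile.ZipFile(zip_name, "w", compression=compression) as zipf:
        for path in files:
            # arcname keeps only the file name, not the temporary directory path
            zipf.write(path, arcname=os.path.basename(path))


def compare_archive_sizes():
    """Return (stored_size, deflated_size) in bytes for the same input files."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("a.txt", "b.txt"):
            path = os.path.join(tmp, name)
            with open(path, "w") as fh:
                fh.write("hello " * 1000)  # highly repetitive, so it compresses well
            paths.append(path)

        stored = os.path.join(tmp, "stored.zip")
        deflated = os.path.join(tmp, "deflated.zip")
        build_archive(paths, stored, zipfile.ZIP_STORED)
        build_archive(paths, deflated, zipfile.ZIP_DEFLATED)
        return os.path.getsize(stored), os.path.getsize(deflated)


if __name__ == "__main__":
    stored_size, deflated_size = compare_archive_sizes()
    print(f"stored: {stored_size} bytes, deflated: {deflated_size} bytes")
```

For repetitive text like this, the deflated archive should come out much smaller. Data that is already compressed will show little difference.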

Step 5: Downloading Files from S3 Bucket

The files must be on local disk before they can be zipped, so define a function that downloads them from the S3 bucket to a local directory:

def download_files(bucket_name, files, local_path):
    try:
        for file in files:
            local_file = os.path.join(local_path, file)
            # Keys may contain "/" prefixes, so create the matching local folders first
            os.makedirs(os.path.dirname(local_file) or ".", exist_ok=True)
            s3.download_file(bucket_name, file, local_file)
        print("Download Successful")
        return True
    except NoCredentialsError:
        print("Credentials not available")
        return False
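If you don't already have the list of keys to pass as `files`, one way to build it is to list the objects under a prefix. The following is a minimal sketch, assuming you want every object under one prefix. The helper name `list_keys` is hypothetical, and the client is passed in as a parameter (rather than using the global `s3`) so the function is easy to test.

```python
def list_keys(s3_client, bucket_name, prefix=""):
    """Return every object key under `prefix`, following pagination."""
    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches
        for obj in page.get("Contents", []):
            # Skip "folder" placeholder keys that end in "/"
            if not obj["Key"].endswith("/"):
                keys.append(obj["Key"])
    return keys
```

With a real client this might be called as `list_keys(s3, "my-bucket", "reports/")`, where the bucket name and prefix are placeholders.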

Step 6: Uploading the Zipped File

After creating the zip file, upload it back to the S3 bucket:

def upload_file_to_s3(bucket_name, s3_file_name, local_file_path):
    try:
        s3.upload_file(local_file_path, bucket_name, s3_file_name)
        print("Upload Successful")
        return True
    except FileNotFoundError:
        print("The file was not found")
        return False
    except NoCredentialsError:
        print("Credentials not available")
        return False
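Optionally, you can set the object's content type at upload time so that browsers and download tools recognize the archive as a zip file. This is a sketch rather than a required step. The helper name `upload_zip` is hypothetical, and the client is passed in as a parameter so the example can be exercised without AWS access.

```python
def upload_zip(s3_client, bucket_name, s3_file_name, local_file_path):
    """Upload a zip archive and label it with the application/zip content type."""
    s3_client.upload_file(
        local_file_path,
        bucket_name,
        s3_file_name,
        # ExtraArgs forwards object metadata settings to the upload request
        ExtraArgs={"ContentType": "application/zip"},
    )
```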

Step 7: Retrieving the URL

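Finally, build the URL of the uploaded zip file. Note that this URL only works for anonymous access if the object is publicly readable; S3 objects are private by default: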

def get_url(bucket_name, s3_file_name):
    url = f"https://{bucket_name}.s3.amazonaws.com/{s3_file_name}"
    return url
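If the bucket is private, a plain URL like this will return an access-denied error. A common alternative is a presigned URL, which grants temporary access to a single object. The sketch below uses Boto3's `generate_presigned_url` client method. The helper names and the one-hour default expiry are assumptions for illustration. It also shows a variant of the public URL that percent-encodes the key, which matters when the key contains spaces or other special characters.

```python
from urllib.parse import quote


def get_presigned_url(s3_client, bucket_name, s3_file_name, expires_in=3600):
    """Return a time-limited download URL for a private object."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket_name, "Key": s3_file_name},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )


def get_public_url(bucket_name, s3_file_name):
    """Return the public URL with the key percent-encoded ("/" is kept as-is)."""
    return f"https://{bucket_name}.s3.amazonaws.com/{quote(s3_file_name)}"
```

Anyone holding a presigned URL can download the object until it expires, so choose an expiry that matches how the link will be shared.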

In conclusion, you can zip files from an Amazon S3 bucket and retrieve the archive's URL using Python and Boto3. This guide has walked you through each step of the process, from setting up your environment to retrieving the URL for the zipped file.

Remember to replace ‘bucket_name’, ‘s3_file_name’, ‘local_file_path’, ‘local_path’, and ‘files’ with your actual bucket name, the desired object key for the zip file, the local path of the zip file, the local directory to download the files to, and the list of object keys to zip, respectively.
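To see how the steps fit together, here is one possible end-to-end sketch. It is a condensed variant of the functions above, not a drop-in replacement: the function name `zip_s3_objects` is hypothetical, the client is passed in as a parameter, and a temporary directory is used so no local files are left behind.

```python
import os
import tempfile
import zipfile


def zip_s3_objects(s3_client, bucket_name, keys, zip_key):
    """Download `keys`, zip them, upload the archive as `zip_key`, return its URL."""
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "archive.zip")
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zipf:
            for index, key in enumerate(keys):
                # Number the local files so keys with the same base name cannot collide
                local_file = os.path.join(tmp, f"object_{index}")
                s3_client.download_file(bucket_name, key, local_file)
                # Keep the S3 key as the path inside the archive
                zipf.write(local_file, arcname=key)
        # Upload before the temporary directory is cleaned up
        s3_client.upload_file(archive_path, bucket_name, zip_key)
    # Same URL form as get_url above; it assumes the object is publicly readable
    return f"https://{bucket_name}.s3.amazonaws.com/{zip_key}"
```

With a real client, this could be called as `zip_s3_objects(boto3.client("s3"), "my-bucket", ["reports/a.csv", "reports/b.csv"], "bundles/reports.zip")`, where every name is a placeholder. Because the whole archive is built on local disk, this approach suits modest data sizes.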

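Zipping files can save storage space and speed up file transfers, although the savings depend on how compressible the data is: formats that are already compressed, such as JPEG images or existing archives, shrink very little. This is particularly useful in data science and software engineering, where handling large amounts of data is the norm. Happy coding!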


