How to Zip Files in Amazon S3 Bucket and Obtain its URL

Amazon Simple Storage Service (S3) provides secure, durable, and highly scalable cloud storage for data scientists and software engineers. Often, you need to bundle several files in an S3 bucket into a single zip archive and get a URL for it, either for sharing or to reduce storage. S3 has no built-in zip operation, so the approach in this post is to download the files, zip them locally, and upload the archive back to the bucket.
For this tutorial, we’ll use Python and Boto3, the Amazon Web Services (AWS) SDK for Python, which lets Python code call AWS services such as Amazon S3.
Step 1: Setting Up
First, install boto3 if it isn’t already installed:
pip install boto3
Ensure your AWS credentials are set up. You can configure them using the AWS CLI:
aws configure
Input your AWS Access Key ID, AWS Secret Access Key, Default region name, and Default output format when prompted.
Step 2: Importing Necessary Libraries
import os
import zipfile
import boto3
from botocore.exceptions import NoCredentialsError
Step 3: Accessing Your S3 Bucket
To access your S3 bucket, create an S3 client. Boto3 picks up the credentials you configured in Step 1:
s3 = boto3.client('s3')
Step 4: Zipping Files
Define a function to zip the files:
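def zip_files(files, zip_name):
    # ZIP_DEFLATED enables compression; the default (ZIP_STORED) only bundles files.
    with zipfile.ZipFile(zip_name, 'w', compression=zipfile.ZIP_DEFLATED) as zipf:
        for file in files:
            zipf.write(file)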
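To sanity-check the zipping step before involving S3, a minimal local sketch such as the following may help. It assumes compression is wanted (`zipfile.ZIP_DEFLATED`) and that storing only the base file name inside the archive is acceptable. The `demo` helper and the `a.txt`/`b.txt` sample files are hypothetical and exist only for illustration.

```python
import os
import tempfile
import zipfile


def zip_files(files, zip_name):
    """Write the given local files into a compressed archive at zip_name."""
    # ZIP_DEFLATED turns on compression; the zipfile default (ZIP_STORED) only bundles.
    with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
        for file in files:
            # arcname keeps only the file name, so local directories are not stored.
            zipf.write(file, arcname=os.path.basename(file))
    return zip_name


def demo():
    """Create two throwaway files, zip them, and return the archive's member names."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("a.txt", "b.txt"):  # hypothetical sample files
            path = os.path.join(tmp, name)
            with open(path, "w") as fh:
                fh.write("hello " * 1000)
            paths.append(path)
        archive = zip_files(paths, os.path.join(tmp, "bundle.zip"))
        with zipfile.ZipFile(archive) as zipf:
            return zipf.namelist()
```

Calling `demo()` returns the two member names, `['a.txt', 'b.txt']`. Flattening with `os.path.basename` means two files with the same name in different folders would collide, so pass the relative path as `arcname` instead if the folder layout matters.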
Step 5: Downloading Files from S3 Bucket
Before zipping, we need to download the files from the S3 bucket to a local directory:
def download_files(bucket_name, files, local_path):
    try:
        for file in files:
            local_file = os.path.join(local_path, file)
            # Keys may contain "/", so create any missing local folders first.
            os.makedirs(os.path.dirname(local_file) or '.', exist_ok=True)
            s3.download_file(bucket_name, file, local_file)
        print("Download Successful")
        return True
    except NoCredentialsError:
        print("Credentials not available")
        return False
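The `files` argument is a list of S3 object keys. If the keys are not known in advance, one way to build that list is to page through the bucket with `list_objects_v2`. The helper below is a sketch rather than part of the original steps: `list_keys` and its `prefix` parameter are names chosen for illustration, and the function takes the client from Step 3 as an argument.

```python
def list_keys(s3_client, bucket_name, prefix=""):
    """Return the object keys in bucket_name that start with prefix."""
    keys = []
    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            # Skip zero-byte "folder" placeholders whose key ends with "/".
            if not obj["Key"].endswith("/"):
                keys.append(obj["Key"])
    return keys
```

For example, `files = list_keys(s3, bucket_name, "reports/")` would collect every key under a hypothetical `reports/` prefix, ready to pass to `download_files`.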
Step 6: Uploading the Zipped File
After creating the zip file, upload it back to the S3 bucket:
def upload_file_to_s3(bucket_name, s3_file_name, local_file_path):
    try:
        s3.upload_file(local_file_path, bucket_name, s3_file_name)
        print("Upload Successful")
        return True
    except FileNotFoundError:
        print("The file was not found")
        return False
    except NoCredentialsError:
        print("Credentials not available")
        return False
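If the archive should be served with a zip content type when someone downloads it, one option is to pass `ExtraArgs` to `upload_file`. The variant below is only a sketch: `upload_zip` is a hypothetical name, it takes the client as a parameter, and it leaves error handling to the caller.

```python
def upload_zip(s3_client, bucket_name, s3_file_name, local_file_path):
    """Upload a local zip archive and label it as application/zip."""
    s3_client.upload_file(
        local_file_path,
        bucket_name,
        s3_file_name,
        # ContentType is stored as object metadata and returned on download.
        ExtraArgs={"ContentType": "application/zip"},
    )
    return s3_file_name
```

With the client from Step 3, this would be called as `upload_zip(s3, bucket_name, s3_file_name, local_file_path)`.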
Step 7: Retrieving the URL
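Finally, build the URL of the uploaded zip file. S3 objects are private by default, so this URL only opens for other people if the object has been made publicly readable:
def get_url(bucket_name, s3_file_name):
    # Virtual-hosted-style URL: https://<bucket>.s3.amazonaws.com/<key>
    url = f"https://{bucket_name}.s3.amazonaws.com/{s3_file_name}"
    return url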
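For a private bucket, a plain object URL returns an access error to anyone without credentials. A common alternative is a presigned URL, which grants temporary access to a single object. The sketch below uses the client's `generate_presigned_url` method. The function name `get_presigned_url` and the one-hour default for `expires_in` are assumptions made for illustration.

```python
def get_presigned_url(s3_client, bucket_name, s3_file_name, expires_in=3600):
    """Return a temporary, signed download link for a private object."""
    return s3_client.generate_presigned_url(
        "get_object",  # the S3 operation the link is allowed to perform
        Params={"Bucket": bucket_name, "Key": s3_file_name},
        ExpiresIn=expires_in,  # lifetime of the link, in seconds
    )
```

Calling `get_presigned_url(s3, bucket_name, s3_file_name)` with the client from Step 3 returns a link that stops working once the expiry time has passed.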
In conclusion, zipping files from an Amazon S3 bucket and retrieving the archive’s URL can be done with Python and Boto3. This guide has walked you through each step of the process, from setting up your environment to retrieving the URL for the zipped file.
Remember to replace bucket_name, s3_file_name, local_path, local_file_path, and files with your actual bucket name, the desired key for the zip file, the local directory to download the files to, the local path of the zip file, and the list of object keys to zip, respectively.
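As one way to tie the steps together, the sketch below downloads the objects into a temporary directory, zips them, uploads the archive, and returns its URL. It is an illustration under stated assumptions, not the only way to arrange the steps: `zip_s3_objects` and `zip_key` are hypothetical names, the client is passed in as a parameter, and each file is stored in the archive under its S3 key.

```python
import os
import tempfile
import zipfile


def zip_s3_objects(s3_client, bucket_name, files, zip_key):
    """Download the given keys, zip them, upload the archive, and return its URL."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "archive.zip")
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
            for key in files:
                # Mirror the key's "/" segments as local folders under tmp/downloads.
                local_file = os.path.join(tmp, "downloads", *key.split("/"))
                os.makedirs(os.path.dirname(local_file), exist_ok=True)
                s3_client.download_file(bucket_name, key, local_file)
                # Store each file under its S3 key so the folder layout is preserved.
                zipf.write(local_file, arcname=key)
        # The archive is closed at this point, so it is safe to upload.
        s3_client.upload_file(zip_path, bucket_name, zip_key)
    return f"https://{bucket_name}.s3.amazonaws.com/{zip_key}"
```

With the client from Step 3, a call could look like `zip_s3_objects(s3, bucket_name, files, s3_file_name)`. The temporary directory is removed automatically, so nothing is left on local disk afterwards.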
Zipping files can reduce storage use and speed up transfers, particularly for compressible data such as text, CSV, or JSON; formats that are already compressed, such as JPEG images or existing archives, shrink very little. This is useful in data science and software engineering, where handling large amounts of data is the norm. Happy coding!