How to Retrieve Only Folders from Amazon S3 Using Python Boto3

Hello fellow data scientists and software engineers! Today, we will be tackling a crucial question that many of you have asked: How can I get the list of only folders in Amazon S3 using Python Boto3?
The answer to this question is not as straightforward as it might seem. Amazon S3, Amazon’s Simple Storage Service, doesn’t truly have a concept of “folders”. Instead, the idea of folders is emulated through the use of the “/” delimiter in key names. However, we can still retrieve a list of these ‘pseudo-folders’ using Python’s Boto3 library. Let’s get into it!
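To make the ‘pseudo-folder’ idea concrete before touching AWS, here is a small pure-Python sketch of the grouping the “/” delimiter produces. The function name `common_prefixes` and the sample keys are made up for illustration; this only mimics the behavior locally and is not how S3 is implemented.

```python
def common_prefixes(keys, prefix="", delimiter="/"):
    """Mimic how flat keys roll up into 'folder' prefixes (illustration only)."""
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # key lies outside the prefix we are listing
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:  # a delimiter after the prefix means the key sits in a 'folder'
            folders.add(prefix + head + delimiter)
    return sorted(folders)


# Hypothetical keys, stored flat exactly as S3 would store them
keys = ["logs/2023/app.log", "logs/2024/app.log", "images/cat.png", "readme.txt"]
top_level = common_prefixes(keys)
under_logs = common_prefixes(keys, prefix="logs/")
```

Note that `readme.txt` never shows up as a folder: it has no “/” after the prefix, so it is a plain object at that level.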
Setting up Boto3
First, you will need to install and configure Boto3. Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of AWS services like Amazon S3 and Amazon EC2. To install boto3, use pip:
pip install boto3
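After installing boto3, you need to give it credentials. Passing the keys explicitly, as shown here, works for a quick test, but boto3 can also read them from environment variables, the shared credentials file (~/.aws/credentials), or an IAM role, which keeps secrets out of your source code: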
import boto3
s3 = boto3.resource('s3', aws_access_key_id='YOUR_ACCESS_KEY', aws_secret_access_key='YOUR_SECRET_KEY')
Listing S3 ‘Folders’
Remember, S3 doesn’t have real folders, but we can emulate this behavior with prefixes and delimiters. Here’s the function that does this:
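def get_folders(bucket, prefix=''):
    """Yield the 'folder' prefixes found directly under prefix."""
    client = boto3.client('s3', aws_access_key_id='YOUR_ACCESS_KEY', aws_secret_access_key='YOUR_SECRET_KEY')
    # Delimiter='/' makes S3 roll keys up into CommonPrefixes instead of listing every object
    result = client.list_objects(Bucket=bucket, Prefix=prefix, Delimiter='/')
    # 'CommonPrefixes' is absent when there are no sub-folders, so default to an empty list
    for o in result.get('CommonPrefixes', []):
        yield o.get('Prefix')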
In this function, we’re creating a client connection to S3 using boto3. We then call the list_objects method of the client instance, passing Bucket, Prefix, and Delimiter as arguments. The Prefix argument limits the results to keys that begin with that string. The Delimiter argument is the character used to group keys: all keys that share the same substring between the prefix and the next occurrence of the delimiter are rolled up into a single result element.
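The list_objects method returns a dictionary with metadata about the listing. Because we passed a Delimiter, the ‘Contents’ list holds only the objects that sit directly under the prefix, while ‘CommonPrefixes’ holds one entry per group of keys rolled up by the delimiter. Each entry is a small dictionary with a ‘Prefix’ key, and yielding those values effectively gives us our ‘folders’. ‘CommonPrefixes’ is missing from the response when there are no sub-folders, and a single call returns at most 1,000 keys and prefixes combined, so very large listings need more than one request.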
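To see the two parts of the response side by side, here is a rough sketch that splits a listing into folders and files. The helper name `split_listing` and the sample dictionary are hypothetical; the sample is hand-written to resemble the shape of a real response and leaves out most metadata fields.

```python
def split_listing(response):
    """Separate a list_objects-style response into folder prefixes and object keys."""
    # Either key may be absent from the response, so fall back to an empty list
    folders = [entry["Prefix"] for entry in response.get("CommonPrefixes", [])]
    files = [obj["Key"] for obj in response.get("Contents", [])]
    return folders, files


# Hand-written sample; bucket name and keys are placeholders
sample_response = {
    "Name": "my_bucket",
    "Prefix": "my_folder/",
    "Delimiter": "/",
    "IsTruncated": False,
    "Contents": [{"Key": "my_folder/notes.txt", "Size": 12}],
    "CommonPrefixes": [{"Prefix": "my_folder/2023/"}, {"Prefix": "my_folder/2024/"}],
}

folders, files = split_listing(sample_response)
empty_folders, empty_files = split_listing({"Name": "my_bucket", "IsTruncated": False})
```

Falling back to an empty list for both keys means the same code works for a prefix that has only files, only folders, or nothing at all.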
You can call this function as follows:
for folder in get_folders('my_bucket', 'my_folder/'):
    print(folder)
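This will print the immediate ‘sub-folders’ of ‘my_folder’, each as a full key prefix ending in ‘/’. Keep the trailing slash on 'my_folder/': without it, S3 groups at the first ‘/’ and returns ‘my_folder/’ itself rather than its children.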
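One caveat: a single listing call returns at most 1,000 entries, so a prefix with many sub-folders can come back incomplete. Below is a minimal sketch of one way around that, using boto3’s paginator for the newer list_objects_v2 operation. This is a variation on the function above rather than a drop-in replacement: the name `iter_folders` is made up, and it takes an existing client as an argument instead of creating one.

```python
def iter_folders(client, bucket, prefix=""):
    """Yield every 'folder' prefix under `prefix`, following pagination."""
    # The paginator re-issues the request with continuation tokens until done
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/")
    for page in pages:
        # A page may contain no CommonPrefixes at all
        for entry in page.get("CommonPrefixes", []):
            yield entry["Prefix"]


# Usage (needs boto3 and AWS credentials; bucket and prefix are placeholders):
#   import boto3
#   for folder in iter_folders(boto3.client("s3"), "my_bucket", "my_folder/"):
#       print(folder)
```

Passing the client in also makes the function easy to exercise with a stand-in object, without touching a real bucket.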
Conclusion
And there you have it, a simple guide on how to retrieve only folders in an Amazon S3 bucket using Python’s Boto3 library! Remember to replace 'YOUR_ACCESS_KEY' and 'YOUR_SECRET_KEY' with your actual AWS access key and secret key.
I hope you’ve found this tutorial helpful. If you have any difficulties or questions, don’t hesitate to leave a comment below. Happy coding!