How to Get the Size of an Amazon S3 Bucket Using Python and Boto3 Library

As a data scientist or software engineer, you have probably interacted with Amazon S3, an object storage service from Amazon Web Services (AWS) that offers industry-leading scalability, data availability, security, and performance. It’s a crucial tool for anyone working with large amounts of data. But how can we determine the size of an S3 bucket?

This post will guide you on how to determine the size of an Amazon S3 bucket using Python and the Boto3 library. Let’s dive in!

Introduction to Boto3

Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3, Amazon EC2, and others. It’s a must-have tool for anyone working with AWS in a Python environment.
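As a quick taste, here is a minimal sketch of creating an S3 client and listing your buckets with Boto3 (it assumes your AWS credentials are already configured, which we cover below):

import boto3

# Create a low-level S3 client; Boto3 picks up credentials from the
# environment, ~/.aws/credentials, or an attached IAM role.
s3 = boto3.client('s3')

# Print the name of every bucket these credentials can see
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])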

Prerequisites

Before we begin, make sure you have the following:

  1. An AWS account.
  2. Python installed on your system.
  3. Boto3 installed in your Python environment. You can install it via pip:
pip install boto3
  4. AWS credentials configured. You can set them up by running aws configure in your terminal and following the prompts (a programmatic alternative is sketched below).
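The aws configure route is usually simplest, but credentials can also be passed to Boto3 directly in code. A minimal sketch, with placeholder key values (avoid hard-coding real keys in source files):

import boto3

# Placeholder values only -- in practice, prefer `aws configure`,
# environment variables, or an IAM role.
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
    region_name='us-east-1',
)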

Getting the Size of an S3 Bucket

To get the size of an S3 bucket, you would need to iterate over all the objects in the bucket and sum up their sizes. Here’s a Python function that does just that:

import boto3

def get_bucket_size(bucket_name):
    s3 = boto3.client('s3')
    try:
        response = s3.list_objects_v2(Bucket=bucket_name)
        if 'Contents' in response:
            # Each entry in 'Contents' reports the object's size in bytes
            return sum([obj['Size'] for obj in response['Contents']])
        else:
            # An empty bucket has no 'Contents' key
            return 0
    except Exception as e:
        print(e)
        return 0  # return 0 rather than None if the request fails

# Usage
bucket_size = get_bucket_size('my-bucket-name')
print(f'Bucket size is: {bucket_size} bytes')

The list_objects_v2 method returns some metadata about each object in the bucket, including its size (Size) in bytes. We use a list comprehension to create a list of these sizes, and then pass that list to the sum function to add them up.
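For reference, the part of a list_objects_v2 response we rely on looks roughly like this (abbreviated; keys and values are illustrative):

# Abbreviated, illustrative shape of a list_objects_v2 response
{
    'Contents': [
        {'Key': 'data/file-1.csv', 'Size': 1048576},   # Size is in bytes
        {'Key': 'data/file-2.csv', 'Size': 2097152},
    ],
    'IsTruncated': False,   # True when more results remain
    'KeyCount': 2,
}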

Note that the list_objects_v2 method only returns up to 1000 objects at a time. If your bucket contains more than 1000 objects, you’ll need to make additional calls to list_objects_v2, passing the value of the NextContinuationToken from the previous response to the ContinuationToken parameter of the next request.

def get_bucket_size(bucket_name):
    s3 = boto3.client('s3')
    size = 0
    continuation_token = None

    while True:
        try:
            # Include the continuation token on every request after the first page
            if continuation_token:
                response = s3.list_objects_v2(Bucket=bucket_name, ContinuationToken=continuation_token)
            else:
                response = s3.list_objects_v2(Bucket=bucket_name)
        except Exception as e:
            print(e)
            return size  # return whatever has been counted so far

        # 'Contents' is absent when the bucket is empty
        if 'Contents' in response:
            size += sum([obj['Size'] for obj in response['Contents']])

        # 'NextContinuationToken' is only present when more pages remain
        if 'NextContinuationToken' in response:
            continuation_token = response['NextContinuationToken']
        else:
            return size
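If you'd rather not manage continuation tokens yourself, Boto3 also provides paginators that handle them for you. A roughly equivalent sketch (the function name here is just illustrative):

def get_bucket_size_paginated(bucket_name):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    total = 0
    # The paginator issues the follow-up requests and threads the
    # continuation token between pages for us
    for page in paginator.paginate(Bucket=bucket_name):
        total += sum(obj['Size'] for obj in page.get('Contents', []))
    return total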

Conclusion

Boto3 is a powerful tool for interacting with AWS services in Python. In this post, you’ve learned how to use it to calculate the size of an Amazon S3 bucket. This can be useful for monitoring your storage usage or for sizing calculations in larger data processing pipelines.

As always, remember to handle your AWS credentials with care, and happy coding!


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.