Python AWS Boto3: How to Read Files from S3 Bucket

In the world of data science, managing and accessing data is a critical task. AWS S3, a scalable and secure object storage service, is often the go-to solution for storing and retrieving any amount of data, at any time, from anywhere. In this blog post, we’ll explore how to read files from an S3 bucket using Boto3, the Amazon Web Services (AWS) SDK for Python.

Table of Contents:

  1. Prerequisites
  2. Setting Up Your S3 Bucket
  3. Installing Boto3
  4. Configuring AWS Credentials
  5. Reading Files from S3 Bucket with Boto3
  6. Common Errors
  7. Conclusion

Prerequisites

Before we dive in, make sure you have the following:

  • An AWS account
  • AWS CLI installed and configured
  • Python and Boto3 installed

Setting Up Your S3 Bucket

First, you’ll need to create an S3 bucket. You can do this through the AWS Management Console, AWS CLI, or Boto3. For the sake of this tutorial, we’ll use the AWS Management Console.

  1. Navigate to the S3 service in the AWS Management Console.
  2. Click on “Create bucket”.
  3. Enter a unique name for your bucket (we will name ours saturn12) and select a region.
  4. Leave the remaining settings as default and click “Create bucket”.
  5. Upload the file you want to read to the bucket using the “Upload” button, then “Add files”.

We uploaded a file named data.csv, which contains the following data:

Name,Age,City,Occupation
Alice ,25,New York,Data Scientist 
Bob ,30,London ,Software Engineer
Charlie ,35,Paris ,Data Analyst 
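
As mentioned above, the bucket can also be created through Boto3 instead of the console. The following is a minimal sketch of that route, not the method used in this tutorial. The helper name create_bucket_and_upload and its parameters are hypothetical; create_bucket and upload_file are Boto3 S3 client methods. The client is passed in as an argument so the helper can be exercised without AWS access. Bucket names are globally unique, so a name like saturn12 may already be taken.

```python
def create_bucket_and_upload(s3_client, bucket_name, region, local_path, key):
    """Create a bucket in `region` and upload one local file to it.

    `s3_client` is expected to be a Boto3 S3 client, e.g. boto3.client("s3").
    """
    if region == "us-east-1":
        # us-east-1 is the default location; it is not passed as a constraint.
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        # Every other region has to be named in CreateBucketConfiguration.
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # Upload the local file; `key` is the object's name inside the bucket.
    s3_client.upload_file(Filename=local_path, Bucket=bucket_name, Key=key)
```

With real credentials this could be called as create_bucket_and_upload(boto3.client('s3', region_name='eu-west-2'), 'saturn12', 'eu-west-2', 'data.csv', 'data.csv'), where the region is only an example.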

Installing Boto3

If you haven’t already, install Boto3 using pip:

pip install boto3

Configuring AWS Credentials

Boto3 needs your AWS credentials to interact with AWS services. You can configure them in several ways, but the simplest is to use the AWS CLI:

aws configure

Enter your AWS Access Key ID, Secret Access Key, default region name, and default output format when prompted.
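
To confirm that Boto3 picks up the credentials you just configured, one option is to ask AWS STS which identity they resolve to. This is a small sketch under that assumption; the helper name describe_caller is hypothetical, while get_caller_identity is an STS client method whose response includes the Account and Arn fields.

```python
def describe_caller(sts_client):
    """Return (account_id, arn) for the credentials behind `sts_client`.

    `sts_client` is expected to be a Boto3 STS client, e.g. boto3.client("sts").
    """
    identity = sts_client.get_caller_identity()
    return identity["Account"], identity["Arn"]
```

Calling describe_caller(boto3.client('sts')) should return your account ID and the ARN of your IAM user or role; if the credentials are missing or invalid, the call raises an exception instead.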

Reading Files from S3 Bucket with Boto3

Now that we’re set up, let’s dive into how to read files from an S3 bucket using Boto3.

First, import Boto3 and create an S3 client:

import boto3

s3 = boto3.client('s3')

To read a file, we’ll use the get_object method, which retrieves objects from Amazon S3:

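def read_file_from_s3(bucket_name, file_name):
    """Return the contents of an S3 object as bytes."""
    # get_object returns a dict; its 'Body' entry is a stream over the object data
    obj = s3.get_object(Bucket=bucket_name, Key=file_name)
    # read() consumes the stream and returns the whole object as bytes
    data = obj['Body'].read()
    return data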

In this function, bucket_name is the name of your S3 bucket, and file_name is the key (name) of the object you want to read. The get_object method returns a dictionary whose ‘Body’ key holds a streaming object (a botocore StreamingBody). Calling its read method returns the entire file contents as bytes.
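
Because read loads the whole object into memory, larger files may be better processed line by line. The following is a hedged sketch of that variation using the iter_lines method of botocore’s StreamingBody. The helper name iter_s3_lines and the explicit s3_client parameter are illustrative choices, not part of the function above.

```python
def iter_s3_lines(s3_client, bucket_name, file_name, encoding="utf-8"):
    """Yield the lines of an S3 object as text, one at a time.

    `s3_client` is expected to be a Boto3 S3 client, e.g. boto3.client("s3").
    """
    obj = s3_client.get_object(Bucket=bucket_name, Key=file_name)
    # iter_lines yields each line as bytes without the trailing newline.
    for raw_line in obj["Body"].iter_lines():
        yield raw_line.decode(encoding)
```

A loop such as for line in iter_s3_lines(s3, 'saturn12', 'data.csv') would then handle one line at a time instead of holding the full file in memory.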

Calling read_file_from_s3('saturn12', 'data.csv') returns the contents of data.csv as a bytes object; call .decode('utf-8') on the result to get the CSV text shown earlier.
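
Since the file in this tutorial is a CSV, the returned bytes usually need to be decoded and parsed before they are useful. Below is a minimal sketch using only the standard library. The helper name parse_csv_bytes is hypothetical, the file is assumed to be UTF-8 encoded, and stripping the stray spaces around values is an assumption about what you want from this particular file. The sample bytes stand in for the value returned from S3.

```python
import csv
import io


def parse_csv_bytes(data, encoding="utf-8"):
    """Decode CSV bytes and return a list of dicts, one per data row."""
    text = data.decode(encoding)
    reader = csv.DictReader(io.StringIO(text))
    # Strip surrounding whitespace from headers and values; missing values become "".
    return [
        {name.strip(): (value or "").strip() for name, value in row.items()}
        for row in reader
    ]


# Stand-in for the bytes returned by read_file_from_s3('saturn12', 'data.csv').
sample = (
    b"Name,Age,City,Occupation\n"
    b"Alice ,25,New York,Data Scientist \n"
    b"Bob ,30,London ,Software Engineer\n"
)
rows = parse_csv_bytes(sample)
print(rows[0]["Name"])  # Alice
```

All values come back as strings, so a column like Age would still need an explicit int conversion.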

Common Errors

  • Incorrect Credentials: Double-check that your AWS Access Key ID and Secret Access Key are valid and entered correctly.

  • Typo in Bucket or File Name: Ensure the bucket_name and file_name arguments match your S3 bucket name and object key exactly, including case and any folder prefix.

  • Missing Permissions: Verify that your IAM user or role has permission to read the target object (for example, s3:GetObject on the bucket’s objects).
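
The errors listed above typically surface as a botocore ClientError whose response carries an S3 error code. The sketch below shows one way to turn those codes into readable hints. The names ERROR_HINTS, explain_s3_error and read_file_or_explain are hypothetical, the hint wording is illustrative, and the mapping covers only a few common codes.

```python
# Hints keyed by S3 error code; the wording of each hint is illustrative.
ERROR_HINTS = {
    "InvalidAccessKeyId": "Check the AWS Access Key ID in your credentials.",
    "SignatureDoesNotMatch": "Check the AWS Secret Access Key in your credentials.",
    "NoSuchBucket": "Check bucket_name for typos.",
    "NoSuchKey": "Check file_name for typos; keys are case-sensitive.",
    "AccessDenied": "Check the IAM permissions on the bucket and object.",
}


def explain_s3_error(code):
    """Return a human-readable hint for an S3 error code."""
    return ERROR_HINTS.get(code, f"Unexpected S3 error code: {code}")


def read_file_or_explain(s3_client, bucket_name, file_name):
    """Read an object as bytes, re-raising S3 errors with a readable hint."""
    # Imported here so explain_s3_error stays usable without botocore installed.
    from botocore.exceptions import ClientError

    try:
        obj = s3_client.get_object(Bucket=bucket_name, Key=file_name)
        return obj["Body"].read()
    except ClientError as err:
        code = err.response["Error"]["Code"]
        raise RuntimeError(explain_s3_error(code)) from err
```

Note that a missing object can be reported as AccessDenied rather than NoSuchKey when the caller lacks permission to list the bucket, so the hints are a starting point rather than a diagnosis.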

Conclusion

Reading files from an AWS S3 bucket using Python and Boto3 is straightforward. With just a few lines of code, you can retrieve and work with data stored in S3, which makes Boto3 a practical tool for data scientists working with large datasets.

Remember, AWS provides a vast array of services that can be leveraged for data science tasks. Boto3 is your gateway to automating and interacting with these services using Python. Happy coding!

