How to Get Google Spreadsheet CSV into a Pandas Dataframe

As a data scientist or software engineer, you may often find yourself working with data stored in Google Sheets. While Google Sheets provides a convenient way to store and share data, it can be challenging to work with this data programmatically. In this blog post, we will go through the steps to get a Google Spreadsheet CSV file into a Pandas dataframe.

As a data scientist or software engineer, you may often find yourself working with data stored in Google Sheets. While Google Sheets provides a convenient way to store and share data, it can be challenging to work with this data programmatically. In this blog post, we will go through the steps to get a Google Spreadsheet CSV file into a Pandas dataframe.

Table of Contents

  1. Introduction
  2. Why Use Pandas?
  3. Prerequisites
  4. Conclusion

Why Use Pandas?

Pandas is a Python library that provides powerful data manipulation and analysis tools. It allows you to read data from various sources, including CSV files, Excel files, SQL databases, and more. Pandas dataframes provide an efficient way to manipulate and analyze data, making it a popular choice among data scientists and software engineers.

Prerequisites

To get started with this tutorial, you will need the following:

  • A Google account
  • A Google Spreadsheet with data in CSV format
  • Python 3.x installed on your computer
  • The Pandas library installed in Python

Step 1: Set Up Google API Credentials

To access data from a Google Spreadsheet, you need to create a project in the Google Cloud Console and enable the Google Sheets API. Follow the steps below to set up your API credentials:

  1. Go to the Google Cloud Console.
  2. Create a new project by clicking on the “Select a project” dropdown in the top bar and clicking on “New Project.”

Google Cloud New Project

  1. Enter a name for your project and click “Create.”

Google Cloud Project Filling

  1. Once your project is created, click on the “APIs and Services” button.

Google Cloud Project Menu

  1. Search for “Google Sheets API” and click on the result.

Google Sheets API

  1. Click on the “Enable” button to enable the API.

Enable Google Sheets API

  1. Click on the “Create Credentials” button.

Google Cloud Create Credentials

  1. Select “Aplication data” as the type of data credentials you want to create and click .

Google Cloud Credential Type

  1. Enter a name for your service account and click “Create and Continue”.

Service Account Google Account

  1. Select a role and click on the “Continue” button and the “Done”.

Google Cloud Select Role

  1. In the “Service Accounts” page, click on the newly created service account.

Google Cloud Service Account

  1. Go to the “Keys” tab, click “Add Key” and choose “Create new key”, choose “JSON” as the key type and click “Create”, the JSON key file will be automatically downloaded to your computer.

Google Cloud Keys

Once you have completed these steps, you will have a JSON file containing your API credentials. Keep this file safe as you will need it later.

Step 2: Share your Google Sheet with the Service Account and obtain the spreadsheetId

Open the Google Sheet you want to access programmatically. Share it with the email address of your service account (found on the service account details page in the Google Cloud Console).

To obtain the spreadsheetId of a Google Sheet, open the desired spreadsheet in your browser and look at the URL. The spreadsheetId is the portion of the URL between “/d/” and “/edit”. For example, in the URL “https://docs.google.com/spreadsheets/d/1aBcD2eF3gHiJKl4MnOP5Q/edit#gid=0", the spreadsheetId is “1aBcD2eF3gHiJKl4MnOP5Q”. This unique identifier is what you will use to access the spreadsheet through the Google Sheets API.

Step 3: Install Required Libraries

To get started, you need to install the necessary libraries. You can do this using pip, the Python package manager. Open a terminal or command prompt and run the following command:

pip install pandas google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client

This command installs the Pandas library and the required Google API libraries.

Step4: Access Google Sheets API using Python

To access the Google Sheets API using Python, you need to authenticate yourself using the JSON file containing your API credentials. Here’s how you can do this:

import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Replace the placeholders with your API credentials
creds = service_account.Credentials.from_service_account_file(
    'path/to/your/credentials.json',
    scopes=['https://www.googleapis.com/auth/spreadsheets.readonly']
)

# Replace the placeholders with your spreadsheet ID and range
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId='your-spreadsheet-id', range='Sheet1!A1:D10').execute()

df = pd.DataFrame(result.get('values', []))
print(df.head())

Output example:

             0     1               2
0         Name   Age            City
1     John Doe    25        New York
2   Jane Smith    30   San Francisco
3  Bob Johnson    28     Los Angeles
4  Emily Davis    22         Chicago

In this code, we import the necessary libraries and authenticate ourselves using the service_account module from the google.oauth2 library. We then use the build function from the googleapiclient.discovery library to create a service object that we can use to access the Google Sheets API. Finally, we use the sheet.values().get() method to get the values from the specified range in the spreadsheet and convert it to a Pandas dataframe.

Conclusion

In this blog post, we have gone through the steps to get a Google Spreadsheet CSV file into a Pandas dataframe. We started by setting up our API credentials and installing the required libraries. We then used the Google Sheets API and the Pandas library to read data from the spreadsheet and convert it to a dataframe.

Pandas dataframes provide a powerful way to manipulate and analyze data, making it a popular choice among data scientists and software engineers. By following the steps outlined in this tutorial, you can easily get data from a Google Spreadsheet into a Pandas dataframe and start analyzing it using Python.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.