How to Get Google Spreadsheet CSV into a Pandas Dataframe
As a data scientist or software engineer, you may often find yourself working with data stored in Google Sheets. While Google Sheets provides a convenient way to store and share data, it can be challenging to work with this data programmatically. In this blog post, we will go through the steps to get a Google Spreadsheet CSV file into a Pandas dataframe.
Table of Contents
Why Use Pandas?
Pandas is a Python library that provides powerful data manipulation and analysis tools. It allows you to read data from various sources, including CSV files, Excel files, SQL databases, and more. Pandas dataframes provide an efficient way to manipulate and analyze data, making it a popular choice among data scientists and software engineers.
Prerequisites
To get started with this tutorial, you will need the following:
- A Google account
- A Google Spreadsheet with data in CSV format
- Python 3.x installed on your computer
- The Pandas library installed in Python
Step 1: Set Up Google API Credentials
To access data from a Google Spreadsheet, you need to create a project in the Google Cloud Console and enable the Google Sheets API. Follow the steps below to set up your API credentials:
- Go to the Google Cloud Console.
- Create a new project by clicking on the “Select a project” dropdown in the top bar and clicking on “New Project.”
- Enter a name for your project and click “Create.”
- Once your project is created, click on the “APIs and Services” button.
- Search for “Google Sheets API” and click on the result.
- Click on the “Enable” button to enable the API.
- Click on the “Create Credentials” button.
- Select “Aplication data” as the type of data credentials you want to create and click .
- Enter a name for your service account and click “Create and Continue”.
- Select a role and click on the “Continue” button and the “Done”.
- In the “Service Accounts” page, click on the newly created service account.
- Go to the “Keys” tab, click “Add Key” and choose “Create new key”, choose “JSON” as the key type and click “Create”, the JSON key file will be automatically downloaded to your computer.
Once you have completed these steps, you will have a JSON file containing your API credentials. Keep this file safe as you will need it later.
Step 2: Share your Google Sheet with the Service Account and obtain the spreadsheetId
Open the Google Sheet you want to access programmatically. Share it with the email address of your service account (found on the service account details page in the Google Cloud Console).
To obtain the spreadsheetId of a Google Sheet, open the desired spreadsheet in your browser and look at the URL. The spreadsheetId is the portion of the URL between “/d/” and “/edit”. For example, in the URL “https://docs.google.com/spreadsheets/d/1aBcD2eF3gHiJKl4MnOP5Q/edit#gid=0", the spreadsheetId is “1aBcD2eF3gHiJKl4MnOP5Q”. This unique identifier is what you will use to access the spreadsheet through the Google Sheets API.
Step 3: Install Required Libraries
To get started, you need to install the necessary libraries. You can do this using pip, the Python package manager. Open a terminal or command prompt and run the following command:
pip install pandas google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
This command installs the Pandas library and the required Google API libraries.
Step4: Access Google Sheets API using Python
To access the Google Sheets API using Python, you need to authenticate yourself using the JSON file containing your API credentials. Here’s how you can do this:
import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build
# Replace the placeholders with your API credentials
creds = service_account.Credentials.from_service_account_file(
'path/to/your/credentials.json',
scopes=['https://www.googleapis.com/auth/spreadsheets.readonly']
)
# Replace the placeholders with your spreadsheet ID and range
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId='your-spreadsheet-id', range='Sheet1!A1:D10').execute()
df = pd.DataFrame(result.get('values', []))
print(df.head())
Output example:
0 1 2
0 Name Age City
1 John Doe 25 New York
2 Jane Smith 30 San Francisco
3 Bob Johnson 28 Los Angeles
4 Emily Davis 22 Chicago
In this code, we import the necessary libraries and authenticate ourselves using the service_account
module from the google.oauth2
library. We then use the build
function from the googleapiclient.discovery
library to create a service object that we can use to access the Google Sheets API. Finally, we use the sheet.values().get()
method to get the values from the specified range in the spreadsheet and convert it to a Pandas dataframe.
Conclusion
In this blog post, we have gone through the steps to get a Google Spreadsheet CSV file into a Pandas dataframe. We started by setting up our API credentials and installing the required libraries. We then used the Google Sheets API and the Pandas library to read data from the spreadsheet and convert it to a dataframe.
Pandas dataframes provide a powerful way to manipulate and analyze data, making it a popular choice among data scientists and software engineers. By following the steps outlined in this tutorial, you can easily get data from a Google Spreadsheet into a Pandas dataframe and start analyzing it using Python.
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.
Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams.
Try Saturn Cloud and join thousands of users moving to the cloud without
having to switch tools.