How to Read Data from Google Sheets using Colaboratory Google

As a software engineer, you know how important it is to have access to data for your projects. Google Sheets is a popular online spreadsheet platform that many people use to store and share data. In this blog post, we’ll show you how to read data from Google Sheets using Colaboratory, a free Jupyter notebook environment from Google.

As a software engineer, you know how important it is to have access to data for your projects. Google Sheets is a popular online spreadsheet platform that many people use to store and share data. In this blog post, we’ll show you how to read data from Google Sheets using Colaboratory, a free Jupyter notebook environment from Google.

Table of Contents

  1. Introduction
  2. Step 1: Open Colab
  3. Step 2: Mount Google Drive
  4. Step 3: Read Data from Google Sheet
  5. Step 4: Analyze the Data
  6. Alternative Approach: Using gspread with Colab
  7. Conclusion

Introduction

Colaboratory, or Colab for short, is an online platform provided by Google that allows you to write and run Python code in a web browser. One of the key benefits of Colab is that it integrates with Google Drive, allowing you to easily access and manipulate data stored in your Drive.

In this tutorial, we’ll show you how to read data from a Google Sheet stored in Google Drive using Colab. We’ll assume that you already have a Google account and that you have some data stored in a Google Sheet.

Step 1: Open Colab

To get started, open Colab in your web browser. You can do this by going to https://colab.research.google.com/ and signing in with your Google account.

Once you’re signed in, you’ll see the Colab home page. From here, you can create a new notebook or open an existing one.

Step 2: Mount Google Drive

Before we can read data from Google Drive, we need to mount our Drive in Colab. To do this, run the following code:

from google.colab import drive
drive.mount('/content/drive')

This code will prompt you to authorize Colab to access your Google Drive. Follow the instructions to grant access, and then enter the authorization code that is provided.

Once you’ve done this, your Google Drive will be mounted in Colab and you’ll be able to access your files.

Step 3: Read Data from Google Sheet

Now that we’ve mounted our Google Drive, we can read data from a Google Sheet. To do this, we’ll use the pandas library, which is a popular Python library for data manipulation.

First, let’s import the pandas library:

import pandas as pd

Next, we’ll use the read_csv function from pandas to read the data from our Google Sheet. The read_csv function is used to read data from a CSV file, but it can also be used to read data from a Google Sheet if we provide the URL of the sheet.

To get the URL of our Google Sheet, we’ll need to open the sheet in Google Drive and copy the URL from the address bar. The URL should look something like this:

https://drive.google.com/drive/folders/ABC123XYZ456

Replace ABC123XYZ456 with the ID of your Google Sheet.

Now, let’s use the read_csv function to read the data from our Google Sheet:

url = 'https://docs.google.com/spreadsheets/d/ABC123XYZ456/export?format=csv'
df = pd.read_csv(url)

In this code, we’ve provided the URL of our Google Sheet to the read_csv function. We’ve also specified the format parameter as csv, which tells Google Drive to export the sheet as a CSV file.

The read_csv function returns a DataFrame object, which is a two-dimensional tabular data structure with labeled axes. We’ve assigned this object to the variable df.

Step 4: Analyze the Data

Now that we’ve read our data into a DataFrame, we can analyze it using the tools provided by pandas.

For example, we can use the head function to display the first few rows of our data:

df.head()

We can also use the describe function to get a summary of our data:

df.describe()

Reading data from google drive

These are just a few examples of the many ways that you can analyze your data using pandas.

Alternative Approach: Using gspread with Colab

While the previous method demonstrated the use of pandas for data retrieval from Google Drive in Colab, an alternative approach involves leveraging the gspread library. This library provides direct integration with Google Sheets, offering a streamlined process for accessing data stored in Google Drive.

# Install the required libraries
!pip install gspread google-auth

# Import libraries
import gspread
from google.colab import auth
from oauth2client.client import GoogleCredentials
from google.auth import default

# Authenticate and create a client
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)

# Open the Google Sheets file (replace 'your_spreadsheet_name' with your file's name)
spreadsheet = gc.open("your_spreadsheet_name").sheet1

# Now you can work with the 'worksheet' object, for example, fetching the data
data = spreadsheet.get_all_values()
print(data)
  1. Install Libraries: Begin by installing the necessary libraries, gspread, and google-auth.
  2. Import Libraries: Import the required libraries, including gspread for Google Sheets integration.
  3. Authenticate and Create a Client: Use the authentication code to authenticate the user and create a gspread client for accessing Google Sheets.
  4. Open Google Sheets File: Replace ‘your_spreadsheet_name’ with the actual name of your Google Sheets file. This step opens the specified sheet and allows you to work with its data.
  5. Work with Worksheet Object: Once the sheet is open, you can perform various operations on the ‘worksheet’ object. In the provided example, the get_all_values() method is used to fetch all the data from the sheet.

Colab Read Data from Drive

This alternative method offers a more direct and specialized approach for working with Google Sheets, providing a viable option based on the specific needs of your project.

Conclusion

In this blog post, we’ve shown you how to read data from Google Sheets using Colaboratory. By following these steps, you should now be able to access and manipulate data stored in your Google Sheets.

Colaboratory is a powerful tool for data analysis and machine learning, and its integration with Google Drive makes it even more useful.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.