How to Load an XLSX File from Google Drive in Google Colaboratory

As a software engineer, you may need to work with data in various formats, including Microsoft Excel files (XLSX). Google Colaboratory, a cloud-based platform for data science and machine learning, provides a convenient way to load XLSX files from Google Drive into your notebook for analysis. In this tutorial, we’ll show you how to load an XLSX file from Google Drive into Google Colaboratory using Python.

Prerequisites

Before we begin, make sure you have the following:

  • A Google account
  • Access to Google Drive (included with your Google account)
  • A Google Colaboratory notebook

Step 1: Authenticate with Google Drive

To access files in your Google Drive from Google Colaboratory, you need to mount your Drive in the notebook, which requires authenticating with your Google account. Run the following code:

from google.colab import drive
drive.mount('/content/drive')

When you run this cell, Colab prompts you to sign in to your Google account and authorize Google Colaboratory to access your Google Drive. Follow the prompts to complete the authentication process. Once it finishes, the contents of your Drive are available under /content/drive.
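Before searching, it can help to confirm where the mounted Drive actually lives. Recent Colab runtimes appear to name the top-level folder MyDrive, while this tutorial (like many older notebooks) uses My Drive; treat that naming difference as an assumption and check os.listdir('/content/drive') in your own runtime. The sketch below uses a hypothetical helper, resolve_drive_root, that returns whichever of the two folders exists so later cells do not depend on one spelling:

```python
import os


def resolve_drive_root(mount_point="/content/drive"):
    """Return the path of the top-level Drive folder under mount_point.

    Hypothetical helper. Assumption: the folder is named either 'MyDrive'
    or 'My Drive' depending on the Colab version.
    """
    for name in ("MyDrive", "My Drive"):
        candidate = os.path.join(mount_point, name)
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(
        f"No 'MyDrive' or 'My Drive' folder under {mount_point!r}; "
        "has drive.mount() been run?"
    )


# In Colab, after drive.mount('/content/drive'), you could use:
# search_path = resolve_drive_root()
```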

Step 2: Locate the XLSX File in Google Drive

Once you have authenticated with Google Drive, you need to locate the XLSX file you want to load. You can navigate to the file manually using the Google Drive web interface, or you can use Python to search for the file.

Assuming your XLSX file is stored somewhere in your Google Drive, you can use the following code to find it. The search starts at the root of My Drive and also descends into every subfolder:

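import os

# Set the search parameters
# Replace 'music1.xlsx' with the actual name of your file
filename = 'music1.xlsx'
search_path = '/content/drive/My Drive'

# Walk search_path and all of its subfolders, stopping at the first match
file_path = None
for root, dirs, files in os.walk(search_path):
    if filename in files:
        file_path = os.path.join(root, filename)
        break

# Fail early with a clear message instead of passing None to read_excel() later
if file_path is None:
    raise FileNotFoundError(f"{filename} was not found under {search_path}")
print(file_path)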

This code searches for the file music1.xlsx starting at the root of your Google Drive (/content/drive/My Drive) and walking through its subfolders. If the file is found, file_path will contain the full path to the file.
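Because the loop above stops at the first match, a file name that exists in several folders can silently resolve to the wrong copy. A minimal alternative sketch, using pathlib.Path.rglob and a hypothetical helper named find_all_matches, collects every match so you can pick one explicitly. Like os.walk, it visits every file under the search path, so it may be slow on a very large Drive:

```python
from pathlib import Path


def find_all_matches(filename, search_path):
    """Return a sorted list of every file under search_path named exactly filename.

    Hypothetical helper for illustration.
    """
    root = Path(search_path)
    # rglob("*") walks search_path and all of its subfolders; comparing p.name
    # avoids treating characters such as * or [ in filename as glob wildcards
    return sorted(str(p) for p in root.rglob("*") if p.is_file() and p.name == filename)


# Usage in Colab:
# matches = find_all_matches('music1.xlsx', '/content/drive/My Drive')
# print(matches)  # choose the right entry if there is more than one
```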

You can also browse to the file in the Files panel on the left side of your Google Colab notebook and copy its path from there, as shown in the figure below:

[Figure: the Files panel on the left side of a Google Colab notebook]

Step 3: Load the XLSX File into a Pandas DataFrame

Now that you have located the XLSX file, you can load it into a Pandas DataFrame for analysis. Pandas is a popular Python library for data manipulation and analysis.

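Pandas comes preinstalled in Colab, so you can use the following code to load the XLSX file into a DataFrame. Reading .xlsx files also relies on the openpyxl package, which is normally available in Colab as well; if pandas raises an ImportError about openpyxl, install it with !pip install openpyxl: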

import pandas as pd

# Load the XLSX file into a DataFrame
df = pd.read_excel(file_path)

This code uses the read_excel() function from Pandas to load the XLSX file into a DataFrame named df.
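By default, read_excel() loads only the first worksheet. Two of its parameters help when a workbook holds more than that: sheet_name selects a sheet by name (or None for all sheets), and usecols limits which columns are read. The sketch below is illustrative only: it first writes a made-up two-sheet workbook (sheet names survey_a and survey_b are hypothetical) to a temporary folder so it runs anywhere, then reads it back. It assumes openpyxl is installed, since pandas uses it to read .xlsx files and the example passes engine="openpyxl" to write one.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook, written to a temporary folder only so this
# sketch runs outside Colab; in Colab you would pass file_path from Step 2 instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.xlsx")
    with pd.ExcelWriter(demo_path, engine="openpyxl") as writer:
        pd.DataFrame({"Age": [20, 23], "Gender": [1, 1],
                      "Genre": ["HipHop Creole", "HipHop Creole"]}).to_excel(
            writer, sheet_name="survey_a", index=False)
        pd.DataFrame({"Age": [26, 29], "Gender": [1, 1],
                      "Genre": ["Compas", "Compas"]}).to_excel(
            writer, sheet_name="survey_b", index=False)

    # Read one sheet by name and keep only two of its columns
    subset = pd.read_excel(demo_path, sheet_name="survey_b", usecols=["Age", "Genre"])

    # sheet_name=None loads every sheet into a dict: {sheet name: DataFrame}
    all_sheets = pd.read_excel(demo_path, sheet_name=None)

# Stack all sheets into one DataFrame, with a "sheet" column naming each row's source
combined = pd.concat(all_sheets, names=["sheet", "row"]).reset_index(level="sheet")
```

The dict returned by sheet_name=None keeps the workbook's sheet order, so the concatenated result lists survey_a rows before survey_b rows.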

Step 4: Analyze the Data

Now that you have loaded the XLSX file into a Pandas DataFrame, you can analyze the data using various Pandas functions. For example, you can use the head() function to view the first few rows of the DataFrame:

# Display the first few rows of the DataFrame
df.head()

By default, head() returns the first five rows. For the example file used in this tutorial, the output looks like this:

   Age  Gender  Genre
    20       1  HipHop Creole
    23       1  HipHop Creole
    25       1  HipHop Creole
    26       1  Compas
    29       1  Compas
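Beyond head(), a few other pandas calls give a quick overview of a freshly loaded sheet. The sketch below rebuilds only the five sample rows shown above as a DataFrame so it runs on its own; in your notebook you would call the same methods on the df loaded from Drive.

```python
import pandas as pd

# The five sample rows from the head() output above;
# in Colab, use the df read from Google Drive instead
df = pd.DataFrame({
    "Age": [20, 23, 25, 26, 29],
    "Gender": [1, 1, 1, 1, 1],
    "Genre": ["HipHop Creole", "HipHop Creole", "HipHop Creole", "Compas", "Compas"],
})

# Column names, dtypes and non-null counts (printed to the notebook)
df.info()

# Count, mean, std, min, quartiles and max of the Age column
age_stats = df["Age"].describe()

# Number of rows per genre, most frequent first
genre_counts = df["Genre"].value_counts()

# Average age within each genre
mean_age_by_genre = df.groupby("Genre")["Age"].mean()
```

On these five rows, value_counts() reports 3 HipHop Creole rows and 2 Compas rows, and the Compas rows average an age of 27.5.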

Conclusion

In this tutorial, we showed you how to load an XLSX file from Google Drive into Google Colaboratory using Python. We covered the following steps:

  1. Authenticate with Google Drive
  2. Locate the XLSX file in Google Drive
  3. Load the XLSX file into a Pandas DataFrame
  4. Analyze the data

By following these steps, you can easily work with XLSX files in Google Colaboratory for your data analysis needs.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.