How to Load an XLSX File from Google Drive in Google Colaboratory
As a software engineer, you may need to work with data in various formats, including Microsoft Excel files (XLSX). Google Colaboratory, a cloud-based platform for data science and machine learning, provides a convenient way to load XLSX files from Google Drive into your notebook for analysis. In this tutorial, we’ll show you how to load an XLSX file from Google Drive into Google Colaboratory using Python.
Before we begin, make sure you have the following:
- A Google account
- A Google Drive account
- A Google Colaboratory notebook
Step 1: Authenticate with Google Drive
To access files in your Google Drive from Google Colaboratory, you need to authenticate your notebook with Google Drive. Run the following code to authenticate:
from google.colab import drive
drive.mount('/content/drive')
This code will prompt you to sign in to your Google account and authorize Google Colaboratory to access your Google Drive. Follow the prompts to complete the authentication process.
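If you re-run a notebook often, it can help to skip the mount step when Drive is already available. The following is a minimal sketch of that idea, not an official Colab pattern: `ensure_drive_mounted` is a hypothetical helper name, and it assumes a non-empty mount point means Drive is already mounted.

```python
import os


def ensure_drive_mounted(mount_point="/content/drive"):
    """Return True if Drive is available at mount_point, mounting it if needed.

    Hypothetical helper: treats a non-empty mount point as "already mounted".
    """
    # A populated mount point is taken to mean Drive is already mounted
    if os.path.isdir(mount_point) and os.listdir(mount_point):
        return True
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import drive
    except ImportError:
        return False
    # Triggers the sign-in and authorization prompt described above
    drive.mount(mount_point)
    return os.path.isdir(mount_point)
```

Calling `ensure_drive_mounted()` at the top of a notebook returns `False` outside Colab instead of raising an error, which makes the same notebook easier to run locally.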
Step 2: Locate the XLSX File in Google Drive
Once you have authenticated with Google Drive, you need to locate the XLSX file you want to load. You can navigate to the file manually using the Google Drive web interface, or you can use Python to search for the file.
Assuming your XLSX file is located in the root directory of your Google Drive (or in any folder beneath it), you can use the following code to find the file:
import os

# Set the search parameters
# Make sure to replace 'music1.xlsx' with the actual name of your file
filename = 'music1.xlsx'
search_path = '/content/drive/My Drive'

# Search for the file, stopping at the first match
file_path = None
for root, dirs, files in os.walk(search_path):
    if filename in files:
        file_path = os.path.join(root, filename)
        break
This code searches for the file music1.xlsx, starting from the root directory (/content/drive/My Drive) of your Google Drive and descending into its subfolders. If the file is found, file_path will contain the full path to the file; otherwise it remains None.
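The same search can be wrapped in a small function so it is easy to reuse for other files. This is a sketch under the assumptions above: `find_file` is a hypothetical name, and it returns the first match that `os.walk` encounters.

```python
import os


def find_file(filename, search_path):
    """Return the full path of the first file named `filename` under
    `search_path`, or None if no such file exists."""
    for root, dirs, files in os.walk(search_path):
        if filename in files:
            return os.path.join(root, filename)
    return None
```

In a notebook you would call it as `file_path = find_file('music1.xlsx', '/content/drive/My Drive')` and check for `None` before loading the file.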
You can also navigate to the file in the panel on the left side of your Google Colab notebook, as shown in the figure below:
Step 3: Load the XLSX File into a Pandas DataFrame
Now that you have located the XLSX file, you can load it into a Pandas DataFrame for analysis. Pandas is a popular Python library for data manipulation and analysis.
Pandas is preinstalled in Colab notebooks, so you can use the following code to load the XLSX file into a DataFrame:
import pandas as pd
# Load the XLSX file into a DataFrame
df = pd.read_excel(file_path)
This code uses the read_excel() function from Pandas to load the XLSX file into a DataFrame named df.
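If the search in Step 2 did not find the file, file_path is None and read_excel() fails with a less obvious error. A guarded loader along these lines may make that failure easier to diagnose. `load_xlsx` is a hypothetical helper name; `sheet_name` is a real read_excel() parameter (0 selects the first sheet). Note that Pandas reads .xlsx files through the openpyxl package, so this sketch assumes it is installed.

```python
import os

import pandas as pd


def load_xlsx(file_path, sheet_name=0):
    """Load one sheet of an XLSX file into a DataFrame.

    Raises FileNotFoundError early if the path is missing or invalid.
    """
    if file_path is None or not os.path.isfile(file_path):
        raise FileNotFoundError(f"XLSX file not found: {file_path!r}")
    # sheet_name accepts a zero-based position or a sheet title
    return pd.read_excel(file_path, sheet_name=sheet_name)
```

With this helper, `df = load_xlsx(file_path)` either returns a DataFrame or stops with a clear message about the missing file.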
Step 4: Analyze the Data
Now that you have loaded the XLSX file into a Pandas DataFrame, you can analyze the data using various Pandas functions. For example, you can use the head() function to view the first few rows of the DataFrame:
# Display the first few rows of the DataFrame
df.head()
This code will display the first few rows of the DataFrame in your notebook:
Age Gender Genre
20 1 HipHop Creole
23 1 HipHop Creole
25 1 HipHop Creole
26 1 Compas
29 1 Compas
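Beyond head(), a couple of one-line summaries often give a quick feel for the data. The sketch below rebuilds the five rows shown above as a stand-in for the loaded file, assuming the three columns are Age, Gender, and Genre and that "HipHop Creole" is a single genre value. In your notebook you would skip the DataFrame construction and use the df from Step 3.

```python
import pandas as pd

# Stand-in for the loaded file: the five rows from the head() output above
df = pd.DataFrame({
    "Age": [20, 23, 25, 26, 29],
    "Gender": [1, 1, 1, 1, 1],
    "Genre": ["HipHop Creole"] * 3 + ["Compas"] * 2,
})

# Number of rows and columns
print(df.shape)

# How many rows fall under each genre
genre_counts = df["Genre"].value_counts()
print(genre_counts)

# Average age of listeners per genre
mean_age_by_genre = df.groupby("Genre")["Age"].mean()
print(mean_age_by_genre)
```

value_counts() and groupby() are standard Pandas methods, so the same calls apply to any column names in your own file.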
In this tutorial, we showed you how to load an XLSX file from Google Drive into Google Colaboratory using Python. We covered the following steps:
- Authenticate with Google Drive
- Locate the XLSX file in Google Drive
- Load the XLSX file into a Pandas DataFrame
- Analyze the data
By following these steps, you can easily work with XLSX files in Google Colaboratory for your data analysis needs.