How to Import Multiple Excel Files into Python Pandas and Concatenate Them into One Dataframe
As a data scientist or software engineer, you might often face a situation where you need to import multiple Excel files into Python pandas and concatenate them into one dataframe. This process can be time-consuming and tedious if done manually. However, with the help of pandas, it can be done easily and efficiently. In this blog post, we will discuss how to import multiple Excel files into Python pandas and concatenate them into one dataframe.
Table of Contents
- Why Importing Multiple Excel Files is Important
- Steps to Import Multiple Excel Files into Python Pandas and Concatenate Them into One Dataframe
- Common Errors and Solutions
- Best Practices
- Conclusion
Why Importing Multiple Excel Files is Important
Excel is a widely used tool for data storage and analysis. However, one of the drawbacks of Excel is that it limits the amount of data that can be stored in a single file. As a result, large datasets are often divided into multiple Excel files. Importing these files into Python pandas and concatenating them into one dataframe allows you to perform data analysis on the entire dataset, without the need to manually merge the files.
Steps to Import Multiple Excel Files into Python Pandas and Concatenate Them into One Dataframe
Step 1: Install Required Libraries
Before we can import Excel files into Python pandas, we need to install the required libraries. We will be using pandas and glob libraries for this purpose. You can install these libraries using pip, a python package manager, by running the following command in your terminal:
pip install pandas glob
Step 2: Import Required Libraries
After installing the required libraries, we can import them into our Python script using the following code:
import pandas as pd
import glob
Step 3: Define Path to Excel Files
The next step is to define the path to the Excel files. We will be using the glob library to obtain the path of all Excel files in a directory. You can define the path to your Excel files using the following code:
path = r'Saturn'
Let’s assume that we have these 3 following .xlxs
files which are located in the folder named Saturn
:
# xlxs1
Name Age Salary
0 John 25 50000
1 Alice 30 60000
2 Bob 22 45000
# xlxs2
Name Age Salary
0 Eva 28 55000
1 David 35 70000
2 Sophia 27 48000
# xlxs3
Name Age Salary
0 Chris 32 62000
1 Olivia 26 52000
2 Daniel 29 58000
Step 4: Import Excel Files into Pandas
Once we have defined the path to the Excel files, we can use the pandas library to import the files into Python pandas. We will be using a for loop to iterate through all Excel files in the directory and import them into pandas. You can use the following code to import Excel files into pandas:
all_files = glob.glob(path + "/*.xlsx")
li = []
for filename in all_files:
df = pd.read_excel(filename, index_col=None, header=0)
li.append(df)
In the above code, we have used the read_excel()
function of pandas to read all Excel files in the directory and append them to a list called li
. The index_col parameter is set to None to prevent pandas from creating a new index column, and the header parameter is set to 0 to use the first row as the header.
Step 5: Concatenate Excel Files into One Dataframe
Finally, we can concatenate all Excel files into one dataframe using the pandas library. We will be using the concat() function to concatenate all dataframes in the list ‘li’ into one dataframe. You can use the following code to concatenate Excel files into one dataframe:
df = pd.concat(li, axis=0, ignore_index=True)
print(df)
In the above code, we have used the concat() function to concatenate all dataframes in the list li
into one dataframe. The axis parameter is set to 0 to concatenate vertically, and the ignore_index parameter is set to True to reset the index of the concatenated dataframe.
Output:
Name Age Salary
0 John 25 50000
1 Alice 30 60000
2 Bob 22 45000
3 Eva 28 55000
4 David 35 70000
5 Sophia 27 48000
6 Chris 32 62000
7 Olivia 26 52000
8 Daniel 29 58000
Sure, let’s start by adding a section on creating example Excel files and saving them in one folder.
Creating Example Excel Files
To create example Excel files for demonstration purposes, you can use the pandas
library to generate synthetic data and save it to Excel files. Below is an example code snippet that creates three sample Excel files with random data:
import pandas as pd
import numpy as np
# Create sample dataframes
df1 = pd.DataFrame({'A': np.random.rand(5), 'B': np.random.randint(1, 10, 5)})
df2 = pd.DataFrame({'A': np.random.rand(5), 'B': np.random.randint(1, 10, 5)})
df3 = pd.DataFrame({'A': np.random.rand(5), 'B': np.random.randint(1, 10, 5)})
# Specify the folder path to save the Excel files
example_folder_path = 'path/to/example/files/'
# Save dataframes to Excel files
df1.to_excel(example_folder_path + 'example_file1.xlsx', index=False)
df2.to_excel(example_folder_path + 'example_file2.xlsx', index=False)
df3.to_excel(example_folder_path + 'example_file3.xlsx', index=False)
This code creates three dataframes (df1
, df2
, and df3
) with random data and saves them as Excel files in the specified folder (example_folder_path
). You can adjust the number of rows, columns, and data types based on your specific needs.
Common Errors and Solutions
Error 1: Incorrect Path
Error: FileNotFoundError: [Errno 2] No such file or directory: 'path/to/excel/files'
Solution: Double-check the path you have defined. Ensure that it is the correct path to the folder where your Excel files are located. You can use an absolute path or a relative path based on the current working directory.
Error 2: Unsupported Excel Format
Error: ValueError: No engine for filetype: 'xlsx'
Solution: Ensure that you have the required library to handle Excel files installed. In this case, pandas
uses the openpyxl
engine for reading and writing Excel files. Install it using pip install openpyxl
.
Best Practices
1. Use Absolute Paths
Always use absolute paths for specifying the folder path to avoid any confusion regarding the working directory. You can use the os
module to make paths platform-independent:
import os
path = os.path.abspath('path/to/excel/files')
2. Error Handling
Implement proper error handling to anticipate issues. For example, you can use try
and except
blocks to handle potential errors when reading Excel files.
try:
df = pd.read_excel(filename, index_col=None, header=0)
li.append(df)
except Exception as e:
print(f"Error reading {filename}: {e}")
Conclusion
In this blog post, we have discussed how to import multiple Excel files into Python pandas and concatenate them into one dataframe. We have covered the steps required to install the required libraries, import the libraries into Python, define the path to Excel files, import Excel files into pandas, and concatenate Excel files into one dataframe. By following these steps, you can easily import and merge multiple Excel files into one dataframe, allowing you to perform data analysis on the entire dataset.
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.
Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams.
Try Saturn Cloud and join thousands of users moving to the cloud without
having to switch tools.