How to Import Multiple Excel Files into Python Pandas and Concatenate Them into One Dataframe

In this blog, we will explore the process of importing numerous Excel files into Python using pandas and merging them into a single dataframe. As a data scientist or software engineer, encountering scenarios requiring the consolidation of multiple Excel files is not uncommon. Performing this task manually can be both time-consuming and laborious, but with the assistance of pandas, the process can be streamlined for efficiency.

As a data scientist or software engineer, you might often face a situation where you need to import multiple Excel files into Python pandas and concatenate them into one dataframe. This process can be time-consuming and tedious if done manually. However, with the help of pandas, it can be done easily and efficiently. In this blog post, we will discuss how to import multiple Excel files into Python pandas and concatenate them into one dataframe.

Table of Contents

  1. Why Importing Multiple Excel Files is Important
  2. Steps to Import Multiple Excel Files into Python Pandas and Concatenate Them into One Dataframe
  3. Common Errors and Solutions
  4. Best Practices
  5. Conclusion

Why Importing Multiple Excel Files is Important

Excel is a widely used tool for data storage and analysis. However, one of the drawbacks of Excel is that it limits the amount of data that can be stored in a single file. As a result, large datasets are often divided into multiple Excel files. Importing these files into Python pandas and concatenating them into one dataframe allows you to perform data analysis on the entire dataset, without the need to manually merge the files.

Steps to Import Multiple Excel Files into Python Pandas and Concatenate Them into One Dataframe

Step 1: Install Required Libraries

Before we can import Excel files into Python pandas, we need to install the required libraries. We will be using pandas and glob libraries for this purpose. You can install these libraries using pip, a python package manager, by running the following command in your terminal:

pip install pandas glob

Step 2: Import Required Libraries

After installing the required libraries, we can import them into our Python script using the following code:

import pandas as pd
import glob

Step 3: Define Path to Excel Files

The next step is to define the path to the Excel files. We will be using the glob library to obtain the path of all Excel files in a directory. You can define the path to your Excel files using the following code:

path = r'Saturn'

Let’s assume that we have these 3 following .xlxs files which are located in the folder named Saturn:

# xlxs1
    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   22   45000
# xlxs2
     Name  Age  Salary
0     Eva   28   55000
1   David   35   70000
2  Sophia   27   48000
# xlxs3
     Name  Age  Salary
0   Chris   32   62000
1  Olivia   26   52000
2  Daniel   29   58000

Step 4: Import Excel Files into Pandas

Once we have defined the path to the Excel files, we can use the pandas library to import the files into Python pandas. We will be using a for loop to iterate through all Excel files in the directory and import them into pandas. You can use the following code to import Excel files into pandas:

all_files = glob.glob(path + "/*.xlsx")
li = []
for filename in all_files:
    df = pd.read_excel(filename, index_col=None, header=0)
    li.append(df)

In the above code, we have used the read_excel() function of pandas to read all Excel files in the directory and append them to a list called li. The index_col parameter is set to None to prevent pandas from creating a new index column, and the header parameter is set to 0 to use the first row as the header.

Step 5: Concatenate Excel Files into One Dataframe

Finally, we can concatenate all Excel files into one dataframe using the pandas library. We will be using the concat() function to concatenate all dataframes in the list ‘li’ into one dataframe. You can use the following code to concatenate Excel files into one dataframe:

df = pd.concat(li, axis=0, ignore_index=True)
print(df)

In the above code, we have used the concat() function to concatenate all dataframes in the list li into one dataframe. The axis parameter is set to 0 to concatenate vertically, and the ignore_index parameter is set to True to reset the index of the concatenated dataframe.

Output:

     Name  Age  Salary
0    John   25   50000
1   Alice   30   60000
2     Bob   22   45000
3     Eva   28   55000
4   David   35   70000
5  Sophia   27   48000
6   Chris   32   62000
7  Olivia   26   52000
8  Daniel   29   58000

Sure, let’s start by adding a section on creating example Excel files and saving them in one folder.

Creating Example Excel Files

To create example Excel files for demonstration purposes, you can use the pandas library to generate synthetic data and save it to Excel files. Below is an example code snippet that creates three sample Excel files with random data:

import pandas as pd
import numpy as np

# Create sample dataframes
df1 = pd.DataFrame({'A': np.random.rand(5), 'B': np.random.randint(1, 10, 5)})
df2 = pd.DataFrame({'A': np.random.rand(5), 'B': np.random.randint(1, 10, 5)})
df3 = pd.DataFrame({'A': np.random.rand(5), 'B': np.random.randint(1, 10, 5)})

# Specify the folder path to save the Excel files
example_folder_path = 'path/to/example/files/'

# Save dataframes to Excel files
df1.to_excel(example_folder_path + 'example_file1.xlsx', index=False)
df2.to_excel(example_folder_path + 'example_file2.xlsx', index=False)
df3.to_excel(example_folder_path + 'example_file3.xlsx', index=False)

This code creates three dataframes (df1, df2, and df3) with random data and saves them as Excel files in the specified folder (example_folder_path). You can adjust the number of rows, columns, and data types based on your specific needs.

Common Errors and Solutions

Error 1: Incorrect Path

Error: FileNotFoundError: [Errno 2] No such file or directory: 'path/to/excel/files'

Solution: Double-check the path you have defined. Ensure that it is the correct path to the folder where your Excel files are located. You can use an absolute path or a relative path based on the current working directory.

Error 2: Unsupported Excel Format

Error: ValueError: No engine for filetype: 'xlsx'

Solution: Ensure that you have the required library to handle Excel files installed. In this case, pandas uses the openpyxl engine for reading and writing Excel files. Install it using pip install openpyxl.

Best Practices

1. Use Absolute Paths

Always use absolute paths for specifying the folder path to avoid any confusion regarding the working directory. You can use the os module to make paths platform-independent:

import os

path = os.path.abspath('path/to/excel/files')

2. Error Handling

Implement proper error handling to anticipate issues. For example, you can use try and except blocks to handle potential errors when reading Excel files.

try:
    df = pd.read_excel(filename, index_col=None, header=0)
    li.append(df)
except Exception as e:
    print(f"Error reading {filename}: {e}")

Conclusion

In this blog post, we have discussed how to import multiple Excel files into Python pandas and concatenate them into one dataframe. We have covered the steps required to install the required libraries, import the libraries into Python, define the path to Excel files, import Excel files into pandas, and concatenate Excel files into one dataframe. By following these steps, you can easily import and merge multiple Excel files into one dataframe, allowing you to perform data analysis on the entire dataset.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.