How to Read Multiple CSV Files into Python Pandas Dataframe
As a data scientist or software engineer, working with large datasets is a common scenario. In such cases, it’s important to be able to efficiently read data from various sources and combine them into a single dataset. One of the most common formats for storing data is CSV (Comma Separated Values). In this article, we’ll explore how to read multiple CSV files into a single Python Pandas dataframe.
Table of Contents
- What Is a CSV File?
- Why Read Multiple CSV Files into a Single Dataframe?
- How to Read Multiple CSV Files into a Single Dataframe
- Common Errors and How to Handle Them
- Conclusion
What Is a CSV File?
A CSV file is a text file that stores tabular data in a plain-text format. Each line in the file represents a row in the table, while commas separate the columns. The first row in the file typically contains headers that describe the columns.
Here’s an example of a CSV file:
Name, Age, Gender
John, 25, M
Jane, 30, F
Bob, 40, M
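As a minimal sketch of how Pandas reads this layout, the sample above can be parsed from an in-memory string. The variable names sample_csv and people_df are illustrative only. Because the sample puts a space after each comma, the sketch passes skipinitialspace=True so that the space does not end up in the column names or values.

```python
import io

import pandas as pd

# The sample file from above, held in a string so the example is self-contained.
sample_csv = """Name, Age, Gender
John, 25, M
Jane, 30, F
Bob, 40, M
"""

# skipinitialspace=True drops the space that follows each comma, so the
# second column is named "Age" rather than " Age" and values such as "M"
# carry no leading space.
people_df = pd.read_csv(io.StringIO(sample_csv), skipinitialspace=True)

print(people_df.columns.tolist())
print(people_df)
```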
Why Read Multiple CSV Files into a Single Dataframe?
In many cases, data is stored in multiple CSV files that need to be combined into a single dataset. For example, you might have data for different years or regions that need to be combined for analysis. Combining data into a single dataframe allows you to perform statistical analysis, data visualization, and machine learning tasks more easily.
How to Read Multiple CSV Files into a Single Dataframe
Let’s assume that we have several CSV files in a directory named saturn, each with the same two columns, ID and Value.
Python’s Pandas library provides a convenient way to read CSV files into a dataframe. To read multiple CSV files into a single dataframe, we can use the concat function from Pandas.
Assuming that all CSV files have the same structure, we can use the following code:
import pandas as pd
import glob
# Get a list of all CSV files in a directory
csv_files = glob.glob('saturn/*.csv')
# Create an empty dataframe to store the combined data
combined_df = pd.DataFrame()
# Loop through each CSV file and append its contents to the combined dataframe
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    combined_df = pd.concat([combined_df, df])
# Print the combined dataframe
print(combined_df)
Output:
   ID     Value
0   1  0.462535
1   2  0.747471
2   3  0.036683
3   4  0.252437
4   5  0.713350
0   1  0.895207
1   2  0.511677
2   3  0.532113
3   4  0.107172
4   5  0.447412
...
0   1  0.245958
1   2  0.160681
2   3  0.186567
3   4  0.285095
4   5  0.173374
Here’s what the code does:
- We import the Pandas library and the glob module, which allows us to easily get a list of all CSV files in a directory.
- We use the glob.glob function to get a list of all CSV files in the specified directory.
- We create an empty dataframe called combined_df to store the combined data.
- We loop through each CSV file in the list and read its contents into a dataframe using the read_csv function from Pandas.
- We use the concat function from Pandas to append the contents of each CSV file to the combined_df dataframe.
- Finally, we print the combined dataframe to verify that the data has been combined correctly.
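The steps above can also be sketched as a variation that collects the dataframes in a list and calls concat once, which avoids copying the growing result on every iteration. The function name read_csv_folder, the source_file column, and the demo file names are assumptions made for illustration and are not part of the original example. Passing ignore_index=True renumbers the rows, so the index does not restart at 0 for each file as it does in the output above.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_folder(pattern):
    """Read every CSV matching `pattern` and return one combined dataframe."""
    # sorted() makes the row order reproducible; glob.glob guarantees no order.
    csv_files = sorted(glob.glob(pattern))
    if not csv_files:
        raise FileNotFoundError(f"No files match {pattern!r}")
    frames = []
    for csv_file in csv_files:
        df = pd.read_csv(csv_file)
        # Record which file each row came from (hypothetical column name).
        df["source_file"] = os.path.basename(csv_file)
        frames.append(df)
    # A single concat call; ignore_index=True renumbers the rows from 0.
    return pd.concat(frames, ignore_index=True)


# Demo with two small throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame({"ID": [1, 2], "Value": [0.1, 0.2]}).to_csv(
        os.path.join(tmp_dir, "part_a.csv"), index=False
    )
    pd.DataFrame({"ID": [1, 2], "Value": [0.3, 0.4]}).to_csv(
        os.path.join(tmp_dir, "part_b.csv"), index=False
    )
    combined_df = read_csv_folder(os.path.join(tmp_dir, "*.csv"))

print(combined_df)
```

For the directory used in this article, the call would be read_csv_folder('saturn/*.csv').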
Common Errors and How to Handle Them
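- Error 1: Inconsistent Column Headers. If the files do not share the same column names, concat aligns the columns by name and fills the gaps with NaN. Reading each file into its own dataframe first makes it possible to compare the headers before combining.
import glob
import pandas as pd
# Read every CSV in the directory into its own dataframe
files = glob.glob('sample_files/*.csv')
dfs = [pd.read_csv(file) for file in files]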
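One possible way to handle mismatched headers, sketched here under assumptions, is to normalize the column names of each file before combining. The inputs csv_a and csv_b and the helper read_normalized are hypothetical, and the normalization rule (strip whitespace, lowercase) is only one reasonable choice.

```python
import io

import pandas as pd

# Two hypothetical inputs whose headers differ in case, spacing, and columns.
csv_a = "ID,Value\n1,0.5\n2,0.7\n"
csv_b = " id , value ,Region\n3,0.9,EU\n"


def read_normalized(source):
    """Read one CSV and normalize its header names."""
    df = pd.read_csv(source)
    # Strip surrounding whitespace and lowercase so "ID" and " id " match.
    df.columns = df.columns.str.strip().str.lower()
    return df


frames = [read_normalized(io.StringIO(text)) for text in (csv_a, csv_b)]

# concat aligns on column names; a column missing from a file becomes NaN.
combined_df = pd.concat(frames, ignore_index=True)
print(combined_df)
```

Rows from the first input have no region, so that column holds NaN for them. Whether to keep, fill, or drop such columns depends on the analysis.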
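- Error 2: Memory Issues. A very large file may not fit in memory when read in one call. Passing chunksize to read_csv returns an iterator that yields the file in smaller dataframes.
import pandas as pd
# Reading large files in chunks
chunk_size = 1000
chunks = pd.read_csv('large_file.csv', chunksize=chunk_size)
for chunk in chunks:
    # process_data is a placeholder for your own per-chunk logic
    process_data(chunk)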
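As a hedged illustration of what the per-chunk step might look like, the sketch below filters each chunk and keeps only the matching rows, so the full file is never held in memory at once. The function filter_in_chunks, the threshold min_value, and the generated demo file are assumptions for illustration. The Value column name follows the earlier example.

```python
import os
import tempfile

import pandas as pd


def filter_in_chunks(csv_path, chunk_size, min_value):
    """Keep rows with Value >= min_value, reading chunk_size rows at a time."""
    kept = []
    for chunk in pd.read_csv(csv_path, chunksize=chunk_size):
        # Only the filtered rows are retained; the rest of the chunk is discarded.
        kept.append(chunk[chunk["Value"] >= min_value])
    return pd.concat(kept, ignore_index=True)


# Demo on a small generated file standing in for a large one.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "large_file.csv")
    pd.DataFrame({"ID": range(1, 11), "Value": range(10)}).to_csv(
        csv_path, index=False
    )
    filtered_df = filter_in_chunks(csv_path, chunk_size=3, min_value=7)

print(filtered_df)
```

This only saves memory when the filtered result is much smaller than the input. If every row is kept, the concatenated result is as large as the original file.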
Conclusion
In this article, we’ve learned how to read multiple CSV files into a single Python Pandas dataframe. This is a useful technique for combining data from different sources and preparing it for analysis. With Python’s glob module and the concat function from Pandas, it’s easy to read and combine data from multiple CSV files.