How to Concatenate a List of Pandas DataFrames Together
As a data scientist or software engineer, you may often find yourself working with large datasets that are spread across multiple files or sources. In such cases, it becomes necessary to concatenate these datasets together to create a unified view of the data. In this blog post, we will discuss how to concatenate a list of pandas DataFrames together to create a single DataFrame.
Table of Contents
- What is Concatenation?
- Concatenating DataFrames Vertically
- Concatenating DataFrames Horizontally
- Concatenating DataFrames with Different Column Names
- Conclusion
What is Concatenation?
Concatenation is the process of combining two or more datasets into a single dataset. In pandas, concatenation is performed using the pd.concat() function. It can stack DataFrames vertically (axis=0, the default, which adds rows) or horizontally (axis=1, which adds columns).
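The sketch below shows both directions side by side. It uses two small illustrative DataFrames, `first` and `second` (hypothetical names, not from the examples later in this post). The only thing that changes between the two calls is the `axis` argument.

```python
import pandas as pd

# Two small illustrative DataFrames with the same columns and index.
first = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
second = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0 (the default) stacks rows: 4 rows, 2 columns.
stacked = pd.concat([first, second])
print(stacked.shape)

# axis=1 places the frames side by side: 2 rows, 4 columns.
side_by_side = pd.concat([first, second], axis=1)
print(side_by_side.shape)
print(side_by_side.columns.tolist())
```

With `axis=1` the column labels are simply repeated (`A, B, A, B`), because concat() does not rename or deduplicate columns.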
Concatenating DataFrames Vertically
To concatenate DataFrames vertically, we first need to create a list of DataFrames that we want to concatenate. Let’s create a list of three sample DataFrames:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
df_list = [df1, df2, df3]
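In practice, the list is usually built from several files rather than typed by hand. Below is a minimal sketch of that pattern, assuming CSV inputs. They are simulated with `io.StringIO` so the example runs without files on disk, and `csv_sources` is a hypothetical name. Each source is read into its own DataFrame, the results are collected in a list, and `pd.concat()` is called once at the end. The pandas documentation recommends this over calling `concat()` repeatedly inside a loop, because every call copies the data.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for files on disk; with real
# files you would pass paths such as "part1.csv" to pd.read_csv instead.
csv_sources = [
    io.StringIO("A,B\n1,4\n2,5\n"),
    io.StringIO("A,B\n3,6\n4,7\n"),
]

# Read every source into its own DataFrame and collect them in a list.
df_list = [pd.read_csv(src) for src in csv_sources]

# Concatenate once, after all pieces have been read.
combined = pd.concat(df_list)
print(combined)
```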
Now that we have a list of DataFrames, we can concatenate them using the concat() function:
df_concat = pd.concat(df_list)
print(df_concat)
Output:
   A   B
0  1   4
1  2   5
2  3   6
0  4   7
1  5   8
2  6   9
0  7  10
1  8  11
2  9  12
This will concatenate the three DataFrames vertically and create a single DataFrame. The resulting DataFrame will have all the rows of the original DataFrames stacked on top of each other.
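In the output, the index labels 0, 1, 2 repeat because concat() keeps each DataFrame's original index. The sketch below shows two standard concat() options for handling this. `ignore_index=True` discards the old labels and assigns a fresh 0 to n-1 index. `keys=` adds an outer index level that records which DataFrame each row came from. The key names used here are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
df_list = [df1, df2, df3]

# ignore_index=True drops the original labels and numbers rows 0..8.
renumbered = pd.concat(df_list, ignore_index=True)
print(renumbered.index.tolist())

# keys= builds a two-level index whose outer level names the source frame.
labelled = pd.concat(df_list, keys=['first', 'second', 'third'])
print(labelled.loc['second'])  # only the rows that came from df2
```

Use `ignore_index=True` when the original labels carry no meaning. Use `keys=` when you still need to know where each row came from.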
Concatenating DataFrames Horizontally
To concatenate DataFrames horizontally, we again start from a list of DataFrames, this time giving each one distinct column names. Let’s create a list of three sample DataFrames:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})
df3 = pd.DataFrame({'E': [7, 8, 9], 'F': [10, 11, 12]})
df_list = [df1, df2, df3]
Now we pass the list to the concat() function with axis=1, so the DataFrames are combined column-wise:
df_concat = pd.concat(df_list, axis=1)
print(df_concat)
Output:
   A  B  C  D  E   F
0  1  4  4  7  7  10
1  2  5  5  8  8  11
2  3  6  6  9  9  12
This will concatenate the three DataFrames horizontally and create a single DataFrame. The resulting DataFrame will have all the columns of the original DataFrames concatenated next to each other.
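With axis=1, concat() lines rows up by their index labels, not by their position. The three sample DataFrames above all share the index 0, 1, 2, so their rows match exactly. The sketch below uses two hypothetical frames, `prices` and `stock`, whose labels only partly overlap, to show what happens otherwise.

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap.
prices = pd.DataFrame({'price': [10, 20, 30]}, index=['a', 'b', 'c'])
stock = pd.DataFrame({'qty': [5, 7]}, index=['b', 'c'])

# axis=1 aligns rows by index label; the default join='outer' keeps the
# union of labels and fills the gaps with NaN.
side_by_side = pd.concat([prices, stock], axis=1)
print(side_by_side)
```

Row `a` has no counterpart in `stock`, so its `qty` is NaN, and the `qty` column becomes float as a result. If your frames should be paired by position instead, resetting their indexes first (for example with `reset_index(drop=True)`) is one option.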
Concatenating DataFrames with Different Column Names
When concatenating DataFrames with different column names, we can use the join parameter of the concat() function to specify how to handle columns that are not shared by every DataFrame. (With axis=1, the same rule applies to row labels instead.) The join parameter accepts one of two values:
- outer (the default): All columns from all DataFrames will be included in the concatenated DataFrame. Missing values will be filled with NaN.
- inner: Only the columns that are present in all DataFrames will be included in the concatenated DataFrame.
concat() does not accept left or right. Those are options of the how parameter of DataFrame.merge() and DataFrame.join(). Passing any value other than inner or outer to concat() raises a ValueError.
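The sketch below compares the two accepted values. It uses hypothetical frames `left_df` and `right_df` that share only column A. The last call confirms that a value such as 'left' is rejected.

```python
import pandas as pd

# Two illustrative frames that share column 'A' but otherwise differ.
left_df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right_df = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# join='outer' (default): union of columns, gaps filled with NaN.
outer = pd.concat([left_df, right_df], join='outer')
print(outer.columns.tolist())

# join='inner': only the columns present in every frame are kept.
inner = pd.concat([left_df, right_df], join='inner')
print(inner.columns.tolist())

# Any other value is rejected.
try:
    pd.concat([left_df, right_df], join='left')
except ValueError as exc:
    print('ValueError:', exc)
```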
Let’s create two sample DataFrames with different column names:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})
To concatenate these DataFrames while keeping every column, we can set the join parameter to outer (which is also the default):
df_concat = pd.concat([df1, df2], join='outer')
print(df_concat)
Output:
     A    B    C    D
0  1.0  4.0  NaN  NaN
1  2.0  5.0  NaN  NaN
2  3.0  6.0  NaN  NaN
0  NaN  NaN  4.0  7.0
1  NaN  NaN  5.0  8.0
2  NaN  NaN  6.0  9.0
This will concatenate the two DataFrames vertically (the default, axis=0) and create a single DataFrame. The resulting DataFrame will have all the columns from both DataFrames, with missing values filled with NaN.
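Because each frame contributes NaN to the other frame's columns, the integer values in the output above were converted to floats. If the intent is to place df1 and df2 next to each other instead, the sketch below applies axis=1 to the same two frames. Both frames share the index 0, 1, 2, so every row finds a match.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})

# Same index and disjoint columns: axis=1 aligns the rows by label,
# so no NaN values appear and the integer dtypes are preserved.
wide = pd.concat([df1, df2], axis=1)
print(wide)
```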
Conclusion
Concatenating a list of pandas DataFrames together is a powerful technique for working with large datasets. In this blog post, we have discussed how to concatenate DataFrames vertically and horizontally, as well as how to handle DataFrames with different column names. By following these steps, you can easily concatenate a list of pandas DataFrames together and create a unified view of your data.
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.
Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams.