
How to Concatenate a List of Pandas DataFrames Together

As a data scientist or software engineer, you may often find yourself working with large datasets that are spread across multiple files or sources. In such cases, it becomes necessary to concatenate these datasets together to create a unified view of the data. In this blog post, we will discuss how to concatenate a list of pandas DataFrames together to create a single DataFrame.

Table of Contents

  1. What is Concatenation?
  2. Concatenating DataFrames Vertically
  3. Concatenating DataFrames Horizontally
  4. Concatenating DataFrames with Different Column Names
  5. Conclusion

What is Concatenation?

Concatenation is the process of combining two or more datasets into a single dataset. In pandas, concatenation is performed using the concat() function, which accepts a list of DataFrames. The concat() function can stack DataFrames vertically (along rows, axis=0, the default) or place them side by side horizontally (along columns, axis=1).

Concatenating DataFrames Vertically

To concatenate DataFrames vertically, we first need to create a list of DataFrames that we want to concatenate. Let’s create a list of three sample DataFrames:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

df_list = [df1, df2, df3]

Now that we have a list of DataFrames, we can concatenate them using the concat() function:

df_concat = pd.concat(df_list)
print(df_concat)

Output:

   A   B
0  1   4
1  2   5
2  3   6
0  4   7
1  5   8
2  6   9
0  7  10
1  8  11
2  9  12

This concatenates the three DataFrames vertically into a single DataFrame, with the rows of the original DataFrames stacked on top of each other. Note that each DataFrame keeps its original index, so the row labels 0, 1, 2 repeat in the result.
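Because the row labels repeat in the output above, label-based lookups on the result can return several rows at once. A minimal sketch of one way to avoid this, using the ignore_index parameter of concat() to renumber the rows (the name df_reset is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# ignore_index=True discards the original row labels and renumbers from 0
df_reset = pd.concat([df1, df2, df3], ignore_index=True)

print(df_reset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

This is usually what you want when the original index carries no meaning, for example when each DataFrame was read from a separate file.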

Concatenating DataFrames Horizontally

To concatenate DataFrames horizontally, we first need to create a list of DataFrames that we want to concatenate. Let’s create a list of three sample DataFrames:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})
df3 = pd.DataFrame({'E': [7, 8, 9], 'F': [10, 11, 12]})

df_list = [df1, df2, df3]

Now that we have a list of DataFrames, we can concatenate them using the concat() function:

df_concat = pd.concat(df_list, axis=1)
print(df_concat)

Output:

   A  B  C  D  E   F
0  1  4  4  7  7  10
1  2  5  5  8  8  11
2  3  6  6  9  9  12

This concatenates the three DataFrames horizontally into a single DataFrame, with the columns of the original DataFrames placed next to each other. Rows are matched by index label, which works here because all three DataFrames share the index 0, 1, 2.
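Horizontal concatenation pairs rows by index label, not by position. The sketch below uses two hypothetical DataFrames (left and right) whose indexes only partly overlap to show the effect, and then one possible way to pair rows by position instead:

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels
left = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'C': [4, 5, 6]}, index=[1, 2, 3])

# Rows are aligned on index labels; labels missing on one side become NaN
aligned = pd.concat([left, right], axis=1)
print(aligned)

# To pair rows purely by position, reset both indexes first
by_position = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)
print(by_position)
```

Which behavior is correct depends on whether the index labels are meaningful in your data.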

Concatenating DataFrames with Different Column Names

When concatenating DataFrames with different column names, we can use the join parameter of the concat() function to specify how to handle columns that are not present in every DataFrame. The join parameter can take one of two values:

  • inner: Only the columns that are present in all DataFrames will be included in the concatenated DataFrame.
  • outer: All columns from all DataFrames will be included in the concatenated DataFrame. Missing values will be filled with NaN. This is the default.

Unlike merge(), concat() does not accept left or right as values for join.

Let’s create two sample DataFrames with different column names:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})

To concatenate these DataFrames, we can set the join parameter to outer. Since outer is the default, passing it explicitly only makes the intent clearer:

df_concat = pd.concat([df1, df2], join='outer')
print(df_concat)

Output:

     A    B    C    D
0  1.0  4.0  NaN  NaN
1  2.0  5.0  NaN  NaN
2  3.0  6.0  NaN  NaN
0  NaN  NaN  4.0  7.0
1  NaN  NaN  5.0  8.0
2  NaN  NaN  6.0  9.0

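This concatenates the two DataFrames vertically into a single DataFrame. The result has all the columns from both DataFrames, and because df1 and df2 share no columns, every row has NaN in the columns that came from the other DataFrame. The NaN values also cause the integer columns to be converted to floats.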
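For comparison, here is a minimal sketch of join='inner'. It uses two hypothetical DataFrames (df_a and df_b) that share one column, since an inner join of DataFrames with no columns in common would leave no columns at all:

```python
import pandas as pd

# Hypothetical frames that share only column 'B'
df_a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_b = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# join='inner' keeps only the columns present in every DataFrame
df_inner = pd.concat([df_a, df_b], join='inner', ignore_index=True)

print(df_inner.columns.tolist())  # ['B']
print(df_inner['B'].tolist())     # [3, 4, 5, 6]
```

Because no missing values are introduced, the column keeps its integer type.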

Conclusion

Concatenating a list of pandas DataFrames together is a powerful technique for working with large datasets. In this blog post, we have discussed how to concatenate DataFrames vertically and horizontally, as well as how to handle DataFrames with different column names. By following these steps, you can easily concatenate a list of pandas DataFrames together and create a unified view of your data.
