How to Concatenate a List of Pandas DataFrames Together

As a data scientist or software engineer, you may often find yourself working with large datasets that are spread across multiple files or sources. In such cases, it becomes necessary to concatenate these datasets together to create a unified view of the data. In this blog post, we will discuss how to concatenate a list of pandas DataFrames together to create a single DataFrame.

Table of Contents

  1. What is Concatenation?
  2. Concatenating DataFrames Vertically
  3. Concatenating DataFrames Horizontally
  4. Concatenating DataFrames with Different Column Names
  5. Conclusion

What is Concatenation?

Concatenation is the process of combining two or more datasets into a single dataset. In pandas, concatenation is performed using the concat() function. The concat() function can be used to concatenate DataFrames vertically (along rows) or horizontally (along columns).
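As a quick illustration of the two directions, here is a minimal sketch using two small DataFrames. The names left, right, stacked, and side_by_side are made up for this example.

```python
import pandas as pd

# Two hypothetical 2x2 DataFrames with the same column names
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0 (the default) stacks rows; axis=1 places columns side by side
stacked = pd.concat([left, right], axis=0)
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

The row count grows with axis=0 and the column count grows with axis=1.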

Concatenating DataFrames Vertically

To concatenate DataFrames vertically, we first need to create a list of DataFrames that we want to concatenate. Let’s create a list of three sample DataFrames:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

df_list = [df1, df2, df3]

Now that we have a list of DataFrames, we can concatenate them using the concat() function:

df_concat = pd.concat(df_list)
print(df_concat)

Output:

   A   B
0  1   4
1  2   5
2  3   6
0  4   7
1  5   8
2  6   9
0  7  10
1  8  11
2  9  12

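This will concatenate the three DataFrames vertically and create a single DataFrame. The resulting DataFrame will have all the rows of the original DataFrames stacked on top of each other. Note that each DataFrame keeps its original index labels (0, 1, 2), so those labels repeat in the result.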
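If the repeated index labels in the output are not wanted, one option is the ignore_index parameter of concat(), which discards the original labels and numbers the rows from 0. A minimal sketch, reusing the same sample data (the name df_reset is just for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
df_list = [df1, df2, df3]

# ignore_index=True drops the per-DataFrame labels and builds a fresh 0..n-1 index
df_reset = pd.concat(df_list, ignore_index=True)

print(df_reset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Whether to reset the index depends on whether the original labels carry meaning. If they do, keeping them may be preferable.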

Concatenating DataFrames Horizontally

To concatenate DataFrames horizontally, we again start from a list of DataFrames. This time, let’s give each of the three sample DataFrames its own column names:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})
df3 = pd.DataFrame({'E': [7, 8, 9], 'F': [10, 11, 12]})

df_list = [df1, df2, df3]

Now that we have a list of DataFrames, we can concatenate them using the concat() function:

df_concat = pd.concat(df_list, axis=1)
print(df_concat)

Output:

   A  B  C  D  E   F
0  1  4  4  7  7  10
1  2  5  5  8  8  11
2  3  6  6  9  9  12

This will concatenate the three DataFrames horizontally and create a single DataFrame. The resulting DataFrame will have all the columns of the original DataFrames placed next to each other, with rows matched up by index label.
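Horizontal concatenation matches rows by index label rather than by position, which matters when the DataFrames do not share the same index. The sketch below uses two hypothetical DataFrames (first and second) with partly overlapping labels to show the difference, and one possible way to force positional alignment:

```python
import pandas as pd

# Hypothetical DataFrames whose index labels only partly overlap
first = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
second = pd.DataFrame({'C': [4, 5, 6]}, index=[1, 2, 3])

# Rows are matched by label; labels missing on one side produce NaN
aligned = pd.concat([first, second], axis=1)
print(aligned.shape)  # (4, 2)

# Resetting each index first makes the rows line up by position instead
positional = pd.concat(
    [first.reset_index(drop=True), second.reset_index(drop=True)],
    axis=1,
)
print(positional.shape)  # (3, 2)
```

In the example from this section all three DataFrames share the default index 0, 1, 2, so no missing values appear.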

Concatenating DataFrames with Different Column Names

When concatenating DataFrames with different column names, we can use the join parameter of the concat() function to specify how to handle columns that are missing from some of the DataFrames. The join parameter can take one of two values:

  • inner: Only the columns that are present in all DataFrames will be included in the concatenated DataFrame.
  • outer: All columns from all DataFrames will be included in the concatenated DataFrame. Missing values will be filled with NaN. This is the default.

Unlike merge(), concat() does not accept left or right for the join parameter.

Let’s create two sample DataFrames with different column names:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [4, 5, 6], 'D': [7, 8, 9]})

To concatenate these DataFrames, we can use the join parameter and set it to outer (this is also the default, so it is spelled out here only for clarity):

df_concat = pd.concat([df1, df2], join='outer')
print(df_concat)

Output:

     A    B    C    D
0  1.0  4.0  NaN  NaN
1  2.0  5.0  NaN  NaN
2  3.0  6.0  NaN  NaN
0  NaN  NaN  4.0  7.0
1  NaN  NaN  5.0  8.0
2  NaN  NaN  6.0  9.0

This will concatenate the two DataFrames vertically and create a single DataFrame. The resulting DataFrame will have all the columns from both DataFrames, with missing values filled with NaN. Because NaN is a floating-point value, the integer columns are converted to floats in the result.
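For contrast, when the DataFrames share only some of their columns, join='inner' keeps just the shared ones. A minimal sketch, using hypothetical DataFrames df_a and df_b that are not part of the example above:

```python
import pandas as pd

# Hypothetical DataFrames that share only column 'B'
df_a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_b = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# inner keeps only the columns present in every DataFrame
inner = pd.concat([df_a, df_b], join='inner', ignore_index=True)
# outer keeps every column and fills the gaps with NaN
outer = pd.concat([df_a, df_b], join='outer', ignore_index=True)

print(inner.columns.tolist())  # ['B']
print(outer.columns.tolist())  # ['A', 'B', 'C']
```

With the df1 and df2 from this section, which have no columns in common, join='inner' would return a DataFrame with no columns at all.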

Conclusion

Concatenating a list of pandas DataFrames together is a powerful technique for working with large datasets. In this blog post, we have discussed how to concatenate DataFrames vertically and horizontally, as well as how to handle DataFrames with different column names. By following these steps, you can easily concatenate a list of pandas DataFrames together and create a unified view of your data.

