How to Concatenate Rows of Two DataFrames in Pandas
As a data scientist or software engineer, it is common to work with data stored in multiple files or tables. In such cases, it is often necessary to combine the data from these sources into a single dataset for further analysis.
One common way to combine data from multiple sources is by concatenating rows of two dataframes. In this blog post, we will explore how to do this using the Python library, Pandas.
What is Pandas?
Pandas is a popular Python library for data manipulation and analysis. It provides data structures for efficiently storing and manipulating large datasets, and a variety of functions for data cleaning, transformation, and analysis.
One of the key data structures in Pandas is the DataFrame, which is a two-dimensional table of data with rows and columns. Each column in a DataFrame can have a different data type, such as numerical, categorical, or text data.
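As a small illustration of the point about per-column data types, here is a minimal sketch. The column names and values are hypothetical and chosen only for demonstration.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "price": [9.99, 14.50, 3.25],             # numerical data
    "size": pd.Categorical(["S", "M", "S"]),  # categorical data
    "name": ["mug", "lamp", "pen"],           # text data
})

# Each column carries its own dtype.
print(df.dtypes)

# shape is (rows, columns)
print(df.shape)
```

Printing `df.dtypes` lists one data type per column, which is what lets a single DataFrame hold mixed kinds of data side by side.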
How to Concatenate Rows of Two DataFrames
Concatenating rows of two dataframes in Pandas is a straightforward process. The concat() function in Pandas can be used to concatenate two or more DataFrames along a particular axis.
To concatenate rows of two dataframes, we need to concatenate them along the row axis. Here’s an example:
import pandas as pd
# Create two sample dataframes
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
# Concatenate the two dataframes along the row axis
result = pd.concat([df1, df2], axis=0)
print(result)
Output:
   A   B
0  1   4
1  2   5
2  3   6
0  7  10
1  8  11
2  9  12
In this example, we create two sample dataframes df1 and df2, each with two columns and three rows of data. We then use the concat() function to concatenate the two dataframes along the row axis (axis=0). The resulting dataframe, result, contains all the rows from both df1 and df2.
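Notice that the index printed above repeats 0, 1, 2. The following sketch, which reuses the same two sample dataframes, shows what that means in practice: label-based lookup can return more than one row, while position-based lookup is unaffected. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

result = pd.concat([df1, df2], axis=0)

# The original row labels are kept, so 0, 1 and 2 each appear twice.
labels = result.index.tolist()
print(labels)
print(result.index.is_unique)

# Label-based lookup returns every row that carries the label 0.
rows_labelled_0 = result.loc[0]
print(rows_labelled_0)

# Position-based lookup still addresses exactly one row.
fourth_row = result.iloc[3]
print(fourth_row)
```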
Handling Duplicate Index Values
When concatenating rows of two dataframes, it is possible that the resulting dataframe may contain duplicate index values. This can happen if the two dataframes being concatenated have overlapping index values.
By default, Pandas preserves every original index label, even when that produces duplicates. Alternatively, we can discard the original labels altogether and have the resulting dataframe renumbered with a new set of unique index values (0, 1, 2, and so on) by passing ignore_index=True.
Here’s an example of how to handle duplicate index values when concatenating rows of two dataframes:
import pandas as pd
# Create two sample dataframes with overlapping index values
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[0, 1, 2])
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]}, index=[2, 3, 4])
# Concatenate the two dataframes along the row axis, discarding the original index labels
result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
Output:
   A   B
0  1   4
1  2   5
2  3   6
3  7  10
4  8  11
5  9  12
In this example, we create two sample dataframes df1 and df2, each with two columns and three rows of data. We also set the index values of df1 to [0, 1, 2] and the index values of df2 to [2, 3, 4], which creates an overlap at index value 2.
We then use the concat() function to concatenate the two dataframes along the row axis (axis=0), and set the ignore_index parameter to True. This tells Pandas to discard the original index labels of both dataframes, not only the duplicated ones, and to label the rows of the resulting dataframe with a new set of unique index values.
The resulting dataframe, result, contains all the rows from both df1 and df2, with unique index values starting from 0.
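Two related options may also be useful, sketched here with the same overlapping sample data. The first renumbers the rows after concatenation with reset_index(drop=True). The second passes verify_integrity=True so that concat() raises an error on overlapping labels rather than keeping them silently. The variable names renumbered and overlap_detected are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[0, 1, 2])
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]}, index=[2, 3, 4])

# Option 1: concatenate first, then drop the old labels afterwards.
renumbered = pd.concat([df1, df2]).reset_index(drop=True)
print(renumbered)

# Option 2: ask concat() to refuse overlapping labels.
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print(f"concat refused: {exc}")
```

The first option gives the same row labels as ignore_index=True. The second is a reasonable choice when the index carries meaning (for example, a record ID) and a duplicate would indicate a data problem rather than something to renumber away.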
Conclusion
Concatenating rows of two dataframes in Pandas is a simple and powerful way to combine data from multiple sources. By using the concat() function, we can easily concatenate two or more dataframes along a particular axis, and handle any duplicate index values that may arise.
In this blog post, we’ve explored how to concatenate rows of two dataframes in Pandas, and how to handle duplicate index values. We hope you find this information useful in your data analysis and manipulation tasks. Happy coding!