How to Sum Two Columns in a Pandas DataFrame

In this blog, we explore various methods for adding two columns in a Pandas DataFrame, offering valuable insights for data scientists and software engineers working with data analysis and manipulation in Python using Pandas.

As a data scientist or software engineer, you will often need to perform calculations on data stored in a Pandas DataFrame. One common task is adding two columns together, row by row. This article covers three ways to do that in Pandas and explains how they differ.

What is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table, with labeled rows and columns that support fast, vectorized operations. Pandas is an open-source library in Python, widely used for data manipulation and analysis.

How to Sum Two Columns in a Pandas DataFrame

Suppose we have a DataFrame with two columns, column1 and column2, and we want to create a new column sum that contains the sum of these two columns. Here are three methods to achieve this:

Method 1: Using the + Operator

The simplest way to add two columns in a Pandas DataFrame is to use the + operator. The addition is vectorized: values are added element-wise, with rows matched on the index. We can create a new column sum by adding the two columns together, like this:

import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# add 2 columns using + operator
df['sum'] = df['column1'] + df['column2']
print(df)

Output:

   column1  column2  sum
0        1        4    5
1        2        5    7
2        3        6    9

In this example, we create a DataFrame with two columns column1 and column2, each containing three values. We then add these two columns together using the + operator and assign the result to a new column sum.
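One thing to keep in mind with the + operator is how it treats missing values. The sketch below uses a hypothetical DataFrame (not part of the example above) with one NaN in column2, and compares + with Series.add(), whose fill_value parameter substitutes a value for missing entries before adding. The column names sum_plus and sum_filled are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second value of column2 is missing
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, np.nan, 6]})

# The + operator propagates missing values, so row 1 becomes NaN
df['sum_plus'] = df['column1'] + df['column2']

# Series.add with fill_value=0 treats a missing value as 0 before adding
df['sum_filled'] = df['column1'].add(df['column2'], fill_value=0)

print(df)
# sum_plus   -> 5.0, NaN, 9.0
# sum_filled -> 5.0, 2.0, 9.0
```

Whether a missing value should count as 0 depends on the data, so fill_value=0 is a choice to make deliberately rather than a default to rely on.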

Method 2: Using the sum() Function

Another way to add two columns in a Pandas DataFrame is to use the DataFrame's sum() method. We can create a new column sum by selecting the two columns and calling sum() with axis=1, like this:

import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# add 2 columns using sum()
df['sum'] = df[['column1', 'column2']].sum(axis=1)
print(df)

Output:

   column1  column2  sum
0        1        4    5
1        2        5    7
2        3        6    9

In this example, we create a DataFrame with two columns column1 and column2, each containing three values. We then select these two columns using df[['column1', 'column2']] and call sum() with axis=1, which adds the values across the selected columns within each row. We assign the result to a new column sum.
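The sum() method also accepts options that control how missing values are handled. The following is a minimal sketch on hypothetical data containing NaN values; skipna and min_count are parameters of DataFrame.sum(), while the result column names are invented for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has one missing value, row 2 has two
df = pd.DataFrame({'column1': [1, 2, np.nan], 'column2': [4, np.nan, np.nan]})
cols = ['column1', 'column2']

# Default behaviour: missing values are skipped, so an all-NaN row sums to 0
df['sum_default'] = df[cols].sum(axis=1)

# min_count=1 requires at least one non-missing value, otherwise the result is NaN
df['sum_min_count'] = df[cols].sum(axis=1, min_count=1)

# skipna=False makes any missing value in the row produce NaN
df['sum_strict'] = df[cols].sum(axis=1, skipna=False)

print(df)
# sum_default   -> 5.0, 2.0, 0.0
# sum_min_count -> 5.0, 2.0, NaN
# sum_strict    -> 5.0, NaN, NaN
```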

Method 3: Using the apply() Function

A third way to add two columns in a Pandas DataFrame is to use the apply() function. We can create a new column sum by applying a lambda function that adds the two columns together, like this:

import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# add 2 columns using apply()
df['sum'] = df.apply(lambda row: row['column1'] + row['column2'], axis=1)
print(df)

Output:

   column1  column2  sum
0        1        4    5
1        2        5    7
2        3        6    9

In this example, we create a DataFrame with two columns column1 and column2, each containing three values. We then call apply() with axis=1, which passes each row of the DataFrame to the lambda function in turn. The lambda function takes a row as input and returns the sum of its column1 and column2 values. We assign the result to a new column sum.
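The apply() approach is most useful when the per-row logic is more than a plain addition. As an illustration only, the sketch below assumes a made-up rule that the sum must be capped at 8 (the CAP constant and the capped column names are hypothetical), and shows a vectorized alternative using Series.clip() for comparison.

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})

CAP = 8  # hypothetical upper limit, chosen only for this example

# Row-wise: the lambda runs once per row and can contain arbitrary Python logic
df['capped_apply'] = df.apply(
    lambda row: min(row['column1'] + row['column2'], CAP), axis=1
)

# Vectorized alternative for this particular rule: add first, then clip
df['capped_vectorized'] = (df['column1'] + df['column2']).clip(upper=CAP)

print(df)
# Both capped columns -> 5, 7, 8
```

When a vectorized equivalent like this exists, it is usually the better choice; apply() remains a reasonable fallback for logic that cannot be expressed with column-level operations.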

Conclusion

In this article, we discussed three ways to add two columns in a Pandas DataFrame: the + operator, the sum() method, and the apply() method. All three produce the same result on complete numeric data, but they differ in readability, performance, and handling of missing values. The + operator is the simplest and most intuitive method, and because it is vectorized it is typically also the fastest; it does, however, propagate missing values, so a NaN in either column yields NaN in the result. The sum() method with axis=1 skips missing values by default and scales easily to more than two columns, at the cost of slightly more typing. The apply() method with axis=1 is the most flexible and can handle arbitrary row-wise logic, but it calls a Python function once per row and is usually much slower on large datasets. As a data scientist or software engineer, you should choose the method that best fits your needs and context.
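To see how the three methods compare on your own machine, a rough timing sketch like the one below can help. It assumes a randomly generated DataFrame of 10,000 rows and uses the standard-library timeit module; the helper names with_plus, with_sum, and with_apply are made up for this example, and the measured times will vary with hardware and Pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows of random integers
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'column1': rng.integers(0, 100, size=10_000),
    'column2': rng.integers(0, 100, size=10_000),
})

def with_plus():
    return df['column1'] + df['column2']

def with_sum():
    return df[['column1', 'column2']].sum(axis=1)

def with_apply():
    return df.apply(lambda row: row['column1'] + row['column2'], axis=1)

# Time three runs of each approach and print the total
for func in (with_plus, with_sum, with_apply):
    seconds = timeit.timeit(func, number=3)
    print(f'{func.__name__}: {seconds:.4f} s for 3 runs')
```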

