# How to Sum Two Columns in a Pandas DataFrame

In this blog post, we explore three methods for adding two columns in a Pandas DataFrame: the `+` operator, the `sum()` method, and the `apply()` method.

As a data scientist or software engineer, you may often need to perform calculations on data stored in a Pandas DataFrame. One common task is to sum two columns in a DataFrame. In this article, we will discuss different ways to achieve this using Pandas.

## What is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional labeled data structure whose columns can hold different types. It resembles a spreadsheet or a SQL table, with labeled rows and columns and built-in support for column-wise (vectorized) operations. Pandas is an open-source Python library widely used for data manipulation and analysis.
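To make the idea of "columns of different types" concrete, here is a minimal sketch. The DataFrame `people` and its column names are hypothetical and are used only for illustration.

```python
import pandas as pd

# Hypothetical example data: each column holds a different type
people = pd.DataFrame({
    'name': ['Ana', 'Ben', 'Cal'],    # strings
    'age': [31, 25, 47],              # integers
    'height_m': [1.62, 1.80, 1.75],   # floats
})

# One dtype per column
print(people.dtypes)

# (number of rows, number of columns)
print(people.shape)
```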

## How to Sum Two Columns in a Pandas DataFrame

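Suppose we have a DataFrame with two columns, `column1` and `column2`, and we want to create a new column `sum` that contains the sum of these two columns. Because `sum` is also the name of a DataFrame method, the new column must be accessed as `df['sum']` rather than `df.sum`. Here are three methods to achieve this: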

### Method 1: Using the + Operator

The simplest way to add two columns in a Pandas DataFrame is to use the `+` operator. We can create a new column `sum` by adding the two columns together, like this:

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# add 2 columns using + operator
df['sum'] = df['column1'] + df['column2']
print(df)
```

Output:

```
   column1  column2  sum
0        1        4    5
1        2        5    7
2        3        6    9
```

In this example, we create a DataFrame with two columns `column1` and `column2`, each containing three values. We then add these two columns together using the `+` operator and assign the result to a new column `sum`.
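One behavior worth knowing about the `+` operator is how it treats missing values. The sketch below uses a hypothetical DataFrame, `df_nan`, which is not part of the example above. It shows that `+` yields `NaN` whenever either operand is missing, and that `Series.add()` with `fill_value=0` is one way to treat a missing value as zero instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df_nan = pd.DataFrame({
    'column1': [1.0, np.nan, 3.0],
    'column2': [4.0, 5.0, np.nan],
})

# + propagates missing values: rows 1 and 2 become NaN
plus_result = df_nan['column1'] + df_nan['column2']
print(plus_result)

# Series.add with fill_value=0 substitutes 0 for a value missing on one side
add_result = df_nan['column1'].add(df_nan['column2'], fill_value=0)
print(add_result)
```

Whether a missing value should count as zero depends on the data, so treat `fill_value=0` as a deliberate choice rather than a default.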

### Method 2: Using the sum() Function

Another way to add two columns in a Pandas DataFrame is to use the DataFrame's `sum()` method. We can create a new column `sum` by selecting the two columns and calling `sum()` on the selection, like this:

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# add 2 columns using sum()
df['sum'] = df[['column1', 'column2']].sum(axis=1)
print(df)
```

Output:

```
   column1  column2  sum
0        1        4    5
1        2        5    7
2        3        6    9
```

In this example, we create a DataFrame with two columns `column1` and `column2`, each containing three values. We then select these two columns using `df[['column1', 'column2']]`, call `sum(axis=1)` to add the values across the columns within each row, and assign the result to a new column `sum`.
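Unlike `+`, `sum(axis=1)` skips missing values by default. The following sketch uses a hypothetical DataFrame, `df_missing`, to show the default behavior alongside the `skipna` and `min_count` parameters, which control how missing values affect each row's total.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has one missing value, row 2 has two
df_missing = pd.DataFrame({
    'column1': [1.0, np.nan, np.nan],
    'column2': [4.0, 5.0, np.nan],
})
cols = df_missing[['column1', 'column2']]

# Default (skipna=True): NaN is ignored, and an all-NaN row sums to 0.0
default_sum = cols.sum(axis=1)
print(default_sum)

# skipna=False: any NaN in a row makes that row's result NaN
strict_sum = cols.sum(axis=1, skipna=False)
print(strict_sum)

# min_count=1: require at least one non-missing value, else the result is NaN
min_count_sum = cols.sum(axis=1, min_count=1)
print(min_count_sum)
```

The same selection can list more than two columns, which is where this method tends to be more convenient than chaining `+`.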

### Method 3: Using the apply() Function

A third way to add two columns in a Pandas DataFrame is to use the DataFrame's `apply()` method. We can create a new column `sum` by applying a lambda function that adds the two values in each row, like this:

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
# add 2 columns using apply()
df['sum'] = df.apply(lambda row: row['column1'] + row['column2'], axis=1)
print(df)
```

Output:

```
   column1  column2  sum
0        1        4    5
1        2        5    7
2        3        6    9
```

In this example, we create a DataFrame with two columns `column1` and `column2`, each containing three values. We then apply a lambda function to each row of the DataFrame using the `apply()` function. The lambda function takes a row as input and returns the sum of the two columns. We assign the result to a new column `sum`.
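Because `apply()` with `axis=1` calls the lambda once per row in Python, it tends to be slower than the vectorized alternatives. The rough timing sketch below compares the three methods on a hypothetical DataFrame, `big`, of 10,000 random rows. The helper names `with_plus`, `with_sum`, and `with_apply` are illustrative only, and the measured times will vary with the machine and the Pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows of random integers
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'column1': rng.integers(0, 100, size=10_000),
    'column2': rng.integers(0, 100, size=10_000),
})

def with_plus():
    # Vectorized addition of two Series
    return big['column1'] + big['column2']

def with_sum():
    # Row-wise sum over the selected columns
    return big[['column1', 'column2']].sum(axis=1)

def with_apply():
    # Python-level function call for every row
    return big.apply(lambda row: row['column1'] + row['column2'], axis=1)

# Sanity check: all three methods agree on every row
assert (with_plus() == with_sum()).all()
assert (with_plus() == with_apply()).all()

# Average time per call over a few repetitions
for fn in (with_plus, with_sum, with_apply):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.5f} s per call")
```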

## Conclusion

In this article, we discussed three ways to add two columns in a Pandas DataFrame: the `+` operator, the `sum()` method, and the `apply()` method. All three produce the same result on complete data, but they differ in readability, performance, and flexibility. The `+` operator is the simplest and most intuitive, and because it is vectorized it is typically also the fastest; its main caveat is that a missing value in either column makes the result for that row `NaN`. The `sum()` method requires more typing, but it skips missing values by default and extends naturally to more than two columns. The `apply()` method is the most flexible and can express arbitrary row-wise logic, but it calls a Python function once per row and is usually much slower than the other two on large DataFrames. As a data scientist or software engineer, you should choose the method that best fits your data and context.