# Efficient Techniques for Summing Row Values in Pandas Dataframes

## What is pandas?

Pandas is a popular open-source Python library for data manipulation and analysis. It provides high-performance, easy-to-use data structures and data analysis tools. Its central structure, the DataFrame, is a two-dimensional, size-mutable table whose columns can hold different types.

## The problem

Suppose you have a pandas dataframe with a large number of rows and columns, and you need to calculate the sum of values in a row. You might be tempted to use a for loop to iterate through each row and sum the values. However, this can be slow and inefficient, especially for large datasets.

## The solution

The most efficient way to sum the values in each row of a pandas dataframe is to call the `sum()` method with the `axis` parameter set to 1. The `axis` parameter names the axis that is collapsed: `axis=0` (the default) adds down the rows and returns one total per column, while `axis=1` adds across the columns and returns one total per row.
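
To make the axis convention concrete, here is a minimal sketch that contrasts the two settings on the same small dataframe used throughout this article. The variable names `column_totals` and `row_totals` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# axis=0 collapses the rows: one total per column, indexed by column name
column_totals = df.sum(axis=0)

# axis=1 collapses the columns: one total per row, indexed by row label
row_totals = df.sum(axis=1)

print(column_totals.tolist())  # [6, 15, 24]
print(row_totals.tolist())     # [12, 15, 18]
```

A quick way to remember the rule: the result keeps the labels of the axis you did not pass.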

Here is an example:

```
import pandas as pd
# Create a sample dataframe
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
# Sum values of first row
sum_row = df.iloc[0].sum(axis=0)
# Print result
print("Sum of values in first row: ", sum_row)
```

Output:

```
Sum of values in first row: 12
```

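In this example, we created a sample dataframe with three columns and three rows. We then used the `iloc` indexer to select the first row (`df.iloc[0]`), which returns a one-dimensional Series, and applied the `sum()` method to it. A Series has only one axis, so `axis=0` is the only valid value and is also the default; `df.iloc[0].sum()` gives the same result. The resulting sum is 1 + 4 + 7 = 12.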

To sum every row at once, call `sum()` with `axis=1` on the whole dataframe. Here is an example:

```
import pandas as pd
# Create a sample dataframe
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
# Sum values of each row
sum_rows = df.sum(axis=1)
# Print result
print("Sum of values in each row: ", sum_rows)
```

Output:

```
Sum of values in each row:  0    12
1    15
2    18
dtype: int64
```

In this example, we used the `sum()` method with `axis=1` to sum the values in each row of the dataframe. The result is a Series indexed by row label, with sums of 12, 15, and 18.
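
Real data often has missing values or columns that should not be part of the total. The sketch below, which uses a made-up dataframe with a hypothetical `name` column, shows three common variations: restricting the sum to chosen columns, letting pandas pick the numeric columns with `numeric_only=True`, and controlling how `NaN` is treated with `skipna`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["x", "y", "z"],   # non-numeric column, left out of the sums
    "A": [1.0, np.nan, 3.0],
    "B": [4.0, 5.0, np.nan],
})

# Let pandas ignore non-numeric columns automatically.
numeric_total = df.sum(axis=1, numeric_only=True)

# skipna=False propagates NaN instead of skipping it.
strict_total = df[["A", "B"]].sum(axis=1, skipna=False)

# Sum only the chosen columns and store the result as a new column.
# NaN values are skipped by default (skipna=True).
df["total"] = df[["A", "B"]].sum(axis=1)

print(numeric_total.tolist())  # [5.0, 5.0, 3.0]
print(strict_total.tolist())   # [5.0, nan, nan]
print(df["total"].tolist())    # [5.0, 5.0, 3.0]
```

Selecting the columns explicitly is the safer choice when the dataframe may later gain numeric columns that do not belong in the total.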

## Performance comparison

Let's compare the performance of a for loop against the `sum()` method with `axis=1`. We will create a large dataframe with 10,000 rows and 10 columns and time each approach.

```
import time

import numpy as np
import pandas as pd

# Create a large dataframe: 10,000 rows x 10 columns of random integers
data = np.random.randint(0, 100, size=(10000, 10))
df = pd.DataFrame(data)

# Sum values of each row using a for loop
start_time = time.perf_counter()
row_sums = []
for i in range(len(df)):
    row_sums.append(df.iloc[i].sum())
end_time = time.perf_counter()
print("Time taken using for loop: ", end_time - start_time)

# Sum values of each row using the vectorized sum() method
start_time = time.perf_counter()
row_sums = df.sum(axis=1)
end_time = time.perf_counter()
print("Time taken using sum() method: ", end_time - start_time)
```

Output:

```
Time taken using for loop: 2.891050338745117
Time taken using sum() method: 0.0005729198455810547
```

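As you can see, the `sum()` method with `axis=1` is much faster than the for loop. For a dataframe with 10,000 rows and 10 columns, the `sum()` method took about 0.0006 seconds, while the for loop took about 2.89 seconds, a difference of roughly 5,000 times in this run. The loop is slow because each `df.iloc[i]` builds a new Series object and calls `sum()` from Python, whereas `df.sum(axis=1)` does the arithmetic in compiled NumPy code. Exact timings will vary with hardware and pandas version.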
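
A single wall-clock measurement can be noisy. As one possible refinement, the sketch below uses the standard-library `timeit` module to repeat each measurement and first checks that both approaches return the same sums. The helper names `loop_sum` and `vectorized_sum` are hypothetical, and the dataframe is reduced to 1,000 rows (an assumption made here so the loop finishes quickly).

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the data is reproducible
df = pd.DataFrame(rng.integers(0, 100, size=(1_000, 10)))


def loop_sum(frame):
    """Row sums via an explicit Python loop over positional rows."""
    return [frame.iloc[i].sum() for i in range(len(frame))]


def vectorized_sum(frame):
    """Row sums via a single vectorized call."""
    return frame.sum(axis=1)


# Both approaches should agree before their speed is compared.
assert loop_sum(df) == vectorized_sum(df).tolist()

# Take the best of several repeats to reduce noise from other processes.
loop_time = min(timeit.repeat(lambda: loop_sum(df), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: vectorized_sum(df), number=10, repeat=3)) / 10

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.6f} s")
```

The printed numbers depend on the machine, so no expected output is shown here.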

## Conclusion

In this article, we explored how to efficiently sum the values in each row of a pandas dataframe. We learned that the `sum()` method with the `axis` parameter set to 1 is the most efficient way to do this. We also compared the performance of a for loop against the `sum()` method and found that the `sum()` method is much faster.

By using this technique, you can efficiently manipulate large datasets and save time in your data analysis and machine learning projects.
