How to Count the Number of Missing/NaN Values in Each Row in Python Pandas

Missing values, or NaNs, are a common problem in data analysis for data scientists and software engineers. When working with large datasets, you need a quick way to identify missing values before deciding how to handle them. In this blog post, we’ll show how to count the missing/NaN values in each row of a pandas DataFrame using Python.

What are Missing/NaN Values?

Missing values are entries whose value is unavailable or undefined. In pandas, they are usually represented by NaN (Not a Number), a special floating-point value available as np.nan; None and NaT (for datetime data) are also treated as missing. Missing values can occur for a variety of reasons, including data entry errors, data that was never collected, or data corruption.
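As a small illustration of what pandas treats as missing, the sketch below builds a Series from a handful of made-up values (the values and variable names are assumptions chosen for this example) and asks isna() which entries count as missing. Note that an empty string is an ordinary value, not a missing one.

```python
import numpy as np
import pandas as pd

# Illustrative values only: a number, three missing markers, and an empty string.
values = pd.Series([1.5, np.nan, None, pd.NaT, ""], dtype="object")

# isna() flags NaN, None, and NaT as missing; the empty string is not missing.
missing_mask = values.isna()

print(missing_mask.tolist())  # [False, True, True, True, False]
```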

Why Count Missing/NaN Values in Each Row?

Counting the number of missing/NaN values in each row is an important step in data cleaning and preprocessing. This information can help you identify rows with missing data, which can then be handled in a variety of ways, such as removing the row, filling in the missing data, or imputing the missing values.
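To make the "removing the row" option concrete, here is a minimal sketch that keeps only the rows whose missing count is at or below a threshold. The DataFrame, the max_missing threshold, and the variable names are hypothetical and chosen only for illustration; the row count itself uses the isna().sum(axis=1) technique explained in the next section.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 has one gap, row 2 is empty.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

# Number of missing values in each row.
missing_per_row = df.isna().sum(axis=1)

# Assumed threshold: tolerate at most one missing value per row.
max_missing = 1
kept = df[missing_per_row <= max_missing]

print(kept)
```

Filling or imputing the remaining gaps, for example with fillna(), would be a separate step applied to the rows that are kept.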

The Solution: Counting Missing/NaN Values in Each Row

Method 1: Using isna() and sum(axis=1)

To count the number of missing/NaN values in each row, we can use the pandas isna() method to create a Boolean mask of the DataFrame, where True indicates a missing value. We can then use the sum() method to count the number of True values in each row.

Here’s what the code looks like:

import numpy as np
import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, np.nan, 15]
})

# count the number of missing/NaN values in each row
row_nan_count = df.isna().sum(axis=1)

print(row_nan_count)

In the code above, we first create a sample DataFrame with three columns (col1, col2, and col3). We then call isna() to get a Boolean mask of the DataFrame, where True marks a missing value. Because True counts as 1 when summed, sum(axis=1) adds up the mask across the columns and returns the number of missing values in each row. Finally, we print the resulting row-wise counts.
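The role of the axis parameter is easiest to see side by side. The sketch below reuses the same sample data and compares the default axis=0, which counts missing values per column, with axis=1, which counts them per row. The variable names per_column and per_row are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, np.nan, 15]
})

mask = df.isna()

# axis=0 (the default) sums down each column: one count per column.
per_column = mask.sum(axis=0)

# axis=1 sums across the columns: one count per row.
per_row = mask.sum(axis=1)

print(per_column.to_dict())  # {'col1': 1, 'col2': 1, 'col3': 1}
print(per_row.tolist())      # [1, 0, 1, 1, 0]
```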

Method 2: Using isnull() and sum(axis=1)

In pandas, isnull() is an alias for isna(), so this method behaves exactly like the first one. You can call isnull() to create the Boolean mask and then apply sum(axis=1) to count the missing/NaN values in each row.

row_nan_count_method2 = df.isnull().sum(axis=1)
print(row_nan_count_method2)
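If you want to convince yourself that the two spellings are interchangeable, a quick check along these lines should do. It is only a sketch on the sample data, and the variable name same_mask is an assumption for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, np.nan, 15]
})

# Compare the Boolean masks produced by the two method names.
same_mask = df.isnull().equals(df.isna())

print(same_mask)  # True
```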

Method 3: Using apply() with lambda function

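Another approach is to use the apply() method with a lambda function that counts the missing/NaN values in each row. Because apply() with axis=1 calls the function once per row, it is typically slower than the vectorized methods above on large DataFrames, but it is handy when you need custom per-row logic.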

row_nan_count_method3 = df.apply(lambda x: x.isna().sum(), axis=1)
print(row_nan_count_method3)

All three methods print the same output:

0    1
1    0
2    1
3    1
4    0
dtype: int64
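As a sanity check, the sketch below runs the three methods on the sample data and confirms that they agree. It also shows one more way to arrive at the same numbers, which is not one of the methods above: subtracting count(axis=1), the number of non-missing values per row, from the number of columns. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, np.nan, 15]
})

method1 = df.isna().sum(axis=1)
method2 = df.isnull().sum(axis=1)
method3 = df.apply(lambda x: x.isna().sum(), axis=1)

# Extra cross-check: total columns minus non-missing values per row.
via_count = df.shape[1] - df.count(axis=1)

print(method1.tolist())    # [1, 0, 1, 1, 0]
print(via_count.tolist())  # [1, 0, 1, 1, 0]
```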

Conclusion

In this blog post, we explored how to count the number of missing/NaN values in each row of a pandas DataFrame using Python. Counting missing values is an essential step in data cleaning and preprocessing, and now you have a simple solution to do so in Python.

Remember, identifying and handling missing values is crucial for accurate data analysis, so be sure to make use of these methods in your next data science project.

