How to Count the Number of Missing/NaN Values in Each Row in Python Pandas
As a data scientist or software engineer, you know that missing values or NaNs can be a common issue in data analysis. When working with large datasets, it’s essential to have a way to quickly identify and handle missing values. In this blog post, we’ll explore how to count the number of missing/NaN values in each row of a pandas DataFrame using Python.
What are Missing/NaN Values?
Missing values are values that are unavailable or undefined. In pandas, they are usually represented by NaN (Not a Number), a special floating-point value available as np.nan; None, NaT, and pd.NA are also treated as missing. Missing values can occur for a variety of reasons, including data entry errors, values that were never collected, or data corruption.
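As a quick illustration, the sketch below shows which entries pandas flags as missing. The values are made up for this example and are not from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen only to illustrate missing-value markers
numbers = pd.Series([1.0, np.nan, None, 3.5])
labels = pd.Series(["a", "", None], dtype=object)

# isna() returns True wherever pandas considers the entry missing
number_mask = numbers.isna().tolist()
label_mask = labels.isna().tolist()

print(number_mask)  # [False, True, True, False]
print(label_mask)   # [False, False, True]
```

Note that the empty string is not flagged: it is a regular value as far as isna() is concerned, so placeholder strings usually need to be converted to NaN before counting.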
Why Count Missing/NaN Values in Each Row?
Counting the number of missing/NaN values in each row is an important step in data cleaning and preprocessing. This information can help you identify rows with missing data, which can then be handled in a variety of ways, such as removing the row, filling in the missing data, or imputing the missing values.
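The sketch below shows how per-row counts might feed into two of these handling strategies. The DataFrame, the threshold of one missing value, and the fill value of 0 are all assumptions made for illustration; the counting step itself is explained in the next section.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

# Number of missing values in each row
missing_per_row = df.isna().sum(axis=1)

# Option 1: keep only rows with at most one missing value (arbitrary threshold)
kept = df[missing_per_row <= 1]

# Option 2: fill every gap with a constant (0 is an arbitrary choice)
filled = df.fillna(0)

print(missing_per_row.tolist())  # [1, 2, 0]
print(kept.index.tolist())       # [0, 2]
```

Which option is appropriate depends on the dataset and on how the missing values arose.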
The Solution: Counting Missing/NaN Values in Each Row
Method 1: Using isna() and sum(axis=1)
To count the number of missing/NaN values in each row, we can use the pandas isna() method to create a Boolean mask of the DataFrame, where True indicates a missing value. We can then use the sum() method to count the number of True values in each row.
Here’s what the code looks like:
import numpy as np
import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, np.nan, 15]
})

# count the number of missing/NaN values in each row
row_nan_count = df.isna().sum(axis=1)
print(row_nan_count)
In the code above, we first create a sample DataFrame with three columns (col1, col2, and col3). We then use the isna() method to create a Boolean mask of the DataFrame, where True indicates a missing value. We use the sum() method with the axis=1 parameter to count the number of True values in each row. Finally, we print the resulting row-wise count of missing/NaN values.
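To make the role of the axis parameter more concrete, the following sketch keeps the intermediate mask in its own variable and sums it in both directions. The variable names (mask, per_row, per_col) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, np.nan, 4, 5],
    "col2": [np.nan, 7, 8, 9, 10],
    "col3": [11, 12, 13, np.nan, 15],
})

mask = df.isna()            # Boolean DataFrame with the same shape as df
per_row = mask.sum(axis=1)  # adds across the columns: one count per row
per_col = mask.sum(axis=0)  # adds down the rows: one count per column

print(per_row.tolist())  # [1, 0, 1, 1, 0]
print(per_col.tolist())  # [1, 1, 1]
```

Summing works because True is treated as 1 and False as 0, so the sum of a Boolean row is the number of True entries in it.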
Method 2: Using isnull() and sum(axis=1)
Similar to the first method, you can use the isnull() method instead of isna() to create a Boolean mask and then apply sum(axis=1) to count the missing/NaN values in each row. In pandas, isnull() is an alias for isna(), so the two methods return the same result.
row_nan_count_method2 = df.isnull().sum(axis=1)
print(row_nan_count_method2)
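As a small check, the sketch below compares the two masks on the sample data. It also shows one possible alternative that is not covered above: count(axis=1) returns the number of non-missing cells per row, so subtracting it from the number of columns gives the same per-row totals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, np.nan, 4, 5],
    "col2": [np.nan, 7, 8, 9, 10],
    "col3": [11, 12, 13, np.nan, 15],
})

# isnull() and isna() produce identical Boolean masks
masks_match = df.isnull().equals(df.isna())

# Alternative: total columns minus non-missing cells in each row
via_count = df.shape[1] - df.count(axis=1)

print(masks_match)         # True
print(via_count.tolist())  # [1, 0, 1, 1, 0]
```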
Method 3: Using apply() with a lambda function
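Another approach is to use the apply() method with a lambda function to count the missing/NaN values in each row. Because apply() with axis=1 calls the function once per row, it is generally slower on large DataFrames than the vectorized approaches in Methods 1 and 2.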
row_nan_count_method3 = df.apply(lambda x: x.isna().sum(), axis=1)
print(row_nan_count_method3)
Output (identical for all three methods):
0    1
1    0
2    1
3    1
4    0
dtype: int64
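The following sketch confirms that the three approaches agree on the sample data and shows one way the counts might be used afterwards, here to select the rows that contain at least one missing value. The variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, np.nan, 4, 5],
    "col2": [np.nan, 7, 8, 9, 10],
    "col3": [11, 12, 13, np.nan, 15],
})

counts_isna = df.isna().sum(axis=1)
counts_isnull = df.isnull().sum(axis=1)
counts_apply = df.apply(lambda row: row.isna().sum(), axis=1)

# All three approaches should give the same per-row counts
all_agree = counts_isna.tolist() == counts_isnull.tolist() == counts_apply.tolist()

# Use the counts as a filter: rows with at least one missing value
rows_with_missing = df[counts_isna > 0]

print(all_agree)                         # True
print(rows_with_missing.index.tolist())  # [0, 2, 3]
```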
Conclusion
In this blog post, we explored how to count the number of missing/NaN values in each row of a pandas DataFrame using Python. Counting missing values is an essential step in data cleaning and preprocessing, and now you have a simple solution to do so in Python.
Remember, identifying and handling missing values is crucial for accurate data analysis, so be sure to make use of these methods in your next data science project.