How to Count NaN and Null Values in a Pandas DataFrame

In this blog, learn how to efficiently count missing values in a Pandas DataFrame using Python. Discover techniques for identifying and managing NaN and null values, a core skill for data scientists and software engineers dealing with data quality issues.

By Saturn Cloud | Monday, June 19, 2023 | Miscellaneous | Updated: Friday, October 27, 2023

As a data scientist or software engineer, you’ve probably encountered a situation where you need to count the number of missing values in a Pandas DataFrame. Missing values can occur for a variety of reasons, such as data entry errors, system failures, or sensor malfunctions. In this article, we’ll explain how to count NaN and null values in a Pandas DataFrame using Python.

What are `NaN` and `null` values?

NaN stands for “Not a Number”. It is a special floating-point value that Pandas uses to mark a missing or undefined entry in numerical data. NaN can also result from mathematical operations that have no defined result, such as dividing zero by zero or taking the square root of a negative number. Dividing a nonzero number by zero yields infinity (inf) in NumPy and Pandas, which is not counted as NaN.

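null values, on the other hand, indicate the absence of a value in general, including in non-numerical data such as text, dates, or categorical variables. In Pandas they typically appear as None in object columns or NaT in datetime columns. Pandas treats None, NaN, and NaT alike as missing, so the same methods detect all of them.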
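To make the distinction concrete, here is a minimal sketch of where NaN comes from and which values Pandas flags as missing. The variable names are illustrative only, and the warning suppression via np.errstate is just there to keep the output quiet.

```python
import numpy as np
import pandas as pd

# Suppress NumPy's floating-point warnings for the undefined operations below.
with np.errstate(divide="ignore", invalid="ignore"):
    zero_over_zero = np.float64(0.0) / np.float64(0.0)  # undefined, gives NaN
    one_over_zero = np.float64(1.0) / np.float64(0.0)   # gives inf, not NaN
    sqrt_negative = np.sqrt(np.float64(-1.0))           # undefined, gives NaN

# Mix the results with None and NaT in a single object-typed Series.
values = pd.Series(
    [zero_over_zero, one_over_zero, sqrt_negative, None, pd.NaT],
    dtype="object",
)

# isna() flags NaN, None and NaT as missing, but leaves inf alone.
mask = values.isna()
print(mask.tolist())
```

If infinite values should also count as missing in your data, one option is to replace them with NaN first, for example with values.replace([np.inf, -np.inf], np.nan).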

How to count `NaN` values in a Pandas DataFrame

To count the number of NaN values in a Pandas DataFrame, we can use the isna() method to create a Boolean mask and then use the sum() method to count the number of True values. Let’s say we have a CSV file named data.csv that loads into the following DataFrame:

     A     B       C
0  1.0   6.0   apple
1  2.0   NaN  banana
2  NaN   8.0  cherry
3  4.0   9.0     NaN
4  5.0  10.0    date

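import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('data.csv')
# isna() marks each missing cell as True; the first sum() counts per column,
# the second sum() adds the column counts into one total
nan_count = df.isna().sum().sum()
print('Number of NaN values:', nan_count)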

Output:

Number of NaN values: 3

In this example, we first read a CSV file into a Pandas DataFrame using the read_csv() function. We then use the isna() method to create a Boolean mask that identifies all NaN values in the DataFrame. Finally, we call sum() twice: the first call counts the True values in each column of the mask, and the second adds those column counts together to give the total number of NaN values in the DataFrame.
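Often the total is less useful than knowing where the gaps are. The sketch below rebuilds the sample data inline (so it runs without data.csv) and counts missing values per column and per row; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the sample data, built inline instead of read from disk.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", np.nan, "date"],
})

per_column = df.isna().sum()       # one count per column
per_row = df.isna().sum(axis=1)    # one count per row
total = int(per_column.sum())      # grand total across the DataFrame

print(per_column)
print(per_row.tolist())
print('Number of NaN values:', total)
```

Here each column contains exactly one missing value, and rows 1, 2 and 3 each contain one, which adds up to the total of 3 reported earlier.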

How to count `null` values in a Pandas DataFrame

Pandas also provides the isnull() method, which is an alias of isna() and returns the same Boolean mask. To count the number of null values in a Pandas DataFrame, we can therefore use isnull() and then sum() in exactly the same way.

import pandas as pd

df = pd.read_csv('data.csv')
null_count = df.isnull().sum().sum()
print('Number of null values:', null_count)

Output:

Number of null values: 3

In this example, we follow the same approach as before, but we call isnull() instead of isna(). Because isnull() is an alias of isna(), the mask flags NaN, None, and NaT alike, and the resulting count is identical.
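As a quick check that the two methods agree, the following sketch compares their masks on the sample data (rebuilt inline so it runs without data.csv). It also shows an alternative count based on count(), which returns the number of non-missing cells per column; this alternative is an addition for illustration, not part of the approach above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", np.nan, "date"],
})

# isnull() is an alias of isna(), so the two masks should be identical.
same_mask = df.isna().equals(df.isnull())

# Alternative: total cells minus non-missing cells.
missing_via_count = int(df.size - df.count().sum())

print('Masks identical:', same_mask)
print('Number of null values:', missing_via_count)
```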

Handling `NaN` and `null` values

Once you have identified the missing values in your DataFrame, you may want to handle them. Some common strategies include:

Dropping rows or columns with missing values using the .dropna() method.
Filling missing values with a specific value using the .fillna() method.
Interpolating missing numerical values using the .interpolate() method.
Using domain-specific knowledge to replace missing values.
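The first three strategies above can be sketched on the sample data as follows. The fill values (0.0, the column mean, and the placeholder string "unknown") are arbitrary choices for illustration; the right choice depends on your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", np.nan, "date"],
})

# Drop every row that contains at least one missing value.
dropped_rows = df.dropna()

# Fill each column with its own replacement value.
filled = df.fillna({"A": 0.0, "B": df["B"].mean(), "C": "unknown"})

# Linearly interpolate the numeric columns only; text cannot be interpolated.
interpolated = df[["A", "B"]].interpolate()

print(dropped_rows)
print(filled)
print(interpolated)
```

Dropping rows is the simplest option but discards three of the five rows here, so filling or interpolating may preserve more information when missing values are scattered across many rows.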

Conclusion

In this article, we’ve explained how to count NaN and null values in a Pandas DataFrame using Python. We’ve shown how to use the isna() method and its alias isnull() to create Boolean masks that identify missing values, and the sum() method to count the number of True values in each mask. We also outlined several strategies for handling NaN and null values.

As a data scientist or software engineer, it’s important to be able to handle missing data effectively, as missing data can affect the accuracy and reliability of our analyses and models. By using the techniques described in this article, you can gain insights into the missing data in your dataset and make informed decisions about how to handle it.

About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.
