How to Count NaN and Null Values in a Pandas DataFrame
As a data scientist or software engineer, you’ve probably encountered a situation where you need to count the number of missing values in a Pandas DataFrame. Missing values can occur for a variety of reasons, such as data entry errors, system failures, or sensor malfunctions. In this article, we’ll explain how to count NaN and null values in a Pandas DataFrame using Python.
What are NaN and null values?
NaN stands for “Not a Number” and is the floating-point marker that Pandas uses for a missing or undefined value in numerical data. NaN values can also be produced by mathematical operations whose result is undefined, such as dividing zero by zero or taking the square root of a negative number with NumPy. Dividing a nonzero number by zero yields infinity instead, which Pandas does not count as missing by default.
null values, on the other hand, indicate the absence of a value more generally, for example Python’s None in a text or categorical column, or NaT in a datetime column. Pandas treats NaN, None, and NaT all as missing, so the same methods detect and count every one of them.
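To make the distinction concrete, here is a minimal sketch of which values pd.isna() reports as missing. The candidate values and the variable names (candidates, missing_flags, numeric) are chosen purely for illustration and do not come from data.csv.

```python
import numpy as np
import pandas as pd

# Illustrative values only; none of these come from data.csv.
candidates = {
    "np.nan": np.nan,
    "None": None,
    "pd.NaT": pd.NaT,
    "empty string": "",
    "zero": 0,
    "infinity": np.inf,
}

# pd.isna() returns True for NaN, None, and NaT, and False for ordinary values.
missing_flags = {label: bool(pd.isna(value)) for label, value in candidates.items()}
for label, flag in missing_flags.items():
    print(f"{label}: {flag}")

# A None placed in a numeric Series is stored as NaN in a float64 column.
numeric = pd.Series([1.0, None, 3.0])
numeric_missing = int(numeric.isna().sum())
print(numeric.dtype, numeric_missing)
```

Note that an empty string, zero, and infinity are regular values as far as these checks are concerned, so they would need separate handling if your data uses them as placeholders.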
How to count NaN values in a Pandas DataFrame
To count the number of NaN values in a Pandas DataFrame, we can use the isna() method to create a Boolean mask and then use the sum() method to count the number of True values. Let’s say we have a CSV file named data.csv that loads into the following DataFrame:
     A     B       C
0  1.0   6.0   apple
1  2.0   NaN  banana
2  NaN   8.0  cherry
3  4.0   9.0     NaN
4  5.0  10.0    date
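import pandas as pd

# Load the CSV file; empty fields and markers such as NaN are read as missing values
df = pd.read_csv('data.csv')
# isna() builds a Boolean mask; the first sum() counts per column, the second adds the column totals
nan_count = df.isna().sum().sum()
print('Number of NaN values:', nan_count)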
Output:
Number of NaN values: 3
In this example, we first read a CSV file into a Pandas DataFrame using the read_csv() function. We then use the isna() method to create a Boolean mask that identifies all NaN values in the DataFrame. Finally, we call the sum() method twice: the first call counts the True values in each column of the mask, and the second call adds those column counts together to give the total number of NaN values in the DataFrame.
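The total is often less useful than knowing where the gaps are. The sketch below breaks the same count down by column and by row. It rebuilds the example data in memory as a stand-in for data.csv so that it runs without the file, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for data.csv, so the example runs without the file.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", np.nan, "date"],
})

mask = df.isna()

per_column = mask.sum()                          # missing values in each column
per_row = mask.sum(axis=1)                       # missing values in each row
total = int(per_column.sum())                    # same number as isna().sum().sum()
rows_with_missing = int(mask.any(axis=1).sum())  # rows with at least one gap

print(per_column)
print(per_row)
print("Total missing:", total)
print("Rows with a missing value:", rows_with_missing)
```

For this data, each column contains one missing value, and rows 1, 2, and 3 each contain one.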
How to count null values in a Pandas DataFrame
To count the number of null values in a Pandas DataFrame, we can use the isnull() method to create a Boolean mask and then use the sum() method to count the number of True values.
import pandas as pd
df = pd.read_csv('data.csv')
null_count = df.isnull().sum().sum()
print('Number of null values:', null_count)
Output:
Number of null values: 3
In this example, we follow the same approach as before, but we call the isnull() method instead of the isna() method. In Pandas, isnull() is an alias for isna(), so both methods produce the same Boolean mask and therefore the same count of missing values in the DataFrame.
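As a quick check of that equivalence, the following sketch compares the two masks on an in-memory copy of the example data (a stand-in for data.csv). It also shows one alternative way to get the same total, by subtracting the number of non-missing cells reported by count() from the total number of cells. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for data.csv, so the example runs without the file.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", np.nan, "date"],
})

# isnull() and isna() should return identical Boolean masks.
same_mask = df.isna().equals(df.isnull())

isna_total = int(df.isna().sum().sum())
isnull_total = int(df.isnull().sum().sum())

# Alternative: total cells minus non-missing cells.
count_based_total = int(df.size - df.count().sum())

print(same_mask, isna_total, isnull_total, count_based_total)
```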
Handling NaN and Null Values
Once you have identified the missing values in your DataFrame, you may want to handle them. Some common strategies include:
- Dropping rows or columns with missing values using the .dropna() method.
- Filling missing values with a specific value using the .fillna() method.
- Interpolating missing values using the .interpolate() method.
- Using domain-specific knowledge to replace missing values.
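The first three strategies can be sketched as follows on an in-memory copy of the example data (a stand-in for data.csv). The fill values used here (0.0, the column mean, and the string unknown) are arbitrary choices for illustration, not recommendations. The right choice depends on your data.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for data.csv, so the example runs without the file.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", np.nan, "date"],
})

# 1. Drop every row that contains at least one missing value.
dropped_rows = df.dropna()

# 2. Fill missing values, using a different (illustrative) value per column.
filled = df.fillna({"A": 0.0, "B": df["B"].mean(), "C": "unknown"})

# 3. Linearly interpolate the numeric columns only; text columns are left out.
interpolated = df[["A", "B"]].interpolate()

print(dropped_rows)
print(filled)
print(interpolated)
```

Each call returns a new DataFrame and leaves df unchanged, so you can compare the results before committing to one approach.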
Conclusion
In this article, we’ve explained how to count NaN and null values in a Pandas DataFrame using Python. We’ve shown how to use the isna() and isnull() methods to create Boolean masks that identify missing values and the sum() method to count the number of True values in each mask. We also outlined several ways to handle NaN and null values.
As a data scientist or software engineer, it’s important to be able to handle missing data effectively, as missing data can affect the accuracy and reliability of our analyses and models. By using the techniques described in this article, you can gain insights into the missing data in your dataset and make informed decisions about how to handle it.