# How to Count NaN and Null Values in a Pandas DataFrame

As a data scientist or software engineer, you’ve probably encountered a situation where you need to count the number of missing values in a Pandas DataFrame. Missing values can occur for a variety of reasons, such as data entry errors, system failures, or sensor malfunctions. In this article, we’ll explain how to count `NaN` and `null` values in a Pandas DataFrame using Python.

## What are `NaN` and `null` values?

`NaN` stands for “Not a Number” and represents a missing or undefined value in a numerical dataset. `NaN` values can also be produced by mathematical operations whose result is undefined, such as dividing zero by zero or taking the square root of a negative number. Dividing a nonzero number by zero yields an infinite value instead, which pandas does not count as missing by default.

`null` values, on the other hand, indicate the absence of a value in non-numerical data, such as text, dates, or categorical variables. In pandas they typically appear as `None`, `NaT`, or `pd.NA`, and pandas treats them as missing in the same way as `NaN`.
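As a rough illustration of the distinction above, the sketch below builds three small Series with made-up values (they are not from the article's dataset) and checks what `isna()` reports for each. It also shows that `0.0 / 0.0` produces `NaN` while `1.0 / 0.0` produces an infinite value that is not flagged as missing under the default pandas options.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only to show each kind of missing marker.
numbers = pd.Series([1.0, np.nan, 3.0])                   # NaN in a float column
words = pd.Series(["apple", None, "cherry"])              # None in a text column
dates = pd.Series(pd.to_datetime(["2024-01-01", None]))   # NaT in a datetime column

# isna() flags all three markers as missing.
numbers_missing = int(numbers.isna().sum())
words_missing = int(words.isna().sum())
dates_missing = int(dates.isna().sum())

# 0/0 is undefined and becomes NaN; 1/0 becomes inf, which is not missing.
quotients = pd.Series([0.0, 1.0]) / pd.Series([0.0, 0.0])
quotient_flags = quotients.isna().tolist()

print(numbers_missing, words_missing, dates_missing, quotient_flags)
```

If infinite values should count as missing in your analysis, one option is to replace them with `NaN` before counting.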

## How to count `NaN` values in a Pandas DataFrame

To count the number of `NaN` values in a Pandas DataFrame, we can use the `isna()` method to create a Boolean mask and then use the `sum()` method to count the number of `True` values. Let’s say we have a CSV file named `data.csv` that loads into the DataFrame shown below:

```
A B C
0 1.0 6.0 apple
1 2.0 NaN banana
2 NaN 8.0 cherry
3 4.0 9.0 NaN
4 5.0 10.0 date
```

```
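import pandas as pd
df = pd.read_csv('data.csv')  # load the CSV file into a DataFrame
# isna() marks missing cells as True; the first sum() counts per column,
# the second sum() adds those counts into a single total
nan_count = df.isna().sum().sum()
print('Number of NaN values:', nan_count)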
```

Output:

```
Number of NaN values: 3
```

In this example, we first read a CSV file into a Pandas DataFrame using the `read_csv()` function. We then use the `isna()` method to create a Boolean mask that identifies all `NaN` values in the DataFrame. Finally, we use the `sum()` method twice: the first call counts the `True` values in each column of the mask, and the second adds those column counts together to obtain the total number of `NaN` values in the DataFrame.
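The two `sum()` calls can also be used separately when a breakdown is more useful than a single total. The sketch below rebuilds the example table in memory (an assumption made so the snippet runs without `data.csv`) and counts missing values per column and per row.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for data.csv, mirroring the table shown above.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", None, "date"],
})

per_column = df.isna().sum()        # one count per column
per_row = df.isna().sum(axis=1)     # one count per row
total = int(per_column.sum())       # same value as df.isna().sum().sum()

print(per_column)
print(per_row)
print("Total missing:", total)
```

Per-column counts are often the more practical view, since they show which fields need attention before any cleaning step.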

## How to count `null` values in a Pandas DataFrame

To count the number of `null` values in a Pandas DataFrame, we can use the `isnull()` method to create a Boolean mask and then use the `sum()` method to count the number of `True` values.

```
import pandas as pd
df = pd.read_csv('data.csv')
null_count = df.isnull().sum().sum()
print('Number of null values:', null_count)
```

Output:

```
Number of null values: 3
```

In this example, we follow the same approach as before, but we use the `isnull()` method instead of the `isna()` method to create the Boolean mask. In pandas, `isnull()` is an alias of `isna()`, so both calls identify the same missing values and return the same count.
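A quick way to confirm that the two methods behave the same is to compare their masks directly. This is a minimal sketch that again uses an in-memory copy of the example table rather than reading `data.csv`.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for data.csv, mirroring the table shown earlier.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", None, "date"],
})

# Compare the Boolean masks produced by the two method names.
same_mask = df.isna().equals(df.isnull())

isna_total = int(df.isna().sum().sum())
isnull_total = int(df.isnull().sum().sum())

print(same_mask, isna_total, isnull_total)
```

Because the results match, the choice between the two names is mostly a matter of keeping a codebase consistent.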

## Handling `NaN` and `null` Values

Once you have identified the missing values in your DataFrame, you may want to handle them. Some common strategies include:

- Dropping rows or columns with missing values using the `.dropna()` method.
- Filling missing values with a specific value using the `.fillna()` method.
- Interpolating missing values using the `.interpolate()` method.
- Using domain-specific knowledge to replace missing values.
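The first three strategies can be sketched on the example table as follows. The fill values used here (`0.0`, the column mean, and the string `"unknown"`) are illustrative assumptions, not recommendations; the right choice depends on the data.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for data.csv, mirroring the table shown earlier.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0, 5.0],
    "B": [6.0, np.nan, 8.0, 9.0, 10.0],
    "C": ["apple", "banana", "cherry", None, "date"],
})

# Strategy 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Strategy 2: fill each column with a chosen replacement value.
filled = df.fillna({"A": 0.0, "B": df["B"].mean(), "C": "unknown"})

# Strategy 3: linearly interpolate the numeric columns only.
interpolated = df[["A", "B"]].interpolate()

print(dropped)
print(filled)
print(interpolated)
```

Each call returns a new DataFrame and leaves `df` unchanged, so the strategies can be compared side by side before committing to one.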

## Conclusion

In this article, we’ve explained how to count `NaN` and `null` values in a Pandas DataFrame using Python. We’ve shown how to use the `isna()` and `isnull()` methods to create Boolean masks that identify missing values and the `sum()` method to count the number of `True` values in each mask. We also explored several methods to handle `NaN` and `null` values.

As a data scientist or software engineer, it’s important to be able to handle missing data effectively, as missing data can affect the accuracy and reliability of our analyses and models. By using the techniques described in this article, you can gain insights into the missing data in your dataset and make informed decisions about how to handle it.
