How to Count NaN Values in a Pandas DataFrame Column

For data scientists and software engineers working in Python, handling missing data (NaN values) correctly is essential for accurate analysis. This guide shows how to count NaN values in a Pandas DataFrame column.

If you’re a data scientist or software engineer working with data in Python, you’ve likely encountered missing data, or NaN (Not a Number) values, in your datasets. NaN values can arise for various reasons, such as incomplete data, data entry errors, or data corruption. It’s crucial to identify and handle these missing values correctly to avoid incorrect analysis results. In this guide, we’ll explore how to count NaN values in a DataFrame column using Pandas, a popular data manipulation library in Python.

Prerequisites

Before we dive into the code, let’s ensure that we have the following prerequisites:

  • Python 3.x
  • Pandas library installed (You can install it using pip install pandas)

Importing Libraries

First, we need to import the necessary libraries for our task. We’ll be using the Pandas library to create and manipulate data frames.

import pandas as pd

Creating a DataFrame

Let’s create our sample data frame to work with using the pd.DataFrame() function.

# Creating a data frame with missing values
data = {
    'Name': ['John', 'Doe', 'Alice', 'Bob', 'Chris'],
    'Age': [25, 30, 22, 28, 35],
    'Salary': [50000, 60000, None, 75000, 90000],
    'Experience': [2, 5, 1, None, 10]
}

df = pd.DataFrame(data)

Our data frame looks like this:

    Name  Age   Salary  Experience
0   John   25  50000.0         2.0
1    Doe   30  60000.0         5.0
2  Alice   22      NaN         1.0
3    Bob   28  75000.0         NaN
4  Chris   35  90000.0        10.0

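Our data frame consists of four columns, two of which (Salary and Experience) contain a NaN value. Pandas converts None to NaN in numeric columns, which is why those two columns are displayed as floats. We’ll now explore how to count the NaN values in specific columns.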

Counting NaN values in a column

To count the number of NaN values in a specific column of a Pandas DataFrame, we can chain the isna() and sum() methods. isna() returns a Boolean Series that is True wherever a value is missing and False otherwise. sum() then adds up that Series, treating True as 1 and False as 0, so the result equals the number of NaN values in the column.

# Counting NaN values in the 'Salary' column
salary_nan_count = df['Salary'].isna().sum()

print(salary_nan_count)

The output will be 1, which is the number of NaN values in the ‘Salary’ column.
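To see why this works, it can help to look at the intermediate Boolean mask before it is summed. The following is a minimal sketch that rebuilds the ‘Salary’ values from the sample data as a standalone Series; the names salary and mask are only illustrative.

```python
import pandas as pd

# Same 'Salary' values as in the sample data frame (illustrative standalone Series)
salary = pd.Series([50000, 60000, None, 75000, 90000], name='Salary')

# isna() returns a Boolean Series: True where the value is missing
mask = salary.isna()
print(mask.tolist())  # [False, False, True, False, False]

# sum() treats True as 1 and False as 0, so it counts the missing values
print(mask.sum())  # 1
```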

Similarly, we can count the number of NaN values in the ‘Experience’ column.

# Counting NaN values in the 'Experience' column
exp_nan_count = df['Experience'].isna().sum()

print(exp_nan_count)

The output will be 1, which is the number of NaN values in the ‘Experience’ column.
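As a cross-check, the same number can be obtained by subtracting the count of non-missing values from the column length, since Series.count() skips NaN. This is only a sketch of an alternative, not a replacement for the isna().sum() approach; the variable names are illustrative.

```python
import pandas as pd

# Same 'Experience' values as in the sample data frame (illustrative standalone Series)
experience = pd.Series([2, 5, 1, None, 10], name='Experience')

# count() returns the number of non-missing values
non_missing = experience.count()

# Column length minus non-missing values gives the number of NaN values
nan_via_count = len(experience) - non_missing
print(nan_via_count)  # 1
```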

Counting NaN values in all columns

If you want to count the number of NaN values in every column of the data frame, call isna() and sum() on the data frame itself rather than on a single column. The result is a Series with one count per column.

# Counting NaN values in all columns
nan_count = df.isna().sum()

print(nan_count)

The output will be:

Name          0
Age           0
Salary        1
Experience    1
dtype: int64

The output shows the number of NaN values in each column of the data frame.
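If a single total or a proportion is more useful than per-column counts, the same Boolean mask can be reduced further. The sketch below rebuilds the sample data frame and assumes you want the overall number of NaN values and the fraction of missing values per column; total_nan and nan_fraction are illustrative names.

```python
import pandas as pd

# Rebuild the sample data frame so the example is self-contained
df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Alice', 'Bob', 'Chris'],
    'Age': [25, 30, 22, 28, 35],
    'Salary': [50000, 60000, None, 75000, 90000],
    'Experience': [2, 5, 1, None, 10],
})

# Summing the per-column counts gives the total for the whole data frame
total_nan = df.isna().sum().sum()
print(total_nan)  # 2

# The mean of a Boolean mask is the fraction of True values per column
nan_fraction = df.isna().mean()
print(nan_fraction['Salary'])  # 0.2
```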

isnull() is an alias of isna(), so you can use either one; the code is otherwise identical to the examples above.
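A quick way to confirm that isnull() gives the same counts as isna() is to compare the two results directly. This is a minimal sketch on the two columns of the sample data that contain missing values.

```python
import pandas as pd

# Two columns from the sample data that contain missing values
df = pd.DataFrame({
    'Salary': [50000, 60000, None, 75000, 90000],
    'Experience': [2, 5, 1, None, 10],
})

counts_isna = df.isna().sum()
counts_isnull = df.isnull().sum()

# Both calls produce the same per-column counts
print(counts_isna.equals(counts_isnull))  # True
```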

Conclusion

In this guide, we’ve explored how to count NaN values in a Pandas DataFrame column using isna() (or its alias isnull()) together with sum(). We’ve also shown how to count NaN values in all columns of a data frame at once. Identifying missing data is a crucial first step in data analysis, and Pandas provides a simple yet powerful way to find and count NaN values in your data frames.

