How to Count NaN Values in a Pandas DataFrame Column
If you’re a data scientist or software engineer working with data in Python, you’ve likely encountered missing data, represented as NaN (Not a Number) values, in your datasets. NaN values can arise for various reasons, such as incomplete data, data entry errors, or data corruption. It’s crucial to identify and handle these missing values correctly to avoid incorrect analysis results. In this guide, we’ll explore how to count NaN values in a DataFrame column using Pandas, a popular data manipulation library for Python.
Prerequisites
Before we dive into the code, let’s ensure that we have the following prerequisites:
- Python 3.x
- Pandas library installed (you can install it using pip install pandas)
Importing Libraries
First, we need to import the necessary libraries for our task. We’ll be using the Pandas library to create and manipulate data frames.
import pandas as pd
Creating a DataFrame
Let’s create a sample data frame to work with using the pd.DataFrame() constructor.
# Creating a data frame with missing values
data = {
    'Name': ['John', 'Doe', 'Alice', 'Bob', 'Chris'],
    'Age': [25, 30, 22, 28, 35],
    'Salary': [50000, 60000, None, 75000, 90000],
    'Experience': [2, 5, 1, None, 10]
}
df = pd.DataFrame(data)
Our data frame looks like this:
| | Name | Age | Salary | Experience |
|---|---|---|---|---|
| 0 | John | 25 | 50000.0 | 2.0 |
| 1 | Doe | 30 | 60000.0 | 5.0 |
| 2 | Alice | 22 | NaN | 1.0 |
| 3 | Bob | 28 | 75000.0 | NaN |
| 4 | Chris | 35 | 90000.0 | 10.0 |
Our data frame consists of four columns, with two columns containing NaN values. We’ll now explore how to count the NaN values in specific columns.
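Note that the None entries in the dictionary appear as NaN in the table. The short sketch below rebuilds the same sample data and shows which columns ended up with missing values; the variable name cols_with_nan is only illustrative.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Alice', 'Bob', 'Chris'],
    'Age': [25, 30, 22, 28, 35],
    'Salary': [50000, 60000, None, 75000, 90000],
    'Experience': [2, 5, 1, None, 10],
})

# None in a numeric column is stored as NaN, so the column becomes float64
print(df.dtypes)

# Names of the columns that contain at least one NaN
cols_with_nan = df.columns[df.isna().any()].tolist()
print(cols_with_nan)  # ['Salary', 'Experience']
```

This is why Salary and Experience display values such as 50000.0 and 2.0 even though integers were supplied.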
Counting NaN Values in a Column
To count the number of NaN values in a specific column of a Pandas DataFrame, we can chain the isna() and sum() methods. The isna() method returns a Boolean Series that is True where a value is NaN and False otherwise. Because True counts as 1 and False as 0, sum() returns the number of True values, which equals the number of NaN values in the column.
# Counting NaN values in the 'Salary' column
salary_nan_count = df['Salary'].isna().sum()
print(salary_nan_count)
The output will be 1, which is the number of NaN values in the ‘Salary’ column.
Similarly, we can count the number of NaN values in the ‘Experience’ column.
# Counting NaN values in the 'Experience' column
exp_nan_count = df['Experience'].isna().sum()
print(exp_nan_count)
The output will be 1, which is the number of NaN values in the ‘Experience’ column.
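As a quick sanity check, the NaN count can be compared against count(), which counts only non-missing values. The sketch below uses a standalone Series with the same Salary values; the variable names are illustrative and not part of the original walkthrough.

```python
import pandas as pd

# Same values as the 'Salary' column above
salary = pd.Series([50000, 60000, None, 75000, 90000], name='Salary')

nan_count = salary.isna().sum()   # number of True values in the mask
non_nan_count = salary.count()    # count() skips NaN values

# The two counts should add up to the length of the column
assert nan_count + non_nan_count == len(salary)

# The mean of the Boolean mask gives the fraction of missing values
nan_fraction = salary.isna().mean()

print(nan_count, non_nan_count, nan_fraction)  # 1 4 0.2
```

The fraction can be more informative than the raw count when comparing columns across data frames of different sizes.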
Counting NaN Values in All Columns
If you want to count the number of NaN values in every column of the data frame, you can call isna() and sum() on the whole DataFrame instead of on a single column.
# Counting NaN values in all columns
nan_count = df.isna().sum()
print(nan_count)
The output will be:
Name          0
Age           0
Salary        1
Experience    1
dtype: int64
The output shows the number of NaN values in each column of the data frame.
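The same pattern extends to a single total for the whole data frame, or to a count per row. The following is a minimal sketch on the sample data; total_nan and nan_per_row are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Alice', 'Bob', 'Chris'],
    'Age': [25, 30, 22, 28, 35],
    'Salary': [50000, 60000, None, 75000, 90000],
    'Experience': [2, 5, 1, None, 10],
})

# Summing the per-column counts gives the total for the whole data frame
total_nan = int(df.isna().sum().sum())
print(total_nan)  # 2

# axis=1 sums across columns, giving the NaN count for each row
nan_per_row = df.isna().sum(axis=1)
print(nan_per_row.tolist())  # [0, 0, 1, 1, 0]
```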
You can also use isnull() in place of isna(). It is an alias of isna(), so the code is otherwise identical to what we’ve done above.
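A small check, sketched here on a standalone Series with the same Experience values, illustrates that both methods produce the same Boolean mask and therefore the same count.

```python
import pandas as pd

# Same values as the 'Experience' column above
experience = pd.Series([2, 5, 1, None, 10], name='Experience')

# Both calls produce the same Boolean mask
masks_match = experience.isna().equals(experience.isnull())
print(masks_match)  # True

# So the counts agree as well
count_isna = int(experience.isna().sum())
count_isnull = int(experience.isnull().sum())
print(count_isna, count_isnull)  # 1 1
```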
Conclusion
In this guide, we’ve explored how to count NaN values in a Pandas DataFrame column using the isna() (or isnull()) and sum() methods. We’ve also shown how to count NaN values in all columns of a data frame. Identifying and handling missing data is a crucial step in data analysis, and Pandas provides a simple yet powerful way to handle NaN values in your data frames.