Python Pandas: How to remove nan and inf values

As a data scientist or software engineer, you know that working with data can be challenging, especially when dealing with missing or invalid values. In this post, I’ll show you how to use Python pandas to remove NaN and -inf values from your data.
What are NaN and -inf values?
NaN stands for Not a Number and is a special floating-point value that pandas also uses to mark missing data. NaN values can appear when data is missing from the source or when an operation has no defined result, such as dividing zero by zero or taking the square root of a negative number with NumPy.
-inf stands for negative infinity and is another special floating-point value. It represents a result below the most negative finite float, not a value close to zero. -inf values can occur when a calculation overflows in the negative direction, when a negative number is divided by zero, or when taking the logarithm of zero.
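To make these definitions concrete, here is a minimal sketch of how both values can arise in NumPy floating-point arithmetic. The variable names are illustrative, and np.errstate is used only to silence the runtime warnings these operations would normally emit.

```python
import numpy as np

zero = np.array([0.0])

# Silence the RuntimeWarnings NumPy would otherwise emit for these operations.
with np.errstate(divide="ignore", invalid="ignore"):
    nan_from_division = zero / zero                   # 0/0 has no defined result
    nan_from_sqrt = np.sqrt(np.array([-1.0]))         # square root of a negative real
    neg_inf_from_log = np.log(zero)                   # logarithm of zero
    neg_inf_from_division = np.array([-1.0]) / zero   # negative number divided by zero

print(nan_from_division, nan_from_sqrt, neg_inf_from_log, neg_inf_from_division)

# NaN never compares equal to itself, so test for it with np.isnan().
print(np.isnan(nan_from_division[0]), np.isneginf(neg_inf_from_log[0]))
```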
Why remove NaN and -inf values?
NaN and -inf values can cause problems when performing calculations or statistical analysis on your data. They can skew your results, produce incorrect values, or cause errors in your code. Therefore, it’s often necessary to remove them before proceeding with your analysis.
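As a small illustration of how these values affect a calculation, the sketch below computes a mean three ways. The sample numbers are made up. pandas skips NaN by default, while plain NumPy propagates NaN and a -inf value is never skipped.

```python
import numpy as np
import pandas as pd

with_nan = pd.Series([1.0, np.nan, 3.0])
with_neg_inf = pd.Series([1.0, -np.inf, 3.0])

# pandas skips NaN by default, so only 1.0 and 3.0 are averaged.
pandas_mean = with_nan.mean()

# Plain NumPy does not skip NaN, so the result is NaN.
numpy_mean = np.mean(with_nan.to_numpy())

# -inf is a regular (if extreme) float, so it is not skipped and dominates the mean.
neg_inf_mean = with_neg_inf.mean()

print(pandas_mean, numpy_mean, neg_inf_mean)
```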
How to remove NaN and -inf values in Python pandas
Python pandas provides several methods for removing NaN and -inf values from your data. The most commonly used methods are:
dropna(): removes rows or columns that contain NaN values (-inf must be converted to NaN first)
replace(): replaces NaN and -inf values with a specified value
interpolate(): fills NaN values with interpolated values
Using dropna()
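The dropna() method removes rows or columns that contain missing values (NaN, None, or NaT). By default, it removes every row with at least one missing value. Set axis=1 to remove columns instead of rows. Note that dropna() does not treat -inf as missing, so infinite values must first be converted to NaN, for example with replace().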
import pandas as pd
# create a dataframe that contains NaN values
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, -7, 8, -9, 10],
    'C': [11, 12, 13, None, 15],
    'D': [16, 17, 18, 19, 20]
})
print(df)
Output:
   A   B     C   D
0  1   6  11.0  16
1  2  -7  12.0  17
2  3   8  13.0  18
3  4  -9   NaN  19
4  5  10  15.0  20
# drop rows that contain NaN values
df = df.dropna()
print(df)
Output:
   A   B     C   D
0  1   6  11.0  16
1  2  -7  12.0  17
2  3   8  13.0  18
4  5  10  15.0  20
In this example, pandas stores the None in column C as NaN, so the dropna() method removes the fourth row (index 3) from the DataFrame.
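Because dropna() only looks for missing values, rows containing -inf survive it. One common way to drop them as well is to convert infinities to NaN first, or to filter with a finiteness mask. The sketch below shows both options on a made-up frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C': [11, 12, 13, np.nan, 15],
    'D': [16, 17, -np.inf, 19, 20],
})

# dropna() alone removes only the NaN row (index 3); the -inf row stays.
only_nan_dropped = df.dropna()

# Option 1: turn +/-inf into NaN, then drop every row with a missing value.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()

# Option 2: keep only rows where every value is finite (not NaN, not +/-inf).
finite_rows = df[np.isfinite(df).all(axis=1)]

print(cleaned)
```

If only some columns matter, dropna() also accepts a subset argument listing the column labels to check.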
Using replace()
The replace() method substitutes a specified value for NaN and -inf values. Pass the values to look for as the first argument (to_replace) and the substitute as the second argument (value).
import pandas as pd
import numpy as np
# create a dataframe that contains NaN and -inf values
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, -7, 8, -9, 10],
    'C': [11, 12, 13, np.nan, 15],
    'D': [16, 17, -np.inf, 19, 20]
})
print(df)
Output:
   A   B     C     D
0  1   6  11.0  16.0
1  2  -7  12.0  17.0
2  3   8  13.0  -inf
3  4  -9   NaN  19.0
4  5  10  15.0  20.0
# replace NaN and -inf values with 0
df = df.replace([np.nan, -np.inf], 0)
print(df)
Output:
   A   B     C     D
0  1   6  11.0  16.0
1  2  -7  12.0  17.0
2  3   8  13.0   0.0
3  4  -9   0.0  19.0
4  5  10  15.0  20.0
In this example, the replace() method replaces all NaN and -inf values with 0.
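Filling with 0 is not always appropriate, since it can distort column statistics. As one alternative, the sketch below first maps infinities to NaN and then fills each column with its own median using fillna(). The choice of the median is an assumption made for illustration, not a general recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C': [11, 12, 13, np.nan, 15],
    'D': [16, 17, -np.inf, 19, 20],
})

# Treat infinities as missing so they do not influence the median.
as_missing = df.replace([np.inf, -np.inf], np.nan)

# fillna() accepts a Series and matches its index against the column names,
# so each column is filled with its own median.
filled = as_missing.fillna(as_missing.median())

print(filled)
```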
Using interpolate()
The interpolate() method fills NaN values with interpolated values based on the values of neighboring rows or columns. Choose the interpolation technique with the method parameter; the default is 'linear'. interpolate() only fills NaN and leaves -inf values untouched.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, -7, 8, -9, 10],
    'C': [11, 12, 13, np.nan, 15],
    'D': [16, 17, 18, 19, 20]
})
# Interpolate a value to replace NaN based on its neighbors
df = df.interpolate(method='linear')
print(df)
Output:
   A   B     C   D
0  1   6  11.0  16
1  2  -7  12.0  17
2  3   8  13.0  18
3  4  -9  14.0  19
4  5  10  15.0  20
In this example, the interpolate() method fills the NaN value in column C with an interpolated value (14) based on the values of neighboring rows.
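Since interpolate() only fills NaN, a -inf entry stays in place. If you want infinities interpolated too, one approach is to convert them to NaN first, as in this sketch with made-up values.

```python
import numpy as np
import pandas as pd

s = pd.Series([16.0, 17.0, -np.inf, 19.0, 20.0])

# interpolate() alone finds no NaN here, so the -inf entry is left as it is.
untouched = s.interpolate(method='linear')

# Convert -inf to NaN first, then fill the gap from its neighbors.
filled = s.replace(-np.inf, np.nan).interpolate(method='linear')

print(filled)
```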
Conclusion
NaN and -inf values can cause problems when working with data, but Python pandas provides several methods for removing or replacing them. By using the dropna(), replace(), and interpolate() methods, you can clean your data and proceed with your analysis without worrying about invalid values.
Remember to always carefully consider the impact of removing or replacing NaN and -inf values on your analysis and to document your data cleaning process.