Normalization in Data Preprocessing

What is Normalization?

Normalization is a data preprocessing technique that transforms the features of a dataset onto a common scale, which often improves the performance and training stability of machine learning algorithms. The main goal is to remove the bias introduced when features have very different ranges: without normalization, features with large values can dominate distance-based and gradient-based algorithms. Common normalization methods include min-max scaling, z-score standardization, and log transformation.
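
For reference, min-max scaling rescales each feature (column) to the [0, 1] range using the formula x' = (x - min) / (max - min). Below is a minimal NumPy sketch of that formula on a small illustrative array (the same values used in the example that follows):

import numpy as np

data = np.array([[1.0, 200.0, 3000.0],
                 [2.0, 300.0, 4000.0],
                 [3.0, 400.0, 5000.0]])

# Column-wise minimum and maximum
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Min-max formula applied per column: (x - min) / (max - min)
# (note: a constant column would give a zero denominator here)
scaled = (data - col_min) / (col_max - col_min)
print(scaled)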

Example:

In this example, we’ll demonstrate how to normalize a dataset using the min-max scaling method from the scikit-learn library.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Three features on very different scales
data = np.array([[1, 200, 3000],
                 [2, 300, 4000],
                 [3, 400, 5000]])

# Fit the scaler to the data and rescale each column to the [0, 1] range
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)

print("Original data:")
print(data)
print("Normalized data:")
print(normalized_data)

Output:

Original data:
[[   1  200 3000]
 [   2  300 4000]
 [   3  400 5000]]
Normalized data:
[[0.  0.  0. ]
 [0.5 0.5 0.5]
 [1.  1.  1. ]]
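
For comparison, here is a minimal sketch of the other two methods mentioned above: z-score standardization with scikit-learn's StandardScaler, which rescales each column to zero mean and unit variance, and a simple log transformation with NumPy's log1p.

import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1, 200, 3000],
                 [2, 300, 4000],
                 [3, 400, 5000]])

# Z-score standardization: (x - mean) / std, per column
standardized = StandardScaler().fit_transform(data)
print(standardized)  # each column becomes roughly [-1.22, 0, 1.22]

# Log transformation: compresses large values; log1p(x) = log(1 + x)
log_transformed = np.log1p(data)
print(log_transformed)

In a real pipeline, a scaler should be fit on the training data only and then reused to transform validation and test data (scaler.fit(X_train) followed by scaler.transform(X_test)); otherwise information from the test set leaks into the scaling parameters.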

Some resources to learn more about normalization: