What is SMOTE?
SMOTE (Synthetic Minority Over-sampling Technique) is a widely used method for balancing imbalanced datasets in machine learning. It generates synthetic examples for the minority class: for each selected minority instance, it finds nearby minority instances in feature space and creates new points by interpolating between the instance and one of its neighbors.
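To make the interpolation step concrete, here is a minimal NumPy sketch of the core idea (not the imbalanced-learn implementation): pick a minority point, choose one of its k nearest minority neighbors, and place a new point at a random position on the segment between them. The function name `smote_interpolate` is illustrative, not part of any library.

```python
import numpy as np

def smote_interpolate(X_minority, n_new, k=5, seed=42):
    """Sketch of SMOTE's core step: create n_new synthetic samples by
    interpolating between minority points and their k nearest minority
    neighbors. Illustrative only, not the imbalanced-learn implementation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        x = X[i]
        # k nearest minority neighbors of x (index 0 is x itself, so skip it)
        dists = np.linalg.norm(X - x, axis=1)
        neighbors = np.argsort(dists)[1:k + 1]
        j = rng.choice(neighbors)
        # new point lies somewhere on the segment between x and its neighbor
        gap = rng.random()
        synthetic.append(x + gap * (X[j] - x))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two existing minority points, all new samples stay inside the region already occupied by the minority class.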
Why use SMOTE?
Imbalanced datasets can lead to biased models that perform poorly on the underrepresented class. SMOTE helps to alleviate this issue by generating synthetic instances of the minority class, thus balancing the class distribution and improving the model’s performance on the minority class.
Example of using SMOTE in Python:
Here’s a simple example of using SMOTE with the imbalanced-learn library in Python:
```python
# Install the imbalanced-learn library (in a notebook):
# !pip install -U imbalanced-learn

import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Create an imbalanced dataset
X, y = make_classification(n_classes=2, class_sep=2, weights=[0.1, 0.9],
                           n_features=20, n_samples=1000, random_state=42)
print("Original dataset class distribution:", Counter(y))

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=42)

# Apply SMOTE to the training data only
sm = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = sm.fit_resample(X_train, y_train)
print("Resampled dataset class distribution:", Counter(y_train_resampled))
```
In this example, we create an imbalanced dataset, split it into training and testing sets, and apply SMOTE only to the training data to balance the class distribution. Resampling before the split would leak synthetic points derived from test samples into training, inflating evaluation scores.