Regularization (L1, L2)

What is Regularization?

Regularization is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function. Overfitting occurs when a model fits the training data too closely, capturing noise along with the underlying signal, and therefore generalizes poorly. By penalizing large coefficients, regularization reduces the effective complexity of the model and improves its performance on unseen data.
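
To make this concrete, here is a minimal sketch of a penalized objective (the function name penalized_loss is illustrative, not a library API):

import numpy as np

# The training objective becomes the data loss plus a weighted penalty
# on the coefficients.
def penalized_loss(X, y, w, alpha, penalty):
    data_loss = np.mean((X @ w - y) ** 2)  # ordinary mean squared error
    return data_loss + alpha * penalty(w)  # alpha sets the penalty strength

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([3.0, -2.0])
y = X @ w  # a perfect fit, so the data loss is zero

l1 = penalized_loss(X, y, w, 0.1, lambda w: np.sum(np.abs(w)))  # 0.1 * 5 = 0.5
l2 = penalized_loss(X, y, w, 0.1, lambda w: np.sum(w ** 2))     # 0.1 * 13 = 1.3
print(l1, l2)  # larger coefficients mean a larger penalty, hence a larger loss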

L1 Regularization (Lasso)

L1 regularization adds the sum of the absolute values of the model coefficients, scaled by a strength parameter, as a penalty term to the loss function. This penalty tends to drive some coefficients to exactly zero, effectively performing feature selection by removing irrelevant features from the model.
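
One way to see why the L1 penalty produces exact zeros: the coordinate-descent updates commonly used to fit the Lasso apply a soft-thresholding operator, which snaps small values to exactly zero. A minimal sketch (soft_threshold is a hypothetical helper, not a scikit-learn function):

import numpy as np

# Values whose magnitude falls below the threshold become exactly zero,
# which is why L1 regularization yields sparse coefficient vectors.
def soft_threshold(z, threshold):
    return np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

print(soft_threshold(np.array([3.0, 0.05, -0.8, -0.02]), 0.1))
# [ 2.9  0.  -0.7 -0. ]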

L2 Regularization (Ridge)

L2 regularization adds the sum of the squared model coefficients, scaled by a strength parameter, as a penalty term to the loss function. This shrinks the coefficients toward zero but does not force them to be exactly zero, so all features remain in the model.
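
Unlike the Lasso, ridge regression has a closed-form solution, which makes the shrinkage easy to see: the penalty adds alpha to the diagonal of XᵀX, pulling every coefficient toward zero without zeroing any of them. A minimal sketch, assuming no intercept and roughly centered features:

import numpy as np

# Synthetic data: only the first 2 of 5 features actually affect y
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.standard_normal(100)

# Closed form: w = (X^T X + alpha * I)^{-1} X^T y
alpha = 0.1
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
print(w)  # close to [3, 2, 0, 0, 0], with every entry slightly shrunk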

Example of Lasso and Ridge Regularization

Here’s an example using scikit-learn to apply Lasso (L1) and Ridge (L2) regularization to the same linear regression problem; in both estimators, the alpha parameter sets the penalty strength:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Create synthetic data: 5 features, but only the first 2 actually affect y
X = np.random.randn(100, 5)
y = 3 * X[:, 0] + 2 * X[:, 1] + np.random.randn(100)

# Apply Lasso (L1) regularization
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# Apply Ridge (L2) regularization
ridge = Ridge(alpha=0.1)
ridge.fit(X, y)

# Compare coefficients of Lasso and Ridge regularization
print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)

This code would output something like:

Lasso coefficients: [ 2.95659228  1.91485552 -0.         -0.          0.        ]
Ridge coefficients: [ 2.96016895  1.92242926 -0.06866116 -0.0242192   0.04522953]

Notice that Lasso (L1) regularization makes some coefficients exactly zero, while Ridge (L2) regularization keeps all coefficients in the model, but with smaller values.
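
As a follow-up sketch, increasing alpha makes the Lasso zero out progressively more coefficients (the exact counts depend on the random data):

import numpy as np
from sklearn.linear_model import Lasso

# Regenerate comparable synthetic data so the sketch is self-contained
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.standard_normal(100)

# Count how many coefficients the Lasso sets exactly to zero as alpha grows
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of {model.coef_.size} coefficients are exactly zero")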
