What is Stochastic Gradient Descent?
Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning and deep learning to minimize a loss function by iteratively updating the model's parameters. Unlike Batch Gradient Descent, which computes the gradient over the entire dataset, SGD computes the gradient and updates the parameters using only a single training example or a small subset (mini-batch) at each iteration. This makes each update much cheaper and the algorithm better suited to large-scale datasets.
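To make the difference concrete, here is a minimal NumPy sketch (using hypothetical toy linear-regression data, not any dataset from this article) that computes the mean-squared-error gradient once over the full batch and once over a random mini-batch of 32 examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data: y = X @ w_true + noise (illustrative only)
X = rng.normal(size=(1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(3)  # current parameters

# Full-batch gradient of the mean squared error: touches all 1000 examples
grad_full = 2 * X.T @ (X @ w - y) / len(y)

# Mini-batch gradient: the same formula on a random subset of 32 examples
idx = rng.choice(len(y), size=32, replace=False)
Xb, yb = X[idx], y[idx]
grad_mini = 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# The mini-batch gradient is a noisy but far cheaper estimate of the full one
print(grad_full)
print(grad_mini)
```

The mini-batch estimate fluctuates around the full-batch gradient, which is exactly the "stochastic" part of SGD: updates are noisy, but each one costs a fraction of a full pass over the data.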
How does Stochastic Gradient Descent work?
Stochastic Gradient Descent works by following these steps:
- Randomly shuffle the training dataset.
- For each epoch (one full pass through the dataset), process the examples one at a time or in mini-batches.
- Compute the gradient of the loss function with respect to the model parameters using the selected examples.
- Update the model parameters by subtracting the computed gradient multiplied by a learning rate.
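The steps above can be sketched as a short from-scratch loop. This is a minimal illustration, assuming a toy linear-regression problem with a mean-squared-error loss (the variable names and hyperparameters here are illustrative choices, not prescribed values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data for linear regression: y = X @ w_true + noise (illustrative only)
X = rng.normal(size=(500, 2))
w_true = np.array([3.0, -2.0])
y = X @ w_true + 0.1 * rng.normal(size=500)

w = np.zeros(2)   # model parameters
lr = 0.05         # learning rate
batch_size = 16

for epoch in range(20):
    # Step 1: shuffle the training set at the start of each epoch
    perm = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        # Step 2: take the next mini-batch of examples
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Step 3: gradient of the mean squared error on this mini-batch
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(yb)
        # Step 4: update the parameters against the gradient direction
        w -= lr * grad

print(w)  # converges toward w_true
```

Each inner iteration performs one cheap update; over many epochs the noisy updates average out and the parameters approach the values that minimize the loss.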
Example of Stochastic Gradient Descent in Python
Here’s a simple example of using Stochastic Gradient Descent for regression with scikit-learn’s SGDRegressor:
from sklearn.linear_model import SGDRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the California housing dataset
# (the Boston housing dataset was removed from scikit-learn in version 1.2)
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardize the features (SGD is sensitive to feature scale)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1,
                       random_state=42)

# Train the model
sgd_reg.fit(X_train, y_train)

# Evaluate the model
score = sgd_reg.score(X_test, y_test)
print("R-squared:", score)