Stochastic Gradient Descent

What is Stochastic Gradient Descent?

Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning and deep learning to minimize a loss function by iteratively updating the model’s parameters. Unlike Batch Gradient Descent, which computes the gradient over the entire dataset before every update, SGD estimates the gradient from a single training example or a small subset (mini-batch) at each iteration and updates the parameters immediately. Each update is therefore much cheaper to compute, at the cost of a noisier gradient estimate, which makes the algorithm well suited to large-scale datasets.
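The contrast between the two update rules can be written out as follows. The notation is an assumption introduced here for illustration, not taken from the text above: \theta denotes the parameters, \eta the learning rate, L_i the loss on example i, N the dataset size, and B_t the mini-batch drawn at step t.

```latex
% Batch Gradient Descent: one update uses all N examples
\theta_{t+1} = \theta_t - \eta \, \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta L_i(\theta_t)

% Stochastic (mini-batch) Gradient Descent: one update uses only the examples in B_t
\theta_{t+1} = \theta_t - \eta \, \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_\theta L_i(\theta_t)
```

With |B_t| = 1 the second rule reduces to single-example SGD, and with B_t equal to the full dataset it coincides with the batch rule.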

How does Stochastic Gradient Descent work?

Stochastic Gradient Descent works by following these steps:

  1. Randomly shuffle the training dataset.
  2. For each epoch (one full pass through the dataset), step through the data one training example or one mini-batch at a time, repeating steps 3 and 4 for each selection.
  3. Compute the gradient of the loss function with respect to the model parameters using the selected examples.
  4. Update the model parameters by subtracting the computed gradient multiplied by a learning rate.
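The steps above can be sketched in plain NumPy for a least-squares linear model. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `sgd_linear_regression`, its default hyperparameters, and the synthetic data are all hypothetical, and the sketch reshuffles at the start of every epoch, which is a common variation on shuffling once.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, epochs=50, batch_size=16, seed=0):
    """Mini-batch SGD for linear regression with mean squared error (sketch)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # weights
    b = 0.0                   # intercept
    for _ in range(epochs):
        # Step 1: shuffle the sample order (here, once per epoch)
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            # Step 2: select the next mini-batch
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]
            # Step 3: gradient of the mean squared error on this mini-batch
            error = X_b @ w + b - y_b
            grad_w = 2.0 * X_b.T @ error / len(idx)
            grad_b = 2.0 * error.mean()
            # Step 4: move the parameters against the gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic data (an assumption for illustration): y = X @ true_w + 3 + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=500)

w, b = sgd_linear_regression(X, y)
print("weights:", w, "intercept:", b)
```

Setting `batch_size=1` gives single-example SGD, while `batch_size=len(X)` recovers Batch Gradient Descent.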

Example of Stochastic Gradient Descent in Python

Here’s a simple example of using Stochastic Gradient Descent with scikit-learn:

from sklearn.linear_model import SGDRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

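# Load the Boston housing dataset
# (load_boston was removed in scikit-learn 1.2, so this requires an older version)
boston = load_boston()
X = boston.data
y = boston.target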

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1, random_state=42)

# Train the model
sgd_reg.fit(X_train, y_train)

# Test the model
score = sgd_reg.score(X_test, y_test)
print("R-squared:", score)
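For data that arrives in chunks, scikit-learn's `SGDRegressor` also exposes `partial_fit`, which performs one pass over whatever samples it is given and keeps the learned state between calls. The sketch below is illustrative only: the synthetic dataset from `make_regression`, the batch size, the epoch count, and `eta0=0.01` are assumptions chosen for the example, not values from the example above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption; no external dataset needed)
X, y = make_regression(n_samples=1000, n_features=5, noise=5.0, random_state=42)
X = StandardScaler().fit_transform(X)

model = SGDRegressor(eta0=0.01, random_state=42)
rng = np.random.default_rng(42)
batch_size = 32

for epoch in range(20):
    # Reshuffle the sample order at the start of each epoch
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        # One pass over this chunk; parameters carry over between calls
        model.partial_fit(X[idx], y[idx])

r2 = model.score(X, y)
print("R-squared on the training data:", r2)
```

Note that `SGDRegressor` still updates its parameters one sample at a time within each chunk, so `partial_fit` controls how the data is fed in rather than averaging gradients over the chunk.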
