PyTorch · How-To · Beginner · 3 min read

How to Use SGD Optimizer in PyTorch: Syntax and Example

In PyTorch, use torch.optim.SGD to create an SGD optimizer by passing the model's parameters and a learning rate. After computing gradients with loss.backward(), call optimizer.step() to update the model weights.

📐 Syntax

The SGD optimizer in PyTorch is created using torch.optim.SGD. You need to provide the model parameters to optimize and set the learning rate. Optionally, you can set momentum, weight decay, and other parameters.

  • params: model parameters to update
  • lr: learning rate (step size)
  • momentum: helps accelerate SGD in relevant directions
  • weight_decay: L2 regularization strength
python
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)
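
For intuition, here is a rough, hand-rolled sketch of what optimizer.step() does for each parameter with these settings, following the update rule documented for torch.optim.SGD (simplified: dampening and Nesterov momentum are ignored, and the tensor names are made up for illustration):

python
import torch

lr, mu, wd = 0.01, 0.9, 0.0001                # same hyperparameters as above

param = torch.randn(3, requires_grad=True)    # a stand-in model parameter
param.sum().backward()                        # produce a dummy gradient
buf = torch.zeros_like(param)                 # momentum buffer (kept per parameter)

with torch.no_grad():
    grad = param.grad + wd * param            # weight_decay adds wd * param to the gradient
    buf = mu * buf + grad                     # momentum: running accumulation of gradients
    param -= lr * buf                         # step: move against the accumulated direction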

💻 Example

This example shows how to create a simple linear model, define the SGD optimizer, compute loss, backpropagate, and update the model weights using optimizer.step().

python
import torch
import torch.nn as nn

# Simple linear model
model = nn.Linear(1, 1)

# SGD optimizer with learning rate 0.1
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)

# Mean squared error loss
criterion = nn.MSELoss()

# Dummy input and target
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

# Training step
optimizer.zero_grad()  # Clear old gradients
outputs = model(x)     # Forward pass
loss = criterion(outputs, y)  # Compute loss
loss.backward()       # Backpropagation
optimizer.step()      # Update weights

print(f"Loss after one step: {loss.item():.4f}")
Output
Loss after one step: 18.1234

The exact loss value varies from run to run because the weights of nn.Linear are randomly initialized.

⚠️ Common Pitfalls

Common mistakes when using the SGD optimizer include:

  • Not calling optimizer.zero_grad() before loss.backward(), which silently accumulates gradients from earlier backward passes (demonstrated below).
  • Forgetting to call optimizer.step() after backpropagation, so the weights never update.
  • Using a learning rate that is too high or too low, causing training to diverge or progress very slowly (see the comparison sketch at the end of this section).

Always clear gradients, run the backward pass, then update the weights, in that order.
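
To see why the first mistake matters, here is a small sketch (a toy model, not part of the example above) showing how gradients pile up across backward passes when zero_grad() is never called:

python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
x = torch.tensor([[1.0]])
y = torch.tensor([[2.0]])

criterion(model(x), y).backward()
print("grad after 1st backward:", model.weight.grad.item())

criterion(model(x), y).backward()   # no zero_grad(): the new gradient is added on top
print("grad after 2nd backward:", model.weight.grad.item())  # exactly twice the first value

The block below contrasts the wrong and the right ordering of the training-step calls: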

python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x = torch.tensor([[1.0]])
y = torch.tensor([[2.0]])

# Wrong: missing optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()

# Right way:
optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()
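
To illustrate the learning-rate pitfall, the following sketch trains the same toy data with a reasonable and a deliberately oversized learning rate (both values are arbitrary choices for illustration); the oversized one makes the loss explode instead of shrink:

python
import torch
import torch.nn as nn

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

for lr in (0.1, 10.0):
    torch.manual_seed(0)                      # identical initial weights for a fair comparison
    model = nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(5):                        # a handful of training steps
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"lr={lr}: loss after 5 steps = {loss.item():.3e}")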

📊 Quick Reference

Summary tips for using the SGD optimizer in PyTorch; a complete training-loop sketch follows the list:

  • Initialize with torch.optim.SGD(model.parameters(), lr=learning_rate).
  • Call optimizer.zero_grad() before backpropagation.
  • Call loss.backward() to compute gradients.
  • Call optimizer.step() to update weights.
  • Adjust lr and momentum for better training.
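
Putting the steps together, here is a minimal end-to-end training loop on the same toy linear-regression data used in the example above (the epoch count is an arbitrary choice for illustration):

python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

for epoch in range(100):              # arbitrary number of epochs
    optimizer.zero_grad()             # 1. clear old gradients
    loss = criterion(model(x), y)     # 2. forward pass + loss
    loss.backward()                   # 3. backpropagation
    optimizer.step()                  # 4. weight update

print(f"Final loss: {loss.item():.6f}")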

Key Takeaways

  • Create SGD optimizer with model parameters and learning rate using torch.optim.SGD.
  • Always call optimizer.zero_grad() before loss.backward() to reset gradients.
  • Call optimizer.step() after backpropagation to update model weights.
  • Tune learning rate and momentum for effective training.
  • Avoid skipping any step in the training loop to prevent errors.