How to Use SGD Optimizer in PyTorch: Syntax and Example
In PyTorch, use torch.optim.SGD to create an SGD optimizer by passing the model's parameters and a learning rate. After computing gradients with loss.backward(), call optimizer.step() to update the model weights.

Syntax
The SGD optimizer in PyTorch is created using torch.optim.SGD. You need to provide the model parameters to optimize and set the learning rate. Optionally, you can set momentum, weight decay, and other parameters.
- params: model parameters to update
- lr: learning rate (step size)
- momentum: helps accelerate SGD in relevant directions
- weight_decay: L2 regularization strength
```python
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)
```
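As a sketch of what these hyperparameters do: PyTorch's SGD applies weight decay by adding weight_decay * p to the gradient, and (with dampening at its default of 0) initializes the momentum buffer to that adjusted gradient on the first step. A one-parameter check of one update step:

```python
import torch

# One parameter, one step: compare torch.optim.SGD against the update rule
# d_p = grad + weight_decay * p; buf = d_p (first step); p = p - lr * buf
p = torch.tensor([2.0], requires_grad=True)
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9, weight_decay=0.01)

loss = (p ** 2).sum()   # d(loss)/dp = 2 * p = 4.0
loss.backward()
opt.step()

# grad' = 4.0 + 0.01 * 2.0 = 4.02, so p = 2.0 - 0.1 * 4.02 = 1.598
print(f"{p.item():.4f}")  # 1.5980
```

On later steps the buffer becomes momentum * buf + grad', which is what "accelerates SGD in relevant directions": consistent gradients compound, oscillating ones cancel.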
Example
This example shows how to create a simple linear model, define the SGD optimizer, compute loss, backpropagate, and update the model weights using optimizer.step().
```python
import torch
import torch.nn as nn

# Simple linear model
model = nn.Linear(1, 1)

# SGD optimizer with learning rate 0.1
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)

# Mean squared error loss
criterion = nn.MSELoss()

# Dummy input and target
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

# Training step
optimizer.zero_grad()          # Clear old gradients
outputs = model(x)             # Forward pass
loss = criterion(outputs, y)   # Compute loss
loss.backward()                # Backpropagation
optimizer.step()               # Update weights

print(f"Loss after one step: {loss.item():.4f}")
```
Output
Loss after one step: 18.1234

The exact value varies from run to run because nn.Linear initializes its weights and bias randomly.
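Repeating the single training step above in a loop drives the loss toward zero. A short sketch (seeded so the run is repeatable; data and layer sizes taken from the example):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix the random weight init so the run is repeatable

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])  # target relation: y = 2x

for epoch in range(200):
    optimizer.zero_grad()          # clear old gradients
    loss = criterion(model(x), y)  # forward pass + loss
    loss.backward()                # backpropagation
    optimizer.step()               # update weights

print(f"Final loss: {loss.item():.6f}")  # near zero: the model learned y = 2x
```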
Common Pitfalls
Common mistakes when using the SGD optimizer include:
- Not calling optimizer.zero_grad() before loss.backward(), which accumulates gradients from previous steps instead of replacing them.
- Forgetting to call optimizer.step() after backpropagation, so the weights never update.
- Using a learning rate that is too high (training diverges) or too low (training is very slow).
Always clear the gradients, call loss.backward(), then update the weights, in that order.
```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.tensor([[1.0]])
y = torch.tensor([[2.0]])

# Wrong: missing optimizer.zero_grad(), so gradients accumulate across steps
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()

# Right: clear old gradients first
optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()
```
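To see why the missing zero_grad() matters, this small sketch calls backward() twice without clearing and shows the stored gradient doubling (the gradient of the output of nn.Linear with respect to its weight is just the input x, so the values are deterministic):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
x = torch.tensor([[1.0]])

model(x).sum().backward()
g1 = model.weight.grad.clone()   # gradient w.r.t. the weight is x = 1.0

model(x).sum().backward()        # no zero_grad() in between
g2 = model.weight.grad.clone()

print((g2 / g1).item())  # 2.0 -- the second backward() ADDED to the stored gradient
```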
Quick Reference
Summary tips for using the SGD optimizer in PyTorch:
- Initialize with torch.optim.SGD(model.parameters(), lr=learning_rate).
- Call optimizer.zero_grad() before backpropagation.
- Call loss.backward() to compute gradients.
- Call optimizer.step() to update weights.
- Adjust lr and momentum for better training.
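For the last tip, one way to adjust lr after the optimizer has been created is through optimizer.param_groups, the list of parameter groups in which PyTorch stores each group's hyperparameters. A minimal sketch (in practice, torch.optim.lr_scheduler automates this):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Hyperparameters live in optimizer.param_groups and can be changed mid-training
print(optimizer.param_groups[0]['lr'])   # 0.1

for group in optimizer.param_groups:
    group['lr'] *= 0.5                   # halve the learning rate

print(optimizer.param_groups[0]['lr'])   # 0.05
```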
Key Takeaways
- Create the SGD optimizer with the model parameters and a learning rate using torch.optim.SGD.
- Always call optimizer.zero_grad() before loss.backward() to reset gradients.
- Call optimizer.step() after backpropagation to update the model weights.
- Tune the learning rate and momentum for effective training.
- Keep every step of the training loop in order; skipping one silently breaks training rather than raising an error.