
Optimizers (SGD, Adam) in PyTorch

Introduction

Optimizers help a machine learning model learn by adjusting its parameters (weights) to reduce prediction error.

When training a model to recognize images or sounds.
When you want the model to improve its predictions step by step.
When you need to find the best settings for your model automatically.
When comparing different ways to teach a model to see which works faster.
When fine-tuning a model to get more accurate results.
Syntax
PyTorch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# or
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

SGD stands for Stochastic Gradient Descent, a simple method that updates each parameter by taking a small step against its gradient.
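The SGD update itself is just one line of math: each parameter moves a small step in the direction that lowers the loss, w ← w − lr · grad. A minimal pure-Python sketch of that rule (no PyTorch; the toy loss and variable names are illustrative, not from this page):

```python
# Plain-Python sketch of the SGD update rule: w <- w - lr * grad.
# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).

lr = 0.1   # learning rate
w = 0.0    # initial parameter

for step in range(50):
    grad = 2 * (w - 3)   # gradient of the loss at the current w
    w = w - lr * grad    # the SGD update

print(f"w after training: {w:.4f}")  # w moves toward the minimum at 3.0
```

This is exactly what `optimizer.step()` does for plain SGD, applied to every parameter in `model.parameters()` at once.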

Adam (Adaptive Moment Estimation) is a more advanced optimizer that adapts the learning rate for each parameter automatically, using running averages of recent gradients and their squares.

Examples
Creates an SGD optimizer with a learning rate of 0.1.
PyTorch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
Creates an Adam optimizer with a learning rate of 0.001.
PyTorch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Creates an SGD optimizer with momentum, which helps speed up learning.
PyTorch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
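Momentum keeps a running "velocity" of past gradients, so steps along a consistent direction build up speed instead of restarting from zero each time. A plain-Python sketch of the rule behind `momentum=0.9` (velocity ← μ · velocity + grad, then w ← w − lr · velocity; the toy loss is illustrative):

```python
# SGD with momentum on the toy loss f(w) = (w - 3)^2.
lr, mu = 0.01, 0.9
w, velocity = 0.0, 0.0

for step in range(200):
    grad = 2 * (w - 3)
    velocity = mu * velocity + grad   # accumulate past gradients
    w = w - lr * velocity             # step along the accumulated velocity

print(f"w after training: {w:.4f}")  # converges toward 3.0
```

With momentum, this converges faster than plain SGD at the same learning rate, at the cost of some oscillation around the minimum.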
Sample Model

This code trains a simple model to learn the rule y = 2x + 1 using the SGD optimizer. It prints the loss at each step, then the model's prediction for input 5.

PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model: one linear layer
model = nn.Linear(1, 1)

# Data: y = 2x + 1
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

# Choose optimizer: SGD or Adam
optimizer = optim.SGD(model.parameters(), lr=0.1)
# optimizer = optim.Adam(model.parameters(), lr=0.1)

# Loss function
criterion = nn.MSELoss()

# Training loop
for epoch in range(10):
    optimizer.zero_grad()  # Clear old gradients
    outputs = model(x)     # Predict
    loss = criterion(outputs, y)  # Calculate loss
    loss.backward()       # Calculate gradients
    optimizer.step()      # Update model
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# Final prediction
test_input = torch.tensor([[5.0]])
prediction = model(test_input).item()
print(f"Prediction for input 5: {prediction:.2f}")
Important Notes

SGD is simple and works well for many problems, but its learning rate may need careful tuning.

Adam often works better without much tuning because it adjusts learning rates automatically.

Always clear gradients with optimizer.zero_grad() before each backward pass; otherwise PyTorch accumulates gradients across steps.
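The reason zero_grad() matters is that backward() adds to whatever gradient is already stored rather than overwriting it. This PyTorch-free sketch mimics that accumulation with a plain variable (the names are hypothetical, chosen just to show the failure mode):

```python
# Mimic of PyTorch's gradient buffer: backward() ADDS to .grad.
grad_buffer = 0.0

def fake_backward(current_grad):
    """Accumulate into the buffer, as loss.backward() does with .grad."""
    global grad_buffer
    grad_buffer += current_grad

# Two backward passes WITHOUT clearing the buffer:
fake_backward(2.0)
fake_backward(2.0)
stale = grad_buffer   # 4.0 -- twice the true gradient, so the step is wrong

# Clearing first (what optimizer.zero_grad() does) gives the correct value:
grad_buffer = 0.0
fake_backward(2.0)
fresh = grad_buffer   # 2.0 -- the correct gradient

print(stale, fresh)  # 4.0 2.0
```

Forgetting zero_grad() in the training loop above would make each update use the sum of all previous gradients, which usually makes the loss diverge.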

Summary

Optimizers help models learn by updating their settings to reduce mistakes.

SGD is a basic optimizer; Adam is more advanced and adapts learning rates.

Choosing the right optimizer and learning rate affects how fast and well your model learns.