PyTorch · How-To · Beginner · 3 min read

How to Use Adam Optimizer in PyTorch: Syntax and Example

In PyTorch, use the torch.optim.Adam class to create an Adam optimizer by passing your model parameters and learning rate. Then call optimizer.step() after computing gradients to update model weights.
📐

Syntax

The Adam optimizer in PyTorch is created with torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False).

  • params: model parameters to optimize (usually model.parameters()).
  • lr: learning rate (default 0.001).
  • betas: coefficients used for computing running averages of the gradient and its square.
  • eps: term added to the denominator to improve numerical stability.
  • weight_decay: L2 penalty (regularization).
  • amsgrad: whether to use the AMSGrad variant of Adam.
python
# model is any nn.Module; the values shown here are the defaults
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
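
The defaults are a good starting point. If you want L2 regularization or the AMSGrad variant, override the corresponding arguments; the sketch below uses an illustrative weight_decay value, not a recommendation.

python
# Adam with a small L2 penalty and the AMSGrad variant enabled
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=1e-4,  # illustrative value; tune for your problem
    amsgrad=True,
)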
💻

Example

This example shows how to define a simple model, create an Adam optimizer, compute loss, backpropagate, and update weights.

python
import torch
import torch.nn as nn

# Simple linear model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Dummy data
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

# Training step
model.train()
optimizer.zero_grad()  # Clear old gradients
outputs = model(x)     # Forward pass
loss = criterion(outputs, y)  # Compute loss
loss.backward()       # Backpropagation
optimizer.step()      # Update weights

print(f"Loss after one step: {loss.item():.4f}")
Output
Loss after one step: 14.1234

The exact loss value will vary between runs because the linear layer's weights are randomly initialized.
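
A single step rarely gets you far; in practice you repeat the same sequence inside a loop over epochs (and usually over mini-batches). A minimal sketch continuing the example above, with an illustrative epoch count:

python
# Repeat the training step for several epochs (epoch count is illustrative)
for epoch in range(100):
    optimizer.zero_grad()          # clear gradients from the previous step
    outputs = model(x)             # forward pass
    loss = criterion(outputs, y)   # compute loss
    loss.backward()                # backpropagation
    optimizer.step()               # Adam updates the weights

print(f"Loss after 100 steps: {loss.item():.4f}")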
⚠️

Common Pitfalls

  • Forgetting to call optimizer.zero_grad() before loss.backward() causes gradients to accumulate incorrectly.
  • Passing incorrect parameters to the optimizer, like not using model.parameters(), will prevent weight updates.
  • Using a learning rate that is too high can cause training to diverge.
  • Not calling optimizer.step() after loss.backward() means weights won't update.
python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.tensor([[1.0]])
y = torch.tensor([[2.0]])

# Wrong: no optimizer.zero_grad(), so gradients from any earlier backward pass would accumulate
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()

# Right: clear old gradients before the backward pass
optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()
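
To see why the missing zero_grad() matters, inspect the gradients directly: without clearing them, a second backward() adds its result to whatever is already stored in .grad. A small sketch continuing the snippet above:

python
# Gradients accumulate across backward() calls unless they are cleared
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
first = model.weight.grad.clone()   # gradient after one backward pass

loss = criterion(model(x), y)
loss.backward()                     # gradients were not cleared, so they add up
print(torch.allclose(model.weight.grad, 2 * first))  # True: gradient has doubled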
📊

Quick Reference

  • Use optimizer = torch.optim.Adam(model.parameters(), lr=0.001) to create the optimizer.
  • Always call optimizer.zero_grad() before loss.backward().
  • Call optimizer.step() to update model weights after backpropagation.
  • Tune the learning rate for best results; typical values are 0.001 or 0.0001 (see the snippet below).
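
If training is unstable, you can lower the learning rate on an existing optimizer by editing its parameter groups; the value below is only an example.

python
# Reduce the learning rate on an existing optimizer (example value)
for group in optimizer.param_groups:
    group["lr"] = 1e-4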

Key Takeaways

  • Create the Adam optimizer from your model parameters and a learning rate with torch.optim.Adam.
  • Always clear gradients with optimizer.zero_grad() before backpropagation.
  • Call optimizer.step() after loss.backward() to update the weights.
  • Use typical learning rates like 0.001 and lower them if training is unstable.
  • Passing the correct model parameters to the optimizer is essential for training.