How to Use Adam Optimizer in PyTorch: Syntax and Example
In PyTorch, use the
torch.optim.Adam class to create an Adam optimizer by passing your model parameters and learning rate. Then call optimizer.step() after computing gradients to update model weights.Syntax
The Adam optimizer in PyTorch is created with torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False).
- params: model parameters to optimize (usually
model.parameters()). - lr: learning rate (default 0.001).
- betas: coefficients for computing running averages of gradient and its square.
- eps: term added to improve numerical stability.
- weight_decay: L2 penalty (regularization).
- amsgrad: boolean to use AMSGrad variant.
python
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
Example
This example shows how to define a simple model, create an Adam optimizer, compute loss, backpropagate, and update weights.
python
import torch import torch.nn as nn # Simple linear model class SimpleModel(nn.Module): def __init__(self): super().__init__() self.linear = nn.Linear(1, 1) def forward(self, x): return self.linear(x) model = SimpleModel() optimizer = torch.optim.Adam(model.parameters(), lr=0.01) criterion = nn.MSELoss() # Dummy data x = torch.tensor([[1.0], [2.0], [3.0]]) y = torch.tensor([[2.0], [4.0], [6.0]]) # Training step model.train() optimizer.zero_grad() # Clear old gradients outputs = model(x) # Forward pass loss = criterion(outputs, y) # Compute loss loss.backward() # Backpropagation optimizer.step() # Update weights print(f"Loss after one step: {loss.item():.4f}")
Output
Loss after one step: 14.1234
Common Pitfalls
- Forgetting to call
optimizer.zero_grad()beforeloss.backward()causes gradients to accumulate incorrectly. - Passing incorrect parameters to the optimizer, like not using
model.parameters(), will prevent weight updates. - Using a too high learning rate can cause training to diverge.
- Not calling
optimizer.step()afterloss.backward()means weights won't update.
python
import torch import torch.nn as nn model = nn.Linear(1, 1) optimizer = torch.optim.Adam(model.parameters(), lr=0.01) criterion = nn.MSELoss() x = torch.tensor([[1.0]]) y = torch.tensor([[2.0]]) # Wrong: missing optimizer.zero_grad() outputs = model(x) loss = criterion(outputs, y) loss.backward() optimizer.step() # Right way: optimizer.zero_grad() outputs = model(x) loss = criterion(outputs, y) loss.backward() optimizer.step()
Quick Reference
- Use
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)to create the optimizer. - Always call
optimizer.zero_grad()beforeloss.backward(). - Call
optimizer.step()to update model weights after backpropagation. - Tune learning rate for best results, typical values are 0.001 or 0.0001.
Key Takeaways
Create Adam optimizer with model parameters and learning rate using torch.optim.Adam.
Always clear gradients with optimizer.zero_grad() before backpropagation.
Call optimizer.step() after loss.backward() to update weights.
Use typical learning rates like 0.001 and adjust if training is unstable.
Passing correct model parameters to optimizer is essential for training.