Complete the code to set the learning rate for the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=[1])

The learning rate is set to 0.01, a common small value to start training with.
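A worked version of this answer can be sketched as follows; the `nn.Linear` model here is a hypothetical placeholder, since the exercise does not define `model`:

```python
import torch
import torch.nn as nn

# Placeholder model; any nn.Module with parameters works the same way.
model = nn.Linear(4, 2)

# Blank [1] filled in with 0.01, as the explanation states.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

print(optimizer.param_groups[0]['lr'])  # 0.01
```

The learning rate can be inspected (or later modified) through `optimizer.param_groups`, which is also how schedulers update it.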
Complete the code to create two parameter groups with different learning rates.
optimizer = torch.optim.Adam([
    {'params': model.base.parameters(), 'lr': [1]},
    {'params': model.head.parameters(), 'lr': 0.01}
])

The base model uses a smaller learning rate of 0.001, while the head uses 0.01 for faster learning.
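A runnable sketch of this answer, assuming a hypothetical model with `base` and `head` submodules (the exercise does not define one):

```python
import torch
import torch.nn as nn

# Hypothetical model exposing 'base' and 'head' submodules.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.head = nn.Linear(4, 2)

model = Net()

# Blank [1] filled in with 0.001: the base trains slowly, the head faster.
optimizer = torch.optim.Adam([
    {'params': model.base.parameters(), 'lr': 0.001},
    {'params': model.head.parameters(), 'lr': 0.01},
])

print([g['lr'] for g in optimizer.param_groups])  # [0.001, 0.01]
```

Each dict becomes one entry in `optimizer.param_groups`; options not given in a group fall back to the optimizer-level defaults.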
Fix the error in the learning rate scheduler step call.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
for epoch in range(20):
    train()
    validate()
    [1]
The scheduler.step() method is called without arguments to update the learning rate after each epoch.
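The corrected loop can be exercised end to end; the model is a placeholder, and `train()`/`validate()` are stubbed out as comments since the exercise does not define them:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# StepLR multiplies the learning rate by gamma every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):
    # train() and validate() would run here
    optimizer.step()   # optimizer.step() should precede scheduler.step()
    scheduler.step()   # blank [1]: advance the schedule once per epoch

# After 20 epochs the lr has decayed twice: 0.1 -> 0.01 -> 0.001
print(optimizer.param_groups[0]['lr'])
```

Note the ordering: since PyTorch 1.1, `scheduler.step()` should be called after `optimizer.step()`, once per epoch for epoch-based schedulers like `StepLR`.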
Fill both blanks to create a dictionary of learning rates for different layers.
lr_dict = {
    'base': [1],
    'head': [2]
}

The base layer uses 0.001 and the head uses 0.01 as learning rates for differential training.
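One way such a dictionary is typically used is to drive param-group construction by submodule name; the `Net` model below is a hypothetical stand-in:

```python
import torch
import torch.nn as nn

# Hypothetical model whose submodule names match the dict keys.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.head = nn.Linear(4, 2)

model = Net()

# Blanks filled in: [1] = 0.001, [2] = 0.01.
lr_dict = {'base': 0.001, 'head': 0.01}

# Build one param group per named submodule, each with its own lr.
optimizer = torch.optim.Adam([
    {'params': getattr(model, name).parameters(), 'lr': lr}
    for name, lr in lr_dict.items()
])

print([g['lr'] for g in optimizer.param_groups])  # [0.001, 0.01]
```

Keeping the rates in a dict makes the differential-learning-rate policy easy to tweak in one place.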
Fill all three blanks to define optimizer with differential learning rates and weight decay.
optimizer = torch.optim.Adam([
    {'params': model.backbone.parameters(), 'lr': [1], 'weight_decay': [2]},
    {'params': model.classifier.parameters(), 'lr': [3], 'weight_decay': 0.01}
])

The backbone uses a learning rate of 0.0001 with weight decay 0.001; the classifier uses a learning rate of 0.0003 with weight decay 0.01.
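A complete sketch of this answer, again with a hypothetical model (the exercise does not define `backbone` or `classifier`):

```python
import torch
import torch.nn as nn

# Hypothetical model with 'backbone' and 'classifier' submodules.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 8)
        self.classifier = nn.Linear(8, 3)

model = Net()

# Blanks filled in per the explanation:
# [1] = 0.0001, [2] = 0.001, [3] = 0.0003.
optimizer = torch.optim.Adam([
    {'params': model.backbone.parameters(), 'lr': 0.0001, 'weight_decay': 0.001},
    {'params': model.classifier.parameters(), 'lr': 0.0003, 'weight_decay': 0.01},
])
```

Pairing a low learning rate with the pretrained backbone and a higher one with the fresh classifier head is the standard fine-tuning recipe; weight decay can likewise be set per group.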