PyTorch · ~20 mins

Learning rate differential in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Why use different learning rates for different layers?

In training deep neural networks, why might we assign different learning rates to different layers?

A. Because different learning rates help the optimizer skip some layers during training.
B. Because early layers often learn general features and require smaller learning rates, while later layers learn task-specific features and can use larger learning rates.
C. Because using the same learning rate for all layers always causes the model to diverge.
D. Because some layers have more parameters and need slower updates to avoid overfitting.
💡 Hint

Think about how early layers and later layers in a neural network behave differently during training.
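As a quick sketch of the idea behind this hint (a made-up two-layer model, not any question's code): optimizer parameter groups let each part of a model receive its own learning rate, so early layers can be nudged gently while the head adapts faster.

```python
import torch
import torch.nn as nn

# Hypothetical model: an "early" feature layer and a task-specific "head".
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Early layer: small learning rate (general features, adjust gently).
# Head: larger learning rate (task-specific features, adapt quickly).
optimizer = torch.optim.SGD([
    {'params': model[0].parameters(), 'lr': 1e-4},
    {'params': model[2].parameters(), 'lr': 1e-2},
])
```

The layer names and rate values here are illustrative; the ratio between the rates is what carries the idea.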

Predict Output · intermediate
Output of learning rate differential setup in PyTorch

What will be the learning rates of the parameters in model.layer1 and model.layer2 after this code runs?

PyTorch
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Linear(5, 2)

model = SimpleModel()
optimizer = torch.optim.SGD([
    {'params': model.layer1.parameters(), 'lr': 0.001},
    {'params': model.layer2.parameters(), 'lr': 0.01}
], momentum=0.9)

lrs = [group['lr'] for group in optimizer.param_groups]
print(lrs)
A. [0.01, 0.001]
B. [0.001]
C. [0.01]
D. [0.001, 0.01]
💡 Hint

Look at how the optimizer parameter groups are defined with different learning rates.

Hyperparameter · advanced
Choosing learning rates for differential training

You want to fine-tune a pretrained model by freezing early layers and training only the last few layers. Which learning rate setup is best?

A. Set zero learning rate for frozen layers and a small learning rate for trainable layers.
B. Set a high learning rate for all layers to speed up training.
C. Set a small learning rate for frozen layers and a high learning rate for trainable layers.
D. Set the same moderate learning rate for all layers regardless of freezing.
💡 Hint

Frozen layers should not update during training.
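A minimal sketch of the freezing pattern this hint points at (model and values are invented for illustration): disable gradients on the early layer, then hand the optimizer only the parameters that still require them.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Freeze the early layer: it will receive no gradients and never update.
for p in model[0].parameters():
    p.requires_grad_(False)

# Pass only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3)
```

Note that a learning rate of zero would also prevent updates, but freezing via `requires_grad_(False)` additionally skips gradient computation for those layers.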

Metrics · advanced
Effect of learning rate differential on training loss

During training with differential learning rates, you notice the loss decreases quickly at first but then plateaus. What is a likely cause?

A. The learning rate for later layers is too low, slowing learning.
B. The learning rates are perfectly balanced; plateau is normal.
C. The learning rate for early layers is too high, causing instability.
D. The optimizer momentum is set to zero, causing slow convergence.
💡 Hint

Consider which layers learn task-specific features and how their learning rate affects training speed.
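Related to this hint, per-group learning rates can also be adjusted mid-training when a plateau appears. A sketch with an invented model: `param_groups` is a plain list of dicts that can be edited in place.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
optimizer = torch.optim.SGD([
    {'params': model[0].parameters(), 'lr': 1e-4},  # early layer
    {'params': model[2].parameters(), 'lr': 1e-3},  # head
], momentum=0.9)

# On a plateau, one option is to raise the rate of the later,
# task-specific group; param_groups entries are mutable dicts.
optimizer.param_groups[1]['lr'] *= 10
```

The same in-place access is what PyTorch's built-in learning rate schedulers use under the hood.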

🔧 Debug · expert
Does this differential learning rate code raise an error?

What happens when this PyTorch code tries to set different learning rates?

PyTorch
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 2)
)

optimizer = torch.optim.Adam([
    {'params': model[0].parameters(), 'lr': 0.001},
    {'params': model[1].parameters(), 'lr': 0.01}
])
A. No error, code runs successfully
B. RuntimeError: optimizer parameter groups must have 'params' key
C. AttributeError: 'ReLU' object has no attribute 'parameters'
D. TypeError: optimizer got an unexpected keyword argument 'lr'
💡 Hint

Check which layers have parameters and which do not.
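To explore this hint hands-on (a sketch, not the answer key): every `nn.Module` defines `.parameters()`, but for parameterless modules like `nn.ReLU` the iterator is simply empty. A hypothetical guard builds param groups only for modules that actually own parameters.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# ReLU has a .parameters() method (every nn.Module does),
# but it yields nothing:
n_relu_params = len(list(model[1].parameters()))

# Hypothetical guard: only include modules that own parameters.
groups = [
    {'params': list(m.parameters()), 'lr': 0.001}
    for m in model
    if any(True for _ in m.parameters())
]
```

Printing `n_relu_params` and `len(groups)` in an interpreter is a quick way to check your prediction before submitting.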