Recall & Review
beginner
What is a learning rate in machine learning?
The learning rate is a small positive number that controls how much the model's parameters change on each update: every parameter moves by the learning rate times its gradient. Think of it as the step size of learning.
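Here is a minimal sketch of one plain gradient-descent update for a single scalar weight. The function name `sgd_step` and the numbers are illustrative assumptions. Real optimizers apply the same rule to every parameter tensor.

```python
def sgd_step(weight: float, grad: float, lr: float) -> float:
    """Return the weight after one plain gradient-descent update."""
    # The learning rate scales the gradient: a larger lr means a larger move.
    return weight - lr * grad


# Same weight and gradient, two different learning rates.
small = sgd_step(weight=1.0, grad=2.0, lr=0.1)  # moves by 0.1 * 2.0 = 0.2
large = sgd_step(weight=1.0, grad=2.0, lr=0.5)  # moves by 0.5 * 2.0 = 1.0
print(small, large)
```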
beginner
What does 'learning rate differential' mean?
Learning rate differential means using different learning rates for different parts of a model. Some parts (for example, newly added layers) update faster, while others (for example, pre-trained layers) update more slowly.
intermediate
Why use different learning rates for different layers in a neural network?
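Because layers differ in how much they need to change. Newly added layers may need large updates to learn the task, while layers that already hold useful knowledge need only small adjustments so they do not lose it. Matching the step size to each layer can make training faster and more stable.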
intermediate
How do you set different learning rates in PyTorch?
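Pass the optimizer a list of dictionaries (parameter groups) instead of model.parameters(). Each dictionary has a 'params' key holding the parameters of one model part and an optional 'lr' key with that part's learning rate. A group without 'lr' uses the default lr passed to the optimizer. PyTorch then updates each group with its own step size.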
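Here is a minimal sketch of parameter groups. It assumes a toy model whose two layers are named `layer1` and `layer2`; the model and its layer sizes are hypothetical. Hyperparameters passed outside the list, such as `momentum`, act as defaults for every group.

```python
import torch
from torch import nn


class TwoLayerNet(nn.Module):
    """Hypothetical toy model; the names layer1/layer2 are illustrative."""

    def __init__(self) -> None:
        super().__init__()
        self.layer1 = nn.Linear(4, 8)
        self.layer2 = nn.Linear(8, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer2(torch.relu(self.layer1(x)))


model = TwoLayerNet()

# One dict per parameter group, each with its own learning rate.
# momentum is given once, outside the list, so it applies to both groups.
optimizer = torch.optim.SGD(
    [
        {"params": model.layer1.parameters(), "lr": 0.01},
        {"params": model.layer2.parameters(), "lr": 0.001},
    ],
    momentum=0.9,
)

# Each group keeps its own settings in optimizer.param_groups.
for i, group in enumerate(optimizer.param_groups):
    print(i, group["lr"], group["momentum"])
```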
intermediate
What is a practical example of learning rate differential?
When fine-tuning a pre-trained model, you might use a small learning rate for the old layers to keep their knowledge, and a bigger learning rate for new layers to learn fast.
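Here is a minimal fine-tuning sketch. It assumes a hypothetical `backbone` layer that stands in for the pre-trained part and a new `head`. The sizes, random data and learning rates (1e-4 and 1e-2) are illustrative. The sketch runs one plain SGD step and compares how far each part's weights move.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical setup: `backbone` stands in for pre-trained layers,
# `head` for a newly added, randomly initialised output layer.
backbone = nn.Linear(16, 8)
head = nn.Linear(8, 3)
model = nn.Sequential(backbone, nn.ReLU(), head)

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},  # small: keep learned knowledge
        {"params": head.parameters(), "lr": 1e-2},  # large: adapt quickly to the new task
    ]
)

# One training step on random data, just to compare update sizes.
x = torch.randn(32, 16)
y = torch.randint(0, 3, (32,))
before_backbone = backbone.weight.detach().clone()
before_head = head.weight.detach().clone()

loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# With plain SGD (no momentum), each change equals lr * gradient for that group.
backbone_change = (backbone.weight.detach() - before_backbone).abs().max().item()
head_change = (head.weight.detach() - before_head).abs().max().item()
print(f"largest backbone change: {backbone_change:.2e}")
print(f"largest head change:     {head_change:.2e}")
```

Because the gradients of the two parts are of similar size here, the 100x gap in learning rate shows up as a much larger change in the head than in the backbone.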
What does a higher learning rate do?
A. Makes the model learn faster but risks missing details
B. Makes the model learn slower and more carefully
C. Stops the model from learning
D. Has no effect on learning
A higher learning rate means bigger parameter updates. These can speed up training, but they can also overshoot a good solution instead of settling into it.
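The speed-up can be seen on a toy problem. This sketch runs gradient descent on f(w) = w^2, whose gradient is 2w. This is an assumed example, chosen because the minimum is known to be at w = 0. The sketch counts how many updates each learning rate needs to get close to that minimum.

```python
def steps_to_converge(lr: float, start: float = 1.0, tol: float = 1e-3, max_steps: int = 10_000) -> int:
    """Gradient descent on f(w) = w**2; count updates until |w| < tol."""
    w = start
    for step in range(max_steps):
        if abs(w) < tol:
            return step
        w -= lr * 2 * w  # one update: its size grows with lr
    return max_steps


print(steps_to_converge(lr=0.01))  # small steps: many updates needed
print(steps_to_converge(lr=0.1))  # bigger steps: far fewer updates
```

The exact counts depend on the starting point and tolerance. The point is only that, on this well-behaved function, the larger learning rate reaches the minimum in far fewer updates.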
In PyTorch, how do you apply different learning rates to different layers?
A. By setting a global learning rate only
B. By passing a list of parameter groups with different 'lr' values to the optimizer
C. By using different optimizers for each layer
D. By changing the learning rate after each epoch manually
PyTorch allows setting different learning rates by passing parameter groups with their own 'lr' values to the optimizer.
Why might you want a smaller learning rate for pre-trained layers?
A. To stop them from updating completely
B. To make them learn faster
C. To keep their learned knowledge stable
D. To reset their weights
A smaller learning rate helps preserve the useful knowledge already learned in pre-trained layers.
What is a risk of using too large a learning rate?
A. Model might not learn well and jump around
B. Model will learn perfectly
C. Training will be very slow
D. Model will ignore the data
With too large a learning rate, each update overshoots the minimum. The loss then bounces around, or even grows, instead of steadily decreasing.
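Here is a sketch using the toy function f(w) = w^2 as an assumed example. With this function, each gradient-descent update multiplies w by (1 - 2*lr). The sketch shows three regimes. A small lr shrinks w steadily. A larger lr makes w jump back and forth across the minimum while still shrinking. An lr above 1.0 makes every jump bigger than the last, so training diverges.

```python
def run_gd(lr: float, steps: int = 20, start: float = 1.0) -> list[float]:
    """Gradient descent on f(w) = w**2; return the trajectory of w."""
    w = start
    trajectory = [w]
    for _ in range(steps):
        w -= lr * 2 * w  # each update multiplies w by (1 - 2*lr)
        trajectory.append(w)
    return trajectory


stable = run_gd(lr=0.1)  # factor 0.8: shrinks steadily toward the minimum at 0
bouncy = run_gd(lr=0.9)  # factor -0.8: crosses 0 every step but still shrinks
diverge = run_gd(lr=1.1)  # factor -1.2: overshoots further every step and blows up
print(stable[-1], bouncy[-1], diverge[-1])
```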
Learning rate differential is especially useful in which scenario?
A. When not using an optimizer
B. Training a model from scratch with one layer
C. Using a fixed learning rate for all layers
D. Fine-tuning a pre-trained model
Fine-tuning benefits because pre-trained layers need only small adjustments, while newly added layers start from random weights and need larger updates.
Explain what learning rate differential is and why it helps in training neural networks.
Think about how some parts of the model might need to learn slower or faster.
Describe how to implement learning rate differential in PyTorch with code.
Focus on how the optimizer receives different learning rates.
Practice
1. What does learning rate differential mean in PyTorch training?
easy
A. Changing the learning rate randomly during training
B. Setting different learning rates for different parts of a model
C. Using the same learning rate for the entire model
D. Freezing all model layers during training
Solution
Step 1: Understand learning rate concept
The learning rate controls how fast a model updates its knowledge during training.
Step 2: Define learning rate differential
Learning rate differential means assigning different learning rates to different parts of the model to control their update speed.
Final Answer:
Setting different learning rates for different parts of a model -> Option B
Quick Check:
Learning rate differential = Different rates per model part
Hint: Different parts can learn at different speeds
Common Mistakes:
Thinking learning rate is always the same for all layers
Confusing learning rate differential with random rate changes
Confusing freezing all layers (Option D) with assigning different learning rates to different parts
2. Which PyTorch code snippet correctly sets different learning rates for two parameter groups?
easy
A. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, lr2=0.001)
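B. optimizer = torch.optim.SGD(model.parameters(), lr=[0.01, 0.001])
C. optimizer = torch.optim.SGD([{'params': model.layer1.parameters(), 'lr': 0.01}, {'params': model.layer2.parameters(), 'lr': 0.001}], momentum=0.9)
D. optimizer = torch.optim.SGD([model.layer1, model.layer2], lr=0.01)
Solution
Step 1: Check PyTorch optimizer syntax for param groups
PyTorch accepts a list of dicts, each with a 'params' key and an optional 'lr' key, so each group can have its own learning rate.
Step 2: Identify correct syntax
optimizer = torch.optim.SGD([{'params': model.layer1.parameters(), 'lr': 0.01}, {'params': model.layer2.parameters(), 'lr': 0.001}], momentum=0.9) correctly uses a list of dicts with separate learning rates for the layer1 and layer2 parameters. The momentum=0.9 given once outside the list applies to both groups.
Step 3: Rule out the other options
A fails because SGD has no lr2 argument. B fails because lr must be a single value, not a list. D fails because the optimizer expects parameter tensors, not modules such as model.layer1.
Final Answer:
A list of param-group dicts, each with its own 'lr' -> Option C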
A. Second param group has no lr of its own, so it silently uses the optimizer's default lr=0.001 instead of raising an error
B. Using lr=0.001 outside param groups is invalid
C. Parameters should be passed as model.layer1, not model.layer1.parameters()
D. SGD optimizer does not support param groups
Solution
Step 1: Review param groups and learning rates
The first param group sets lr=0.01; the second param group has no lr key.
Step 2: Understand default lr behavior
A group that sets its own lr keeps it. A group without lr falls back to the lr passed to the optimizer. Here lr=0.001 is passed, so the second group trains at 0.001. No error is raised, which makes the missing lr easy to overlook.
Final Answer:
Second param group has no lr of its own and silently uses the default lr=0.001 -> Option A
Quick Check:
A group's own lr if set, otherwise the optimizer's default lr
Hint: Give every param group an explicit lr, or check which groups rely on the optimizer's default
Common Mistakes:
Assuming the optimizer-level lr overrides an lr set inside a param group
Passing modules (model.layer1) instead of their parameters (model.layer1.parameters())
Believing SGD can't use param groups
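The fallback can be checked directly. This sketch assumes a hypothetical model with `layer1` and `layer2` submodules. It uses the optimizer call described in the solution: the first group sets lr=0.01, the second group has no lr, and the default is lr=0.001.

```python
import torch
from torch import nn

# Hypothetical two-layer model standing in for `model` in the question.
model = nn.ModuleDict({"layer1": nn.Linear(4, 4), "layer2": nn.Linear(4, 2)})

optimizer = torch.optim.SGD(
    [
        {"params": model["layer1"].parameters(), "lr": 0.01},  # own lr wins over the default
        {"params": model["layer2"].parameters()},  # no 'lr' key in this group
    ],
    lr=0.001,  # default used by any group that does not set its own lr
)

# No error is raised; the second group simply picks up the default.
print([group["lr"] for group in optimizer.param_groups])
```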
5. You want to fine-tune a pretrained model by training only the last layer fast and freezing the rest. Which learning rate setup is best?
hard
A. Set same lr=0.01 for all layers
B. Freeze last layer and train others with lr=0.01
C. Set lr=0.01 for all layers except last layer with lr=0
D. Set lr=0 for all layers except last layer with lr=0.01
Solution
Step 1: Understand freezing and learning rate
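Freezing means a layer's weights are not updated. With standard optimizers such as SGD and Adam, setting lr=0 achieves this because every update is scaled by lr, but gradients are still computed for those layers. Setting requires_grad=False on the frozen parameters also stops updates and skips their gradient computation.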
Step 2: Apply learning rate differential for fine-tuning
Set lr=0 for frozen layers to prevent updates, and higher lr for last layer to train it fast.
Final Answer:
Set lr=0 for all layers except last layer with lr=0.01 -> Option D
Quick Check:
Freeze layers = lr 0, train last layer fast
Hint: Freeze layers by lr=0, train last layer with higher lr
Common Mistakes:
Using same learning rate for all layers when freezing
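Here is a sketch comparing the two ways of freezing described in the solution. It assumes a small hypothetical `nn.Sequential` in which the last `Linear` is the layer to fine-tune, trained for one step on random data. Both approaches leave the first layer unchanged. With lr=0, PyTorch still computes and stores gradients for the frozen weights, while `requires_grad_(False)` skips that work.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)


def make_model() -> nn.Sequential:
    # Stand-in for a pretrained network; the last Linear is the layer to train.
    return nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))


def one_step(model: nn.Module, optimizer: torch.optim.Optimizer) -> None:
    # One training step on random data.
    x, y = torch.randn(8, 10), torch.randn(8, 2)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Approach 1 (the quiz's answer): lr=0 for frozen layers, lr=0.01 for the last layer.
model_a = make_model()
start_a = copy.deepcopy(model_a.state_dict())
opt_a = torch.optim.SGD(
    [
        {"params": model_a[0].parameters(), "lr": 0.0},
        {"params": model_a[2].parameters(), "lr": 0.01},
    ]
)
one_step(model_a, opt_a)

# Approach 2: disable gradients on the frozen layer and give the optimizer
# only the trainable parameters.
model_b = make_model()
model_b[0].requires_grad_(False)
start_b = copy.deepcopy(model_b.state_dict())
opt_b = torch.optim.SGD(model_b[2].parameters(), lr=0.01)
one_step(model_b, opt_b)

for name, model, start in [("lr=0", model_a, start_a), ("requires_grad", model_b, start_b)]:
    first_same = torch.equal(model[0].weight, start["0.weight"])
    last_same = torch.equal(model[2].weight, start["2.weight"])
    print(f"{name}: first layer unchanged={first_same}, last layer unchanged={last_same}")
```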