Learning rate schedulers in PyTorch - Model Metrics & Evaluation

Metrics & Evaluation - Learning rate schedulers
Which metric matters for Learning Rate Schedulers and WHY

Learning rate schedulers control how the learning rate changes during training, which affects how quickly and how stably a model learns. The key metrics to watch are training loss and validation loss: they show whether the model is still improving or has stalled. With a well-chosen schedule, loss usually falls smoothly without sudden jumps.

Also watch validation accuracy to see whether the model generalizes well. If accuracy stops improving or drops, the learning rate may be too high or too low.

Confusion Matrix or Equivalent Visualization

A confusion matrix is not the natural tool here, because a scheduler changes how training proceeds rather than how predictions are categorized. Instead, track loss and accuracy over epochs, for example:

Epoch | Training Loss | Validation Loss | Validation Accuracy
------------------------------------------------------------
  1   |     0.8      |      0.9       |       70%
  5   |     0.4      |      0.5       |       85%
 10   |     0.2      |      0.3       |       92%
 15   |     0.15     |      0.25      |       93%
    

With a well-tuned schedule, loss falls steadily and accuracy rises, with the gains shrinking as training converges (compare epochs 10 and 15 in the table).

Precision vs Recall Tradeoff (Analogy for Learning Rate)

Think of learning rate like driving speed:

  • A high learning rate is like driving too fast: you might miss turns (the model skips over good solutions) or crash (loss jumps up).
  • A low learning rate is like driving too slowly: you get there safely but take forever (training is slow and might get stuck).

Schedulers adjust speed over time: start fast to learn quickly, then slow down to fine-tune. This balance helps the model learn well without overshooting or wasting time.
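
As a rough illustration of "start fast, then slow down", the sketch below computes an exponentially decaying learning rate, lr = base_lr * gamma ** epoch, in plain Python. The helper name exponential_decay and the values 0.1 and 0.9 are assumptions chosen for illustration, not taken from a real training run.

```python
def exponential_decay(base_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` epochs of exponential decay (illustrative helper)."""
    return base_lr * gamma ** epoch


# Assumed starting rate and decay factor, for illustration only.
schedule = [exponential_decay(0.1, 0.9, epoch) for epoch in range(20)]

# Early epochs take large steps; later epochs take progressively smaller ones.
for epoch in (0, 5, 10, 19):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.4f}")
```

This is the same shape of decay that ExponentialLR applies, where the learning rate is multiplied by gamma once per epoch.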

What Good vs Bad Metric Values Look Like for Learning Rate Schedulers

Good:

  • Training and validation loss decrease smoothly over epochs.
  • Validation accuracy steadily increases or plateaus at a high value.
  • No sudden spikes or drops in loss or accuracy.

Bad:

  • Loss jumps up or oscillates wildly.
  • Validation accuracy drops or fluctuates a lot.
  • Training loss decreases but validation loss increases (overfitting).
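
The good and bad patterns above can be checked mechanically from recorded loss histories. The following is a minimal sketch in plain Python. The helper names, the tolerance of 0.05, and the loss values are all made up for illustration.

```python
def find_loss_spikes(losses, tolerance=0.05):
    """Return indices of epochs whose loss rose by more than `tolerance` over the previous epoch."""
    return [i for i in range(1, len(losses)) if losses[i] - losses[i - 1] > tolerance]


def first_divergence(train_losses, val_losses):
    """Return the first epoch index where training loss fell while validation loss rose, or None."""
    for i in range(1, min(len(train_losses), len(val_losses))):
        if train_losses[i] < train_losses[i - 1] and val_losses[i] > val_losses[i - 1]:
            return i
    return None


# Made-up loss histories, for illustration only.
smooth_train = [0.8, 0.6, 0.4, 0.3, 0.2]
spiky_train = [0.8, 0.5, 1.4, 0.6, 0.9]
overfit_val = [0.9, 0.7, 0.6, 0.65, 0.7]

print(find_loss_spikes(smooth_train))               # no spikes in a smooth curve
print(find_loss_spikes(spiky_train))                # epochs where loss jumped up
print(first_divergence(smooth_train, overfit_val))  # epoch where overfitting starts to show
```

A suitable tolerance depends on the scale and noise of the loss, so the default here is only a placeholder.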

Common Pitfalls with Learning Rate Schedulers

  • Too high learning rate: Causes loss to jump and training to fail.
  • Too low learning rate: Training is very slow and may get stuck in bad solutions.
  • Not adjusting learning rate: Using a fixed rate can cause slow or unstable training.
  • Ignoring validation metrics: Only watching training loss can hide overfitting.
  • Data leakage: If validation data leaks into training, metrics look better but model fails in real use.

Self-Check Question

Your model's training loss decreases steadily, but validation loss stops improving and validation accuracy plateaus early. You use a fixed learning rate. Is this good? Why or why not?

Answer: No. Training loss keeps falling while validation loss and accuracy have stalled, which suggests that the model is starting to overfit or that the fixed learning rate is too high to settle into a better solution. Reducing the learning rate over time with a scheduler could help improve validation performance.
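
One scheduler that reacts to exactly this symptom is ReduceLROnPlateau, which lowers the learning rate when a monitored metric stops improving. The sketch below assumes a single dummy parameter in place of a real model and uses made-up validation losses. The factor and patience values are illustrative choices.

```python
import torch

# A single dummy parameter stands in for a real model (assumption for brevity).
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.SGD([param], lr=0.1)

# Halve the learning rate once validation loss has failed to improve
# for more than `patience` consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=2
)

# Made-up validation losses that plateau after the second epoch.
val_losses = [0.9, 0.5, 0.5, 0.5, 0.5]

lrs = []
for val_loss in val_losses:
    opt.step()                # parameter update (does nothing here: no gradients)
    scheduler.step(val_loss)  # this scheduler is given the monitored metric
    lrs.append(opt.param_groups[0]["lr"])

print(lrs)
```

With these inputs the rate stays at 0.1 for four epochs and is halved on the fifth, after three epochs without improvement.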

Key Result
Learning rate schedulers can improve training by reducing the learning rate as training progresses, which helps loss fall smoothly and accuracy rise while avoiding jumps or stalls.

Practice

1. What is the main purpose of using a learning rate scheduler in PyTorch training?
easy
A. To change the model architecture dynamically
B. To increase the batch size automatically
C. To shuffle the training data at each epoch
D. To adjust the learning rate during training for better model performance

Solution

  1. Step 1: Understand the role of learning rate

    The learning rate controls how large each parameter update is, and therefore how fast the model learns during training.
  2. Step 2: Identify what a scheduler does

    A learning rate scheduler changes the learning rate over time to improve training stability and performance.
  3. Final Answer:

    To adjust the learning rate during training for better model performance -> Option D
  4. Quick Check:

    Learning rate scheduler adjusts learning rate [OK]
Hint: Schedulers change learning rate, not batch size or model structure [OK]
Common Mistakes:
  • Confusing scheduler with batch size adjustment
  • Thinking scheduler changes model layers
  • Assuming scheduler shuffles data
2. Which of the following is the correct way to create a StepLR scheduler in PyTorch for optimizer opt with step size 10 and gamma 0.1?
easy
A. scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.1)
B. scheduler = torch.optim.StepLR(opt, step=10, decay=0.1)
C. scheduler = torch.optim.lr_scheduler.StepLR(opt, steps=10, gamma=0.1)
D. scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=10, decay=0.1)

Solution

  1. Step 1: Recall PyTorch StepLR syntax

    The correct class is torch.optim.lr_scheduler.StepLR with parameters step_size and gamma.
  2. Step 2: Match parameters correctly

    step_size=10 and gamma=0.1 are the correct parameter names and values.
  3. Final Answer:

    scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.1) -> Option A
  4. Quick Check:

    StepLR uses step_size and gamma [OK]
Hint: Use exact parameter names: step_size and gamma [OK]
Common Mistakes:
  • Using wrong parameter names like step or decay
  • Calling StepLR from wrong module
  • Mixing up parameter order
3. Given the code below, what will be the learning rate after 3 calls to scheduler.step()?
import torch
opt = torch.optim.SGD([torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=2, gamma=0.5)

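for _ in range(3):
    opt.step()        # optimizer update comes first (does nothing here: no gradients)
    scheduler.step()  # then advance the schedule by one epoch
    current_lr = opt.param_groups[0]['lr']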
medium
A. 0.05
B. 0.1
C. 0.025
D. 0.0125

Solution

  1. Step 1: Understand StepLR behavior

    StepLR multiplies the learning rate by gamma every step_size epochs. Here, step_size=2 and gamma=0.5, so the rate is halved every 2 epochs.
  2. Step 2: Calculate learning rate after 3 steps

    After 1 step: lr=0.1 (no change, step 1 < 2)
    After 2 steps: lr=0.1 * 0.5 = 0.05 (step 2 reached)
    After 3 steps: lr remains 0.05 (step 3 < 4)
  3. Final Answer:

    0.05 -> Option A
  4. Quick Check:

    StepLR halves lr every 2 steps [OK]
Hint: Learning rate changes only at multiples of step_size [OK]
Common Mistakes:
  • Reducing learning rate every step instead of every step_size
  • Multiplying gamma incorrectly
  • Ignoring initial learning rate
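
The arithmetic in this solution can be checked without PyTorch by using the closed form lr = base_lr * gamma ** (epoch // step_size). In the sketch below, step_decay is a hypothetical plain-Python helper, not the PyTorch class.

```python
def step_decay(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Closed-form step decay: one multiplication by gamma per completed block of step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Same settings as the question: lr=0.1, step_size=2, gamma=0.5.
for n_calls in range(4):
    print(f"after {n_calls} calls: lr = {step_decay(0.1, 0.5, 2, n_calls)}")
```

The integer division is what keeps the rate unchanged between multiples of step_size.
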
4. Identify the error in the following PyTorch learning rate scheduler code:
import torch
opt = torch.optim.Adam([torch.nn.Parameter(torch.randn(3, 3, requires_grad=True))], lr=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)

for epoch in range(5):
    scheduler.step()
    print(f"Epoch {epoch}: lr = {opt.param_groups[0]['lr']}")
medium
A. Learning rate should be set inside the loop
B. scheduler.step() should be called after optimizer.step()
C. ExponentialLR does not exist in PyTorch
D. gamma value must be greater than 1

Solution

  1. Step 1: Recall correct scheduler usage

    Since PyTorch 1.1.0, scheduler.step() is expected to be called after optimizer.step(). Calling it first makes PyTorch emit a warning and skips the first value of the learning rate schedule.
  2. Step 2: Check code order

    The loop calls scheduler.step() but never calls optimizer.step(), so the schedule advances before any parameter update has happened.
  3. Final Answer:

    scheduler.step() should be called after optimizer.step() -> Option B
  4. Quick Check:

    Call scheduler.step() after optimizer.step() [OK]
Hint: Always call scheduler.step() after optimizer.step() [OK]
Common Mistakes:
  • Calling scheduler.step() before optimizer.step()
  • Using invalid gamma values
  • Misunderstanding scheduler existence
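
For comparison, here is a minimal sketch of the loop with the calls in the expected order. The tiny linear model, the random data, and the MSE loss are stand-ins assumed for illustration. Only the ordering of opt.step() and scheduler.step() is the point.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)  # tiny stand-in model (assumption)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)

x = torch.randn(8, 3)  # random stand-in inputs
y = torch.randn(8, 1)  # random stand-in targets

lrs = []
for epoch in range(5):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()        # 1. update the parameters with the current learning rate
    scheduler.step()  # 2. then advance the schedule for the next epoch
    lrs.append(scheduler.get_last_lr()[0])
    print(f"Epoch {epoch}: next lr = {lrs[-1]:.6f}")
```
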
5. You want to train a model where the learning rate starts at 0.1, then reduces by half every 5 epochs, but after 20 epochs, it should decay exponentially by 0.9 every epoch. Which PyTorch scheduler setup achieves this behavior?
hard
A. Use CosineAnnealingLR with T_max=20 and then StepLR with step_size=5, gamma=0.5
B. Use ExponentialLR with gamma=0.9 from start and manually adjust learning rate at epoch 20
C. Use StepLR with step_size=5, gamma=0.5 for first 20 epochs, then switch to ExponentialLR with gamma=0.9
D. Use StepLR with step_size=20, gamma=0.5 and ignore exponential decay

Solution

  1. Step 1: Understand the two-phase learning rate schedule

    First phase: reduce lr by half every 5 epochs for 20 epochs.
    Second phase: after 20 epochs, apply exponential decay by 0.9 every epoch.
  2. Step 2: Match PyTorch schedulers to phases

    StepLR with step_size=5, gamma=0.5 fits first phase.
    ExponentialLR with gamma=0.9 fits second phase.
    Switching schedulers after 20 epochs achieves the desired behavior.
  3. Final Answer:

    Use StepLR with step_size=5, gamma=0.5 for first 20 epochs, then switch to ExponentialLR with gamma=0.9 -> Option C
  4. Quick Check:

    Combine StepLR then ExponentialLR for phased decay [OK]
Hint: Combine schedulers for multi-phase learning rate changes [OK]
Common Mistakes:
  • Trying to use one scheduler for both phases
  • Ignoring the switch at epoch 20
  • Using wrong scheduler types for phases
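
One way to sketch the switch described in Option C is to create the second scheduler only once the first phase has finished, so that it starts from the optimizer's current learning rate. The dummy parameter and the 25-epoch horizon below are assumptions for illustration, and this is only one possible arrangement.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter (assumption)
opt = torch.optim.SGD([param], lr=0.1)

history = []  # learning rate used during each epoch

# Phase 1: halve the learning rate every 5 epochs for the first 20 epochs.
step_sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)
for epoch in range(20):
    history.append(opt.param_groups[0]["lr"])
    opt.step()         # parameter update (does nothing here: no gradients)
    step_sched.step()

# Phase 2: a scheduler created now starts from the already decayed rate
# and multiplies it by 0.9 after every epoch.
exp_sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
for epoch in range(20, 25):
    history.append(opt.param_groups[0]["lr"])
    opt.step()
    exp_sched.step()

for epoch in (0, 5, 10, 15, 20, 21):
    print(f"epoch {epoch:2d}: lr = {history[epoch]:.6f}")
```

It is worth printing the learning rate around the switch point, as done here, to confirm that the second phase continues from the rate the first phase ended on.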