Learning rate differential in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems

🧠 Conceptual

intermediate

Why use different learning rates for different layers?

In training deep neural networks, why might we assign different learning rates to different layers?

A. Because different learning rates help the optimizer skip some layers during training.

B. Because early layers often learn general features and require smaller learning rates, while later layers learn task-specific features and can use larger learning rates.

C. Because using the same learning rate for all layers always causes the model to diverge.

D. Because some layers have more parameters and need slower updates to avoid overfitting.
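
To make the notion of a per-layer learning rate concrete, here is a minimal sketch of what it does to a single update step. The layers `early` and `late`, their sizes, and the two rates are hypothetical and chosen only for illustration. Plain SGD is used (no momentum, no weight decay) so that each update is exactly the group's learning rate times the gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

early = nn.Linear(4, 4)  # stands in for an early layer (hypothetical)
late = nn.Linear(4, 2)   # stands in for a later layer (hypothetical)

# One parameter group per layer, each with its own learning rate.
optimizer = torch.optim.SGD([
    {"params": early.parameters(), "lr": 0.001},
    {"params": late.parameters(), "lr": 0.1},
])

before_early = early.weight.detach().clone()
before_late = late.weight.detach().clone()

x = torch.randn(8, 4)
loss = late(torch.relu(early(x))).pow(2).mean()
loss.backward()
optimizer.step()

# With plain SGD each weight moves by lr * grad, so the size of a layer's
# step is scaled by the learning rate of the group it belongs to.
step_early = before_early - early.weight.detach()
step_late = before_late - late.weight.detach()
```

Comparing `step_early` with `0.001 * early.weight.grad` and `step_late` with `0.1 * late.weight.grad` shows that each layer was updated with its own rate.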

❓ Predict Output

intermediate

Output of learning rate differential setup in PyTorch

What will be the learning rate of the parameters in model.layer1 and model.layer2 after this code runs?

PyTorch

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Linear(5, 2)

model = SimpleModel()
optimizer = torch.optim.SGD([
    {'params': model.layer1.parameters(), 'lr': 0.001},
    {'params': model.layer2.parameters(), 'lr': 0.01}
], momentum=0.9)

lrs = [group['lr'] for group in optimizer.param_groups]
print(lrs)

A. [0.01, 0.001]

B. [0.001]

C. [0.01]

D. [0.001, 0.01]
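
A related detail is how default values interact with parameter groups. The sketch below is a variation on the code above, not part of the original question: one group omits the 'lr' key and so falls back to the `lr` passed to the optimizer, while keyword arguments such as `momentum` apply to every group that does not override them. The standalone layers `layer1` and `layer2` are assumptions made to keep the example short.

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(10, 5)
layer2 = nn.Linear(5, 2)

optimizer = torch.optim.SGD(
    [
        {"params": layer1.parameters()},              # no 'lr' key: uses the default below
        {"params": layer2.parameters(), "lr": 0.01},  # explicit per-group override
    ],
    lr=0.001,      # default for groups that do not set 'lr'
    momentum=0.9,  # shared by both groups
)

# param_groups keeps the groups in the order they were passed in.
lrs = [group["lr"] for group in optimizer.param_groups]
momenta = [group["momentum"] for group in optimizer.param_groups]
print(lrs)      # [0.001, 0.01]
print(momenta)  # [0.9, 0.9]
```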

❓ Hyperparameter

advanced

Choosing learning rates for differential training

You want to fine-tune a pretrained model by freezing early layers and training only the last few layers. Which learning rate setup is best?

A. Set zero learning rate for frozen layers and a small learning rate for trainable layers.

B. Set a high learning rate for all layers to speed up training.

C. Set a small learning rate for frozen layers and a high learning rate for trainable layers.

D. Set the same moderate learning rate for all layers regardless of freezing.
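
In practice, freezing is often expressed by turning off gradients rather than by a learning rate value. The following is a minimal sketch under that assumption: the three-module `nn.Sequential` model, the choice of which layer to freeze, and the rate `1e-3` are all hypothetical. It sets `requires_grad = False` on the early layer and gives the optimizer only the parameters that remain trainable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(10, 8),  # treated here as the "early" layer to freeze
    nn.ReLU(),
    nn.Linear(8, 2),   # treated here as the "last" layer to fine-tune
)

# Freeze the early layer: autograd will not compute gradients for it.
for param in model[0].parameters():
    param.requires_grad = False

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3)

frozen_before = model[0].weight.detach().clone()

loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
optimizer.step()

# The frozen layer received no gradient and was not updated.
frozen_unchanged = torch.equal(frozen_before, model[0].weight)
```

A side benefit of this approach is that no gradients are computed or stored for the frozen layer, which a zero learning rate alone would not avoid.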

❓ Metrics

advanced

Effect of learning rate differential on training loss

During training with differential learning rates, you notice the loss decreases quickly at first but then plateaus. What is a likely cause?

A. The learning rate for later layers is too low, slowing learning.

B. The learning rates are perfectly balanced; plateau is normal.

C. The learning rate for early layers is too high, causing instability.

D. The optimizer momentum is set to zero, causing slow convergence.
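
When one group's rate is suspected of being the problem, it can be inspected and changed without rebuilding the optimizer, because each entry in `optimizer.param_groups` is a plain dictionary. The sketch below assumes two hypothetical layers, `early` and `late`, and a hypothetical helper `set_group_lr`. The specific rates are illustrative, not recommended values.

```python
import torch
import torch.nn as nn

early = nn.Linear(10, 5)
late = nn.Linear(5, 2)

optimizer = torch.optim.SGD([
    {"params": early.parameters(), "lr": 1e-4},
    {"params": late.parameters(), "lr": 1e-4},  # hypothetical: suspected to be too low
], momentum=0.9)

def set_group_lr(opt, group_index, new_lr):
    """Hypothetical helper: overwrite the learning rate of one parameter group."""
    opt.param_groups[group_index]["lr"] = new_lr

# Raise only the later layer's rate; the early layer's group is left untouched.
set_group_lr(optimizer, 1, 1e-2)

lrs = [group["lr"] for group in optimizer.param_groups]
```

Logging `lrs` alongside the loss during training makes it easier to tell which group's rate lines up with a plateau.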

🔧 Debug

expert

Does this differential learning rate code raise an error?

What happens when this PyTorch code runs and tries to set different learning rates?

PyTorch

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 2)
)

optimizer = torch.optim.Adam([
    {'params': model[0].parameters(), 'lr': 0.001},
    {'params': model[1].parameters(), 'lr': 0.01}
])

A. No error, code runs successfully

B. RuntimeError: optimizer parameter groups must have 'params' key

C. AttributeError: 'ReLU' object has no attribute 'parameters'

D. TypeError: optimizer got an unexpected keyword argument 'lr'
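
When indexing into an `nn.Sequential`, it helps to check which submodules actually own parameters before building groups. `nn.ReLU` is an `nn.Module`, so it has a `.parameters()` method, but it holds no learnable tensors, and a group built from it would contain nothing to update. The sketch below counts parameters per submodule and then builds groups only from the two linear layers. The mapping `lr_by_index` is a hypothetical name introduced for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 2),
)

# Count the parameter tensors each submodule owns (weight and bias for Linear).
param_counts = [len(list(module.parameters())) for module in model]
print(param_counts)  # [2, 0, 2]

# Hypothetical per-index learning rates for the layers that do have parameters.
lr_by_index = {0: 0.001, 2: 0.01}

groups = [
    {"params": list(model[i].parameters()), "lr": lr}
    for i, lr in lr_by_index.items()
]
optimizer = torch.optim.Adam(groups)

group_sizes = [len(group["params"]) for group in optimizer.param_groups]
```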

Practice

1. What does learning rate differential mean in PyTorch training?

easy

A. Changing the learning rate randomly during training

B. Setting different learning rates for different parts of a model

C. Using the same learning rate for the entire model

D. Freezing all model layers during training

Solution

Step 1: Understand learning rate concept

Step 2: Define learning rate differential

Final Answer:

Quick Check:

Solution

Step 1: Check PyTorch optimizer syntax for param groups

Step 2: Identify correct syntax

Final Answer:

Quick Check:

Solution

Step 1: Identify learning rates assigned to each layer

Step 2: Find learning rate for model.layer2

Final Answer:

Quick Check:

Solution

Step 1: Review param groups and learning rates

Step 2: Understand default lr behavior

Final Answer:

Quick Check:

Solution

Step 1: Understand freezing and learning rate

Step 2: Apply learning rate differential for fine-tuning

Final Answer:

Quick Check: