Learning rate selection in Computer Vision

Introduction

Learning rate controls how fast a model learns. Choosing the right learning rate helps the model learn well without getting stuck or jumping around.

Learning rate selection matters most in situations like these:
When you train a new image classifier and want it to learn efficiently.
When your model's training loss is not improving or is unstable.
When you want to speed up training without losing accuracy.
When you fine-tune a pre-trained vision model on a new dataset.
When you experiment with different training setups to find the best results.
Syntax
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

The learning rate is passed to the optimizer as the lr argument.

Common optimizers include SGD, Adam, and RMSprop.
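
Once the optimizer exists, the learning rate it is using can be read back or changed through its parameter groups. The following is a minimal sketch, assuming a tiny `nn.Linear` stand-in model (not part of the original lesson) so the snippet runs on its own:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model, used only so the optimizer has parameters
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Each parameter group stores its own learning rate under the "lr" key
current_lr = optimizer.param_groups[0]["lr"]
print(f"Current learning rate: {current_lr}")

# Changing the value in every group changes the rate used by later steps
for group in optimizer.param_groups:
    group["lr"] = 0.001
new_lr = optimizer.param_groups[0]["lr"]
print(f"New learning rate: {new_lr}")
```

Editing `param_groups` by hand is mainly useful for quick experiments; for planned changes during training, a scheduler is usually the cleaner option.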

Examples
Set learning rate to 0.01 using SGD optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Set learning rate to 0.001 using Adam optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Multiply the learning rate by 0.1 (divide it by 10) every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
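
Creating the scheduler does not change anything by itself: it only takes effect when `scheduler.step()` is called, normally once per epoch after the optimizer has stepped. The sketch below assumes a starting learning rate of 0.1 and a stand-in `nn.Linear` model (both chosen for illustration) and records the rate seen at each epoch:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and starting rate, for illustration only
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lr_history = []
for epoch in range(30):
    lr_history.append(optimizer.param_groups[0]["lr"])
    # ... the real forward pass, loss.backward() and batches would go here ...
    optimizer.step()   # placeholder step so the scheduler is stepped after the optimizer
    scheduler.step()   # advance the schedule once per epoch

# Rate used in epochs 0-9, 10-19 and 20-29 respectively
print(lr_history[0], lr_history[10], lr_history[20])
```

With these settings the rate stays at its initial value for the first 10 epochs, then drops by a factor of 10 at epoch 10 and again at epoch 20.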
Sample Model

This code trains a simple CNN on MNIST for one batch using a learning rate of 0.01. It prints the loss to show training progress.

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Simple CNN model for MNIST
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(10 * 12 * 12, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = x.view(-1, 10 * 12 * 12)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Load MNIST data
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

# Initialize model, loss, optimizer with learning rate 0.01
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Train on a single batch, then stop
model.train()
data, target = next(iter(train_loader))  # fetch the first batch only
optimizer.zero_grad()                    # clear gradients from any previous step
output = model(data)                     # forward pass
loss = criterion(output, target)
loss.backward()                          # compute gradients
optimizer.step()                         # update weights, scaled by the learning rate
print(f'Batch 0 Loss: {loss.item():.4f}')
Important Notes

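A learning rate that is too high can make training unstable: the loss may oscillate or diverge.

A learning rate that is too low makes training slow, because each update barely changes the weights.

Try learning rate schedules, which lower the rate as training progresses, to improve training.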
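
The effect of the step size can be seen on a toy problem without any neural network. The sketch below is an illustration only: it runs plain gradient descent on f(w) = w**2, and the function name `gradient_descent` and the three rates are assumptions chosen for the demonstration, not values recommended for real vision models.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final |w|."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # the update is scaled by the learning rate
    return abs(w)

too_high = gradient_descent(1.1)    # each step multiplies w by -1.2, so |w| grows
too_low = gradient_descent(0.001)   # each step multiplies w by 0.998, so w barely moves
moderate = gradient_descent(0.1)    # each step multiplies w by 0.8, so w shrinks toward 0

print(f"too high: {too_high:.3f}, too low: {too_low:.3f}, moderate: {moderate:.3f}")
```

The minimum is at w = 0, so a smaller final value means better progress. The overly large rate moves away from the minimum, the overly small rate has hardly left the starting point after 20 steps, and the moderate rate gets close.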

Summary

Learning rate controls how fast the model updates.

Choose learning rate carefully for good training.

Use optimizers and schedulers to manage learning rate.

Practice

1.

What does the learning rate control in training a computer vision model?

easy
A. The number of layers in the model
B. The size of the input images
C. How fast the model updates its knowledge
D. The type of activation function used

Solution

  1. Step 1: Understand the role of learning rate

    The learning rate determines how large a step the optimizer takes when it adjusts the weights after each batch.
  2. Step 2: Connect learning rate to model updates

    A higher learning rate means larger updates per step, while a lower rate means smaller updates.
  3. Final Answer:

    How fast the model updates its knowledge -> Option C
  4. Quick Check:

    Learning rate controls update speed = C [OK]
Hint: Learning rate = speed of model learning updates [OK]
Common Mistakes:
  • Confusing learning rate with model size
  • Thinking learning rate changes input data
  • Mixing learning rate with activation functions
2.

Which of the following is the correct way to set a learning rate of 0.01 using PyTorch's SGD optimizer?

import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=___)
easy
A. 0.01
B. 0.1
C. "0.01"
D. learning_rate

Solution

  1. Step 1: Check the expected type for learning rate

    The lr argument expects a number (a float), not a string. A variable works only if it is already defined and holds a float; in this snippet learning_rate is never defined.
  2. Step 2: Identify the correct float value for 0.01

    Using 0.01 as a float sets the learning rate correctly.
  3. Final Answer:

    0.01 -> Option A
  4. Quick Check:

    Learning rate as float = 0.01 [OK]
Hint: Pass lr as a float (or a defined variable holding one), never as a string [OK]
Common Mistakes:
  • Using string "0.01" instead of float 0.01
  • Passing undefined variable learning_rate
  • Setting lr to 0.1 by mistake
3.

Consider this training loop snippet for a vision model:

learning_rate = 0.5
for epoch in range(3):
    loss = train_one_epoch(model, data, learning_rate)
    print(f"Epoch {epoch+1} loss: {loss:.2f}")

If the learning rate is too high, what is the most likely output behavior?

medium
A. Loss becomes zero immediately
B. Loss stays constant
C. Loss steadily decreases each epoch
D. Loss fluctuates or increases wildly

Solution

  1. Step 1: Understand effect of high learning rate

    A very high learning rate like 0.5 can cause the model to overshoot the best weights, making training unstable.
  2. Step 2: Predict loss behavior with unstable training

    Loss will not steadily decrease but will jump up and down or increase.
  3. Final Answer:

    Loss fluctuates or increases wildly -> Option D
  4. Quick Check:

    High lr causes unstable loss = D [OK]
Hint: High learning rate causes unstable or rising loss [OK]
Common Mistakes:
  • Assuming loss always decreases regardless of lr
  • Thinking loss becomes zero immediately
  • Confusing constant loss with stable training
4.

Given this code snippet, identify the error related to learning rate usage:

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for epoch in range(5):
    loss = train(model, data)
    optimizer.step()
    optimizer.zero_grad()
medium
A. optimizer.step() called before loss.backward()
B. Learning rate is too high for Adam optimizer
C. optimizer.zero_grad() should be called after optimizer.step()
D. Learning rate should be set inside the loop

Solution

  1. Step 1: Check optimizer usage order

    Before calling optimizer.step(), gradients must be computed by loss.backward().
  2. Step 2: Identify missing backward call

    The code misses loss.backward(), so optimizer.step() updates without gradients.
  3. Final Answer:

    optimizer.step() called before loss.backward() -> Option A
  4. Quick Check:

    Missing loss.backward() before step = A [OK]
Hint: Call loss.backward() before optimizer.step() [OK]
Common Mistakes:
  • Thinking learning rate 0.001 is too high for Adam
  • Believing zero_grad() order is wrong here
  • Assuming learning rate must change each epoch
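
For comparison, a corrected version of the loop could look like the sketch below. The snippet in the question does not show what `train` does, so this version computes the loss inline; the tiny `nn.Linear` model and the random inputs and targets are placeholders assumed for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the model and data in the question
model = nn.Linear(4, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
inputs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))

before = model.weight.detach().clone()
for epoch in range(5):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(inputs), targets)   # forward pass and loss
    loss.backward()                            # compute gradients
    optimizer.step()                           # apply the update, scaled by lr

# The weights only move because backward() ran before step()
weights_changed = not torch.equal(before, model.weight.detach())
print(f"Weights changed: {weights_changed}")
```

The order inside the loop is what matters: gradients are cleared, recomputed with `loss.backward()`, and only then applied by `optimizer.step()`.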
5.

You want to train a deep vision model on a new dataset. You start with a learning rate of 0.1 but notice training loss does not decrease. What is the best next step?

hard
A. Remove the learning rate parameter from the optimizer
B. Decrease the learning rate to 0.01 and try again
C. Keep the learning rate at 0.1 and train longer
D. Increase the learning rate to 1.0 for faster learning

Solution

  1. Step 1: Analyze why loss does not decrease

    A high learning rate like 0.1 can cause the model to skip the best weights, preventing loss decrease.
  2. Step 2: Choose a safer learning rate adjustment

    Lowering the learning rate to 0.01 allows smaller, stable updates to improve training.
  3. Final Answer:

    Decrease the learning rate to 0.01 and try again -> Option B
  4. Quick Check:

    Lower lr if loss stuck = B [OK]
Hint: Lower learning rate if loss doesn't drop [OK]
Common Mistakes:
  • Increasing learning rate when training fails
  • Ignoring learning rate and training longer
  • Removing learning rate parameter entirely