PyTorchml~20 mins

CosineAnnealingLR in PyTorch - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - CosineAnnealingLR

Problem:You have a neural network training on a classification task. The learning rate is fixed, causing the model to converge too quickly and get stuck in a suboptimal solution.

Current Metrics:Training accuracy: 92%, Validation accuracy: 78%, Training loss: 0.25, Validation loss: 0.45

Issue:The model shows signs of overfitting and poor generalization. The fixed learning rate does not allow the model to explore better minima.

Your Task

Use CosineAnnealingLR scheduler to adjust the learning rate during training to improve validation accuracy to above 85% while keeping training accuracy below 95%.

Keep the model architecture unchanged.

Only modify the learning rate scheduling.

Use PyTorch's CosineAnnealingLR scheduler.

Hint 1

Hint 2

Hint 3

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Simple model definition
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Data loaders
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000, shuffle=False)

# Model, loss, optimizer
model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Scheduler
scheduler = CosineAnnealingLR(optimizer, T_max=10)

def train():
    model.train()
    total_loss = 0
    correct = 0
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()
    return total_loss / len(train_loader.dataset), correct / len(train_loader.dataset)

def validate():
    model.eval()
    total_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            loss = criterion(output, target)
            total_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()
    return total_loss / len(val_loader.dataset), correct / len(val_loader.dataset)

# Training loop with scheduler
num_epochs = 10
for epoch in range(num_epochs):
    train_loss, train_acc = train()
    val_loss, val_acc = validate()
    scheduler.step()
    print(f"Epoch {epoch+1}: Train Loss={train_loss:.4f}, Train Acc={train_acc*100:.2f}%, Val Loss={val_loss:.4f}, Val Acc={val_acc*100:.2f}%, LR={scheduler.get_last_lr()[0]:.5f}")

Added CosineAnnealingLR scheduler with T_max=10 to adjust learning rate each epoch.

Kept initial learning rate at 0.1 but allowed it to decrease following cosine schedule.

Called scheduler.step() after each epoch to update learning rate.

Results Interpretation

Before: Training Acc: 92%, Validation Acc: 78%, Training Loss: 0.25, Validation Loss: 0.45

After: Training Acc: 93%, Validation Acc: 87%, Training Loss: 0.22, Validation Loss: 0.35

Using CosineAnnealingLR helps the model avoid getting stuck early by gradually lowering the learning rate, which improves validation accuracy and reduces overfitting.

Bonus Experiment

Try using CosineAnnealingWarmRestarts scheduler instead of CosineAnnealingLR to see if restarting the learning rate cycle improves performance further.

💡 Hint

CosineAnnealingWarmRestarts resets the learning rate periodically, which can help the model escape local minima.

Practice

(1/5)

1. What is the main purpose of using CosineAnnealingLR in PyTorch training?

easy

A. To stop training early when accuracy is high

B. To increase the batch size during training

C. To smoothly adjust the learning rate in a wave-like pattern

D. To shuffle the training data every epoch

CosineAnnealingLR in PyTorch - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand the role of learning rate schedulers

Step 2: Identify what CosineAnnealingLR does

Final Answer:

Quick Check:

Solution

Step 1: Check the official PyTorch parameter names

Step 2: Match parameters with options

Final Answer:

Quick Check:

Solution

Step 1: Understand CosineAnnealingLR formula

Step 2: Calculate learning rate at t=5

Final Answer:

Quick Check:

Solution

Step 1: Understand scheduler.step() timing

Step 2: Verify the code

Final Answer:

Quick Check:

Solution

Step 1: Understand T_max meaning

Step 2: Calculate T_max for 2 cycles in 50 epochs

Final Answer:

Quick Check: