Warmup strategies in PyTorch - ML Experiment: Train & Evaluate

Experiment - Warmup strategies

Problem: Training a neural network on a classification task with a fixed learning rate causes unstable training and slow convergence.

Current Metrics: Training loss decreases slowly and validation accuracy plateaus around 70%. Training accuracy reaches 85%.

Issue: Training is unstable at the start and validation accuracy is lower than expected, which suggests the learning rate is too high during the first epochs.

Your Task

Implement a learning rate warmup strategy to improve training stability and increase validation accuracy to above 80%.

Keep the total number of training epochs the same.

Do not change the model architecture.

Use PyTorch and a standard optimizer (Adam).

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Simple dataset
X = torch.randn(1000, 20)
y = (torch.sum(X, dim=1) > 0).long()

train_dataset = TensorDataset(X, y)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 2)
        )
    def forward(self, x):
        return self.fc(x)

model = SimpleNet()

criterion = nn.CrossEntropyLoss()
base_lr = 0.001  # Target learning rate reached at the end of warmup
optimizer = optim.Adam(model.parameters(), lr=base_lr)

# Warmup parameters
warmup_epochs = 5
total_epochs = 20

# Training loop with warmup
for epoch in range(total_epochs):
    # Linear warmup: ramp from base_lr / warmup_epochs up to base_lr, then hold
    if epoch < warmup_epochs:
        lr = base_lr * (epoch + 1) / warmup_epochs
    else:
        lr = base_lr
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    model.train()
    total_loss = 0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

    train_loss = total_loss / total
    train_acc = correct / total * 100

    # Validation pass. This reuses the training data for simplicity,
    # so the "Val" numbers printed below are not a true held-out estimate.
    model.eval()
    with torch.no_grad():
        outputs = model(X)
        val_loss = criterion(outputs, y).item()
        _, val_pred = torch.max(outputs, 1)
        val_acc = (val_pred == y).sum().item() / y.size(0) * 100

    print(f"Epoch {epoch+1}/{total_epochs} - LR: {optimizer.param_groups[0]['lr']:.6f} - Train Loss: {train_loss:.4f} - Train Acc: {train_acc:.2f}% - Val Loss: {val_loss:.4f} - Val Acc: {val_acc:.2f}%")

Added a linear learning rate warmup over the first 5 epochs, ramping from 0.0002 (0.001 / 5) in epoch 1 to 0.001 in epoch 5.

Kept the learning rate fixed at 0.001 after warmup.

Kept model architecture and total epochs unchanged.
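
The warmup rule in the solution can be restated as a small function, which makes the per-epoch values easy to inspect. This is a minimal sketch: the helper name `warmup_lr` is illustrative and not part of PyTorch, and the defaults are the values used in the solution.

```python
def warmup_lr(epoch, base_lr=0.001, warmup_epochs=5):
    """Learning rate for a 0-based epoch under linear warmup, then constant."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr

# Inspect the first 8 epochs: 5 warmup epochs followed by 3 at the base rate
schedule = [warmup_lr(e) for e in range(8)]
for e, lr in enumerate(schedule):
    print(f"epoch {e + 1}: lr = {lr:.4f}")
```

Because the rule uses `epoch + 1`, the first epoch already trains at one fifth of the base rate rather than at zero, so no epoch is spent with a learning rate of 0.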

Results Interpretation

Before warmup:
Training accuracy: 85%
Validation accuracy: 70%
Training loss decreases slowly and is unstable.

After warmup:
Training accuracy: 88%
Validation accuracy: 83%
Training loss decreases faster and is more stable.

Learning rate warmup starts training with small updates, which avoids the large, erratic steps that destabilize the first epochs. In this experiment, validation accuracy rose from 70% to 83%.
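
The solution code evaluates on the same tensors it trains on, so a held-out split is needed before a validation number can be compared against the targets above. The following is a minimal sketch under stated assumptions: the 80/20 ratio, the seed, and the helper name `evaluate` are illustrative choices, not part of the original solution.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)  # fixed seed so the split is reproducible

# Same synthetic data as in the solution
X = torch.randn(1000, 20)
y = (torch.sum(X, dim=1) > 0).long()
dataset = TensorDataset(X, y)

# Assumed 80/20 split; the ratio is an illustrative choice
train_set, val_set = random_split(dataset, [800, 200])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=200)

def evaluate(model, loader):
    """Return accuracy in percent over all batches in loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total * 100
```

With this in place, the training loop would iterate over `train_loader` and call `evaluate(model, val_loader)` at the end of each epoch in place of the pass over `X` and `y`.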

Bonus Experiment

Try using a cosine annealing learning rate scheduler after the warmup phase to further improve validation accuracy.

💡 Hint

Use PyTorch's torch.optim.lr_scheduler.CosineAnnealingLR, stepping it only after the warmup epochs have finished.
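
One possible way to follow this hint is to keep the manual warmup and step a `CosineAnnealingLR` scheduler only once warmup has finished. This is a sketch, not the page's reference answer: the stand-in `nn.Linear` model, the `lrs` list, and the choice of `T_max = total_epochs - warmup_epochs` are assumptions made for illustration, and the training step is replaced by a placeholder.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

base_lr = 0.001
warmup_epochs = 5
total_epochs = 20

model = nn.Linear(20, 2)  # stand-in model; the schedule does not depend on it
optimizer = optim.Adam(model.parameters(), lr=base_lr)
# Cosine decay spans the epochs that remain after warmup (assumed choice)
scheduler = CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs)

lrs = []  # learning rate actually used in each epoch
for epoch in range(total_epochs):
    if epoch < warmup_epochs:
        # Manual linear warmup, same rule as in the solution
        for group in optimizer.param_groups:
            group["lr"] = base_lr * (epoch + 1) / warmup_epochs
    lrs.append(optimizer.param_groups[0]["lr"])

    # ... one epoch of training would run here ...
    optimizer.step()  # placeholder so the scheduler follows an optimizer step

    if epoch >= warmup_epochs:
        scheduler.step()  # start the cosine decay only after warmup

print([f"{lr:.6f}" for lr in lrs])
```

In this arrangement the learning rate climbs for five epochs, holds at the base rate for the first post-warmup epoch, and then decays along a cosine curve without reaching zero inside the 20 epochs.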

Practice

1. What is the main purpose of using a warmup strategy in PyTorch training?

easy

A. To immediately set the learning rate to its maximum value

B. To gradually increase the learning rate at the start of training

C. To decrease the learning rate throughout the entire training

D. To freeze model weights during the first epochs

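Solution

Step 1: Understand what warmup means. Warmup starts training at a small learning rate and raises it over the first epochs.

Step 2: Identify the goal of warmup. The gradual increase avoids unstable updates early in training, so options A, C, and D do not describe it.

Final Answer: B. To gradually increase the learning rate at the start of training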
