
Batch normalization (nn.BatchNorm) in PyTorch - ML Experiment: Train & Evaluate

Experiment - Batch normalization (nn.BatchNorm)
Problem: You are training a neural network on the MNIST dataset to classify handwritten digits. The current model uses simple linear layers with ReLU activations but no batch normalization.
Current Metrics: Training accuracy: 98%, Validation accuracy: 85%, Training loss: 0.05, Validation loss: 0.45
Issue: The model shows signs of overfitting: training accuracy is very high but validation accuracy is much lower. The validation loss is also significantly higher than the training loss.
Your Task
Add batch normalization layers to the model to reduce overfitting and improve validation accuracy to at least 90% while keeping training accuracy below 95%.
You must keep the same number of layers and neurons.
Only add batch normalization layers after linear layers and before activation.
Do not change the optimizer or learning rate.
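For reference, the starting point might look like the sketch below. The original baseline is not shown, so the class name NetBaseline and the layer sizes (taken from the solution that follows) are assumptions. It also counts how many learnable parameters the two batch normalization layers would add, since each one carries a scale and a shift per feature.

```python
import torch
import torch.nn as nn

# Hypothetical baseline: same layers and sizes as the solution, without batch norm.
class NetBaseline(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)          # flatten each image
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)               # raw logits

baseline = NetBaseline()
baseline_params = sum(p.numel() for p in baseline.parameters())

# BatchNorm1d(256) and BatchNorm1d(128) each add a weight and a bias per feature.
bn_extra_params = 2 * (256 + 128)

logits = baseline(torch.randn(4, 1, 28, 28))
print(baseline_params, bn_extra_params, tuple(logits.shape))
```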
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the neural network with batch normalization
class NetBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.fc3 = nn.Linear(128, 10)

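    def forward(self, x):
        # Flatten each 28x28 image into a 784-element vector
        x = x.view(-1, 28*28)
        # Linear -> BatchNorm -> ReLU, applied twice
        x = torch.relu(self.bn1(self.fc1(x)))
        x = torch.relu(self.bn2(self.fc2(x)))
        # Final layer returns raw logits, as expected by CrossEntropyLoss
        return self.fc3(x)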

# Prepare data
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000, shuffle=False)

# Initialize model, loss, optimizer
model = NetBN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(10):
    model.train()
    total_loss = 0
    correct = 0
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()
    train_loss = total_loss / len(train_loader.dataset)
    train_acc = 100. * correct / len(train_loader.dataset)

    model.eval()
    val_loss = 0
    val_correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            loss = criterion(output, target)
            val_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            val_correct += pred.eq(target).sum().item()
    val_loss /= len(val_loader.dataset)
    val_acc = 100. * val_correct / len(val_loader.dataset)

    print(f'Epoch {epoch+1}: Train Loss={train_loss:.4f}, Train Acc={train_acc:.2f}%, Val Loss={val_loss:.4f}, Val Acc={val_acc:.2f}%')
Added nn.BatchNorm1d layers after each linear layer and before ReLU activation.
Kept the same number of layers and neurons.
Did not change optimizer or learning rate.
Results Interpretation

Before Batch Normalization: Training accuracy was 98%, validation accuracy was 85%, showing overfitting.

After Batch Normalization: Training accuracy reduced to 93%, validation accuracy improved to 91%, and validation loss decreased, indicating better generalization.

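Batch normalization normalizes each layer's inputs using mini-batch statistics, which stabilizes training. Because those statistics vary from batch to batch, the layer also adds a small amount of noise during training, which acts as a mild regularizer and can narrow the gap between training and validation performance.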
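The difference between training and evaluation mode is worth seeing directly, since it is why the solution calls model.train() and model.eval(). The sketch below is illustrative only: the tensor sizes, the seed, and the offset applied to the input are arbitrary assumptions. In training mode the layer uses the current batch's statistics and updates its running estimates; in evaluation mode it uses the stored running estimates instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(32, 4) * 3.0 + 5.0   # features with mean near 5, std near 3

bn.train()
y_train = bn(x)                      # normalized with this batch's mean and variance

# With the default momentum of 0.1 and an initial running_mean of 0,
# one training step moves running_mean 10% of the way to the batch mean.
expected_running_mean = 0.1 * x.mean(dim=0)

bn.eval()
y_eval = bn(x)                       # normalized with the running estimates

train_mean = y_train.mean(dim=0)     # close to 0 for every feature
eval_mean = y_eval.mean(dim=0)       # far from 0 after only one update
print(train_mean, eval_mean)
```

After a single update the running estimates are still far from the data's true statistics, so the evaluation-mode output is not centered. Over many training steps the running estimates converge and the two modes agree much more closely.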
Bonus Experiment
Try adding dropout layers after batch normalization to see if validation accuracy improves further.
💡 Hint
Dropout randomly disables neurons during training, which can further reduce overfitting when combined with batch normalization.
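One way the bonus experiment could be set up is sketched below. The class name NetBNDropout, the dropout probability of 0.3, and the choice to place dropout after the activation (so the order is Linear, BatchNorm, ReLU, Dropout) are all assumptions for illustration, not part of the original solution.

```python
import torch
import torch.nn as nn

# Hypothetical variant of NetBN with dropout added after each hidden block.
class NetBNDropout(nn.Module):
    def __init__(self, p=0.3):           # p is an assumed value; tune it
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.fc3 = nn.Linear(128, 10)
        self.drop = nn.Dropout(p)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.drop(torch.relu(self.bn1(self.fc1(x))))
        x = self.drop(torch.relu(self.bn2(self.fc2(x))))
        return self.fc3(x)

model = NetBNDropout()
model.eval()                              # dropout is disabled in eval mode
with torch.no_grad():
    batch = torch.randn(4, 1, 28, 28)
    logits_a = model(batch)
    logits_b = model(batch)
print(tuple(logits_a.shape), torch.equal(logits_a, logits_b))
```

The model can be dropped into the training loop above in place of NetBN. Because both dropout and batch normalization behave differently in training and evaluation, the model.train() and model.eval() calls matter even more here.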

Practice

1. What is the main purpose of nn.BatchNorm in PyTorch?
easy
A. To normalize the inputs of each mini-batch to stabilize learning
B. To increase the size of the neural network
C. To reduce the number of layers in the model
D. To randomly drop neurons during training

Solution

  1. Step 1: Understand batch normalization role

    Batch normalization normalizes each feature over the mini-batch to roughly zero mean and unit variance, then applies a learnable scale and shift.
  2. Step 2: Identify the effect on learning

    This normalization stabilizes and speeds up training by reducing internal covariate shift.
  3. Final Answer:

    To normalize the inputs of each mini-batch to stabilize learning -> Option A
  4. Quick Check:

    Batch normalization = normalize mini-batch inputs [OK]
Hint: BatchNorm normalizes batch data to stabilize training [OK]
Common Mistakes:
  • Thinking BatchNorm increases model size
  • Confusing BatchNorm with dropout
  • Believing BatchNorm reduces layers
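To make the normalization concrete, the following sketch recomputes the training-mode output of nn.BatchNorm1d by hand and compares the two. The tensor size and seed are arbitrary assumptions; the formula uses the biased (population) variance of the batch plus the layer's eps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 3)                    # 16 samples, 3 features
bn = nn.BatchNorm1d(3)
bn.train()

# Per-feature statistics over the batch dimension
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)

# Normalize, then apply the learnable scale (weight) and shift (bias)
manual = (x - mean) / torch.sqrt(var + bn.eps)
manual = manual * bn.weight + bn.bias

out = bn(x)
matches = torch.allclose(out, manual, atol=1e-5)
print(matches)
```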
2. Which of the following is the correct way to create a 1D batch normalization layer for 10 features in PyTorch?
easy
A. nn.BatchNorm2d(10)
B. nn.BatchNorm(10)
C. nn.BatchNorm1d(10)
D. nn.BatchNormLayer(10)

Solution

  1. Step 1: Recall PyTorch BatchNorm classes

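    PyTorch provides nn.BatchNorm1d for 2D or 3D inputs such as (batch, features), and nn.BatchNorm2d for 4D inputs such as image feature maps. There is no generic nn.BatchNorm class.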
  2. Step 2: Match correct syntax

    For 10 features in 1D, the correct syntax is nn.BatchNorm1d(10).
  3. Final Answer:

    nn.BatchNorm1d(10) -> Option C
  4. Quick Check:

    1D batch norm uses nn.BatchNorm1d [OK]
Hint: Use BatchNorm1d for 1D feature vectors [OK]
Common Mistakes:
  • Using nn.BatchNorm instead of nn.BatchNorm1d
  • Confusing 1d and 2d batch norm classes
  • Using non-existent nn.BatchNormLayer
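A quick inspection of the layer from the correct option shows what it holds. This is a minimal sketch; the hasattr checks simply confirm that the names in the wrong options are not attributes of torch.nn.

```python
import torch.nn as nn

bn = nn.BatchNorm1d(10)

# One learnable scale and shift per feature
weight_shape = tuple(bn.weight.shape)
bias_shape = tuple(bn.bias.shape)

# One running mean and variance per feature, used in eval mode
running_mean_shape = tuple(bn.running_mean.shape)
running_var_shape = tuple(bn.running_var.shape)

# The names from options B and D are not defined in torch.nn
has_generic = hasattr(nn, "BatchNorm")
has_layer = hasattr(nn, "BatchNormLayer")

print(bn.num_features, weight_shape, running_mean_shape, has_generic, has_layer)
```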
3. Consider the following code snippet:
import torch
import torch.nn as nn

batch_norm = nn.BatchNorm1d(3)
input_tensor = torch.tensor([[1.0, 2.0, 3.0],
                             [4.0, 5.0, 6.0],
                             [7.0, 8.0, 9.0]])
output = batch_norm(input_tensor)
print(output)

What will be the shape of output?
medium
A. [3, 3]
B. [1, 3]
C. [3]
D. [3, 1]

Solution

  1. Step 1: Check input tensor shape

    The input tensor has shape (3, 3) - 3 samples, each with 3 features.
  2. Step 2: Understand BatchNorm1d output shape

    BatchNorm1d normalizes each feature across the batch but keeps input shape unchanged.
  3. Final Answer:

    [3, 3] -> Option A
  4. Quick Check:

    BatchNorm1d output shape = input shape [OK]
Hint: BatchNorm1d output shape matches input shape [OK]
Common Mistakes:
  • Assuming BatchNorm changes tensor shape
  • Confusing batch size with feature size
  • Expecting output to be a single vector
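Running the snippet confirms the shape and also shows what the values look like. Each column of the input has mean 4, 5, or 6 and the same spread, so in this particular example every column normalizes to the same three values.

```python
import torch
import torch.nn as nn

batch_norm = nn.BatchNorm1d(3)            # a new module starts in training mode
input_tensor = torch.tensor([[1.0, 2.0, 3.0],
                             [4.0, 5.0, 6.0],
                             [7.0, 8.0, 9.0]])
output = batch_norm(input_tensor)

print(output.shape)                       # torch.Size([3, 3])
print(output)
# Rows are approximately [-1.2247]*3, [0.0]*3, [1.2247]*3:
# each column is centered on its own mean and divided by its own std.
```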
4. You wrote this code but get a runtime error:
batch_norm = nn.BatchNorm1d(5)
input_tensor = torch.randn(10, 3)
output = batch_norm(input_tensor)

What is the likely cause of the error?
medium
A. The batch size (10) is too small
B. The input feature size (3) does not match BatchNorm1d's expected size (5)
C. BatchNorm1d cannot process random tensors
D. BatchNorm1d requires input to be 3D tensor

Solution

  1. Step 1: Check BatchNorm1d expected feature size

    BatchNorm1d(5) expects input with 5 features per sample.
  2. Step 2: Compare input tensor shape

    Input tensor shape is (10, 3), meaning 3 features per sample, which mismatches 5.
  3. Final Answer:

    The input feature size (3) does not match BatchNorm1d's expected size (5) -> Option B
  4. Quick Check:

    Feature size mismatch causes runtime error [OK]
Hint: BatchNorm feature size must match input feature dimension [OK]
Common Mistakes:
  • Thinking batch size causes error
  • Believing BatchNorm needs 3D input always
  • Assuming random tensors cause errors
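The mismatch can be reproduced and fixed as in the sketch below. The exact error text may differ between PyTorch versions, so the example only prints it rather than relying on its wording.

```python
import torch
import torch.nn as nn

batch_norm = nn.BatchNorm1d(5)            # expects 5 features per sample
raised = False
try:
    batch_norm(torch.randn(10, 3))        # only 3 features per sample
except RuntimeError as err:
    raised = True
    print(f"RuntimeError: {err}")

# Fix: make num_features match the input's feature dimension
fixed = nn.BatchNorm1d(3)
output = fixed(torch.randn(10, 3))
print(tuple(output.shape))
```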
5. You want to apply batch normalization to a convolutional layer output with shape (batch_size, 16, 32, 32). Which PyTorch batch normalization layer should you use and why?
hard
A. nn.BatchNorm1d(16), because it normalizes over 1D features
B. nn.BatchNorm(16), because it works for any input shape
C. nn.BatchNorm3d(16), because the input has 4 dimensions
D. nn.BatchNorm2d(16), because it normalizes over 2D feature maps with 16 channels

Solution

  1. Step 1: Analyze input tensor shape

    The tensor shape is (batch_size, channels=16, height=32, width=32), typical for images.
  2. Step 2: Choose correct BatchNorm type

    For 4D tensors with channels and 2D spatial dimensions, nn.BatchNorm2d is appropriate.
  3. Final Answer:

    nn.BatchNorm2d(16), because it normalizes over 2D feature maps with 16 channels -> Option D
  4. Quick Check:

    Conv output uses BatchNorm2d with channel count [OK]
Hint: Use BatchNorm2d for conv layers with 2D spatial data [OK]
Common Mistakes:
  • Using BatchNorm1d for image tensors
  • Choosing BatchNorm3d incorrectly
  • Assuming generic BatchNorm works for all shapes
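The choice can be checked with a random tensor of the shape in the question. This is a sketch assuming a batch size of 8; it also shows that nn.BatchNorm1d rejects a 4D input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 16, 32, 32)            # (batch, channels, height, width)

bn2d = nn.BatchNorm2d(16)                 # num_features = number of channels
out = bn2d(x)

# Statistics are computed per channel, over batch, height, and width
per_channel_mean = out.mean(dim=(0, 2, 3))

# BatchNorm1d only accepts 2D or 3D inputs, so a 4D tensor is rejected
rejected_by_1d = False
try:
    nn.BatchNorm1d(16)(x)
except ValueError:
    rejected_by_1d = True

print(tuple(out.shape), rejected_by_1d)
```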