nn.RNN layer in PyTorch - ML Experiment: Train & Evaluate

Experiment - nn.RNN layer

Problem: You are training a simple RNN model on a sequence classification task. The model currently achieves 98% training accuracy but only 70% validation accuracy.

Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85

Issue: The model is overfitting: it performs very well on training data but poorly on validation data.
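The size of the gap can be made explicit with a little arithmetic. The sketch below only restates the metrics quoted above; the variable names are illustrative and not part of the original exercise.

```python
# Metrics quoted in the problem statement
train_acc, val_acc = 98.0, 70.0
train_loss, val_loss = 0.05, 0.85

# A large accuracy gap together with a validation loss many times
# the training loss is the usual signature of overfitting.
acc_gap = train_acc - val_acc
loss_ratio = val_loss / train_loss

print(f"Accuracy gap: {acc_gap:.1f} points")
print(f"Validation loss is about {loss_ratio:.0f}x the training loss")
```

The task's targets (at least 85% validation accuracy with training accuracy below 92%) amount to shrinking this gap from 28 points to at most about 7.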

Your Task

Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.

You can only modify the model architecture and training hyperparameters.

Do not change the dataset or preprocessing steps.

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout=0.3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=2, batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.dropout(out[:, -1, :])
        out = self.fc(out)
        return out

# Example training loop setup
input_size = 10
hidden_size = 32  # Reduced from larger size
output_size = 2

model = SimpleRNN(input_size, hidden_size, output_size, dropout=0.3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Reduced learning rate

# Dummy data for demonstration
X_train = torch.randn(100, 5, input_size)
y_train = torch.randint(0, output_size, (100,))
X_val = torch.randn(30, 5, input_size)
y_val = torch.randint(0, output_size, (30,))

# Training with early stopping (full-batch updates on the dummy data)
best_val_acc = 0.0
best_train_acc = 0.0
patience = 3
trigger_times = 0

for epoch in range(30):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

    # Training accuracy from the forward pass above (dropout still active)
    train_acc = (outputs.argmax(dim=1) == y_train).float().mean().item() * 100

    model.eval()
    with torch.no_grad():
        val_outputs = model(X_val)
        val_loss = criterion(val_outputs, y_val)
        val_acc = (val_outputs.argmax(dim=1) == y_val).float().mean().item() * 100

    if val_acc > best_val_acc:
        # New best epoch: remember its metrics and reset the patience counter
        best_val_acc = val_acc
        best_train_acc = train_acc
        trigger_times = 0
    else:
        trigger_times += 1
        if trigger_times >= patience:
            break  # no improvement for `patience` consecutive epochs

# Report both accuracies from the same (best) epoch
print(f"Training accuracy: {best_train_acc:.2f}%, Validation accuracy: {best_val_acc:.2f}%")

Added dropout between the two stacked nn.RNN layers (the dropout argument of nn.RNN only takes effect when num_layers is greater than 1) and after the RNN output to reduce overfitting.

Reduced the hidden size from a larger number to 32 to simplify the model.

Lowered the learning rate to 0.001 for more stable training.

Implemented early stopping to prevent over-training.
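Early stopping is often paired with restoring the best weights, so that the final model is the one from the best validation epoch and not the last one trained. The following is a minimal sketch of that variant, not the solution's own loop: it monitors validation loss instead of accuracy, and the small feed-forward model and random tensors are hypothetical stand-ins used only to keep the example runnable.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and data (assumptions, not from the exercise)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
X_train, y_train = torch.randn(100, 10), torch.randint(0, 2, (100,))
X_val, y_val = torch.randn(30, 10), torch.randint(0, 2, (30,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

best_val_loss = float("inf")
best_state = copy.deepcopy(model.state_dict())
patience, bad_epochs = 3, 0

for epoch in range(30):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()

    if val_loss < best_val_loss:
        # Snapshot the weights of the best epoch seen so far
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

# Roll back to the best snapshot before any final evaluation
model.load_state_dict(best_state)
model.eval()
with torch.no_grad():
    restored_val_loss = criterion(model(X_val), y_val).item()

print(f"Best validation loss: {best_val_loss:.4f}, after restore: {restored_val_loss:.4f}")
```

The deep copy matters: state_dict() returns references to the live tensors, so without it the "best" snapshot would keep changing as training continues.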

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.40

Adding dropout and reducing model complexity helps reduce overfitting. Early stopping prevents training too long. This leads to better validation accuracy and more generalizable models.
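To get a feel for how much "reducing model complexity" changes the model, one can count the trainable parameters of the recurrent layer at two hidden sizes. This is a sketch under assumptions: the original does not say what the larger hidden size was, so 128 is a hypothetical value, and rnn_param_count is a helper introduced here for illustration.

```python
import torch.nn as nn

def rnn_param_count(hidden_size, input_size=10, num_layers=2):
    """Count trainable parameters of an nn.RNN with the given sizes."""
    rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers, batch_first=True)
    return sum(p.numel() for p in rnn.parameters() if p.requires_grad)

small = rnn_param_count(32)    # hidden size used in the solution
large = rnn_param_count(128)   # hypothetical "larger" size, an assumption

print(f"hidden_size=32:  {small} parameters")
print(f"hidden_size=128: {large} parameters")
```

Because the hidden-to-hidden weight matrix grows with the square of the hidden size, a modest reduction in hidden_size removes a large share of the layer's capacity.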

Bonus Experiment

Try replacing the nn.RNN layer with nn.LSTM and compare the validation accuracy.

💡 Hint

LSTM can capture longer dependencies and might improve performance on sequence data.
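One possible starting point for this experiment is sketched below. It mirrors the SimpleRNN class from the solution; the class name SimpleLSTM is introduced here for illustration, and the main difference is that nn.LSTM returns a tuple of hidden and cell states as its second output.

```python
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout=0.3):
        super().__init__()
        # Same arguments as the nn.RNN version; only the layer type changes
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
                            batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # nn.LSTM returns (output, (h_n, c_n)); the states are unused here
        out, (h_n, c_n) = self.lstm(x)
        out = self.dropout(out[:, -1, :])  # last time step of the top layer
        return self.fc(out)

model = SimpleLSTM(input_size=10, hidden_size=32, output_size=2)
logits = model(torch.randn(4, 5, 10))  # (batch, seq_len, features)
print(logits.shape)
```

The rest of the training loop can stay unchanged, which keeps the comparison with the nn.RNN model fair.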

Practice

(1/5)

1. What does the nn.RNN layer in PyTorch primarily do?

easy

A. Processes sequences step by step, keeping track of past information

B. Sorts input data in ascending order

C. Generates random numbers for initialization

D. Performs matrix multiplication without memory
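For readers who want to inspect the layer directly, the sketch below runs a single-layer nn.RNN and then repeats the same computation one time step at a time using the layer's own weights. The sizes are arbitrary choices for illustration, and the manual loop assumes the default tanh nonlinearity and a zero initial hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=10, hidden_size=32, batch_first=True)
x = torch.randn(4, 5, 10)  # (batch, seq_len, features)

with torch.no_grad():
    out, h_n = rnn(x)
    # out holds the hidden state at every step, h_n only the final one
    print(out.shape, h_n.shape)

    # Manual unroll: each step combines the current input with the previous state
    h = torch.zeros(4, 32)
    for t in range(x.size(1)):
        h = torch.tanh(
            x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )

    manual_matches = torch.allclose(h, h_n[0], atol=1e-5)
    last_step_matches = torch.allclose(out[:, -1], h_n[0], atol=1e-5)

print(manual_matches, last_step_matches)
```

The second check also explains why the solution's forward method can classify from out[:, -1, :]: for a unidirectional layer with batch_first=True, that slice is the final hidden state of the top layer.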
