nn.GRU layer in PyTorch - ML Experiment: Train & Evaluate

Experiment - nn.GRU layer

Problem: We want to classify sequences of numbers into two classes using a GRU-based neural network. The current model achieves 98% training accuracy but only 75% validation accuracy.

Current Metrics: Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.65

Issue: The model is overfitting: it performs very well on training data but poorly on validation data.

Your Task

Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.

You can only modify the model architecture and training hyperparameters.

Do not change the dataset or data preprocessing.

Keep the GRU layer as the main recurrent layer.

Solution

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

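# Placeholder synthetic dataset: the labels are random, so validation accuracy
# will stay near chance here. Swap in the real dataset for the experiment.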
X_train = torch.randn(500, 10, 5)  # 500 sequences, length 10, 5 features
y_train = torch.randint(0, 2, (500,))
X_val = torch.randn(100, 10, 5)
y_val = torch.randint(0, 2, (100,))

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

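class GRUClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, dropout):
        super().__init__()
        # nn.GRU applies its dropout argument only between stacked layers, so it
        # is passed only when num_layers > 1 (otherwise it has no effect and
        # PyTorch emits a warning).
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True,
                          dropout=dropout if num_layers > 1 else 0.0)
        # Dropout on the last hidden state keeps regularization active
        # even with a single GRU layer.
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, 2)

    def forward(self, x):
        out, _ = self.gru(x)  # out: (batch, seq_len, hidden_size)
        out = out[:, -1, :]  # last time step
        out = self.dropout(out)
        out = self.fc(out)
        return out

# Model with dropout and fewer hidden units
model = GRUClassifier(input_size=5, hidden_size=32, num_layers=1, dropout=0.3)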

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 20
best_val_acc = 0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        correct += predicted.eq(labels).sum().item()
        total += labels.size(0)
    train_acc = 100 * correct / total
    train_loss = total_loss / total

    model.eval()
    val_correct = 0
    val_total = 0
    val_loss = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            val_correct += predicted.eq(labels).sum().item()
            val_total += labels.size(0)
    val_acc = 100 * val_correct / val_total
    val_loss /= val_total

    if val_acc > best_val_acc:
        best_val_acc = val_acc

    print(f"Epoch {epoch+1}: Train Loss={train_loss:.3f}, Train Acc={train_acc:.1f}%, Val Loss={val_loss:.3f}, Val Acc={val_acc:.1f}%")

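Added dropout with p=0.3 to reduce overfitting. Because nn.GRU applies its dropout argument only between stacked layers, that argument has no effect when num_layers=1; for a single-layer model the dropout has to be applied to the GRU output instead, for example with nn.Dropout before the final linear layer.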

Reduced hidden_size from a larger number (e.g., 64 or 128) to 32 to simplify the model.

Used only 1 GRU layer instead of multiple layers.

Kept learning rate at 0.001 and trained for 20 epochs to avoid overtraining.
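The training loop tracks best_val_acc but never acts on it. One way to use it, sketched here under assumptions, is to keep a copy of the best weights and stop once validation accuracy stops improving. The EarlyStopper class, its patience parameter, and the accuracy values below are hypothetical and only illustrate the idea; they are not part of the original solution.

```python
import copy
import torch.nn as nn

class EarlyStopper:
    """Hypothetical helper: remembers the best weights and counts stale epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_acc, model):
        """Record this epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_state = copy.deepcopy(model.state_dict())
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation accuracies stand in for real training epochs.
model = nn.GRU(input_size=5, hidden_size=32, batch_first=True)
stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, val_acc in enumerate([70.0, 80.0, 79.0, 78.0, 85.0], start=1):
    if stopper.step(val_acc, model):
        stopped_at = epoch
        break

# Restore the weights from the best epoch seen before stopping.
model.load_state_dict(stopper.best_state)
```

In the loop above, stopper.step(val_acc, model) would replace the bare best_val_acc comparison at the end of each epoch. A small patience value trades a risk of stopping too early for less time spent overfitting.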

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.65

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.40

Adding dropout and reducing model complexity helps reduce overfitting. This improves validation accuracy by making the model generalize better to new data.
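To put a number on "reducing model complexity", the sketch below counts trainable parameters for the 32-unit, single-layer GRU used in the solution and for a larger configuration. The larger one (hidden_size=128, two layers) is an assumption chosen for illustration; the original "before" architecture is not given.

```python
import torch.nn as nn

def count_params(module):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Configuration used in the solution
small = nn.GRU(input_size=5, hidden_size=32, num_layers=1, batch_first=True)
# Hypothetical larger configuration, for comparison only
large = nn.GRU(input_size=5, hidden_size=128, num_layers=2, batch_first=True)

small_params = count_params(small)  # 3744
large_params = count_params(large)  # 150912
print(small_params, large_params)
```

With only 500 training sequences, the smaller model has far less capacity to memorize individual examples.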

Bonus Experiment

Try using bidirectional GRU layers and compare validation accuracy with the current model.

💡 Hint

Set bidirectional=True in nn.GRU and adjust the final linear layer input size accordingly.
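A minimal sketch of the bidirectional variant, assuming the same input shape as the solution (batch, 10, 5). The class name BiGRUClassifier and its defaults are illustrative. It reads the final hidden states from h_n rather than out[:, -1, :], because at the last time step the backward direction has only seen one input.

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes=2, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers=1,
                          batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward states are concatenated, so the input size doubles.
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        _, h_n = self.gru(x)  # h_n: (num_layers * 2, batch, hidden_size)
        # h_n[-2] is the final forward state, h_n[-1] the final backward state.
        feats = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.fc(self.dropout(feats))

model = BiGRUClassifier(input_size=5, hidden_size=32)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 10, 5))
print(logits.shape)  # torch.Size([4, 2])
```

A bidirectional layer doubles the recurrent parameter count, so it is worth watching the gap between training and validation accuracy when running this comparison.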

Practice

(1/5)

1. What is the primary purpose of the nn.GRU layer in PyTorch?

easy

A. To reduce the dimensionality of data using PCA

B. To perform image classification using convolution

C. To process sequential data by remembering past information

D. To generate random numbers for initialization
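For context on this question, here is a small sketch of what nn.GRU consumes and produces, assuming batch_first=True and the same shapes as the experiment above (sequences of length 10 with 5 features).

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=32, batch_first=True)
x = torch.randn(4, 10, 5)  # (batch, seq_len, features)

out, h_n = gru(x)
print(out.shape)  # torch.Size([4, 10, 32]): one hidden state per time step
print(h_n.shape)  # torch.Size([1, 4, 32]): final hidden state per layer

# For a single unidirectional layer, the last time step of out is the final hidden state.
same = torch.allclose(out[:, -1, :], h_n[0])
```

The hidden state is carried from one time step to the next, which is how information from earlier inputs reaches later outputs.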

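Solution

Step 1: Understand the role of GRU. nn.GRU is a gated recurrent unit layer: it reads a sequence one step at a time and carries a hidden state forward, so earlier inputs can influence later outputs.

Step 2: Compare with other options. PCA, convolutional image classification, and random-number initialization are unrelated to recurrent layers.

Final Answer: C. To process sequential data by remembering past information.

Quick Check: GRU is a recurrent layer, so it is built for sequences.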
