
nn.GRU layer in PyTorch - ML Experiment: Train & Evaluate

Experiment - nn.GRU layer
Problem: We want to classify sequences of numbers into two classes using a GRU-based neural network. The current model achieves 98% training accuracy but only 75% validation accuracy.
Current Metrics: Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.65
Issue: The model is overfitting. It performs very well on training data but poorly on validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You can only modify the model architecture and training hyperparameters.
Do not change the dataset or data preprocessing.
Keep the GRU layer as the main recurrent layer.
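The two numeric targets can be written as a small check, for example to call on the per-epoch train_acc and val_acc values (both in percent) that the training loop computes. meets_target is a hypothetical helper for illustration, not part of the original solution. Note that the training bound is strict ("below 92%") while the validation bound is inclusive ("at least 85%").

```python
def meets_target(train_acc: float, val_acc: float) -> bool:
    """Return True if a run satisfies the task's targets.

    Accuracies are percentages, as printed by the training loop:
    validation accuracy must be at least 85% and training accuracy
    must stay strictly below 92%.
    """
    return val_acc >= 85.0 and train_acc < 92.0


# The reported "before" run fails the check; the reported "after" run passes.
print(meets_target(98.0, 75.0))  # False
print(meets_target(90.0, 87.0))  # True
```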
Solution
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

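# Synthetic stand-in data: 500 training and 100 validation sequences,
# each of length 10 with 5 features. The labels are drawn independently
# of the inputs, so validation accuracy on this placeholder will stay
# near chance (about 50%); use the real dataset to reproduce the results.
X_train = torch.randn(500, 10, 5)  # (num_sequences, seq_len, num_features)
y_train = torch.randint(0, 2, (500,))
X_val = torch.randn(100, 10, 5)
y_val = torch.randint(0, 2, (100,))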

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

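class GRUClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, dropout):
        super().__init__()
        # nn.GRU's dropout argument only acts between stacked layers,
        # so it is passed through only when num_layers > 1
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True,
                          dropout=dropout if num_layers > 1 else 0.0)
        # Explicit dropout on the final hidden state works for any depth
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, 2)

    def forward(self, x):
        out, _ = self.gru(x)    # out: (batch, seq_len, hidden_size)
        out = out[:, -1, :]     # keep only the last time step
        out = self.dropout(out)  # active in train mode, identity in eval mode
        return self.fc(out)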

# Model with dropout and fewer hidden units
model = GRUClassifier(input_size=5, hidden_size=32, num_layers=1, dropout=0.3)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 20
best_val_acc = 0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        correct += predicted.eq(labels).sum().item()
        total += labels.size(0)
    train_acc = 100 * correct / total
    train_loss = total_loss / total

    model.eval()
    val_correct = 0
    val_total = 0
    val_loss = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            val_correct += predicted.eq(labels).sum().item()
            val_total += labels.size(0)
    val_acc = 100 * val_correct / val_total
    val_loss /= val_total

    if val_acc > best_val_acc:
        best_val_acc = val_acc

    print(f"Epoch {epoch+1}: Train Loss={train_loss:.3f}, Train Acc={train_acc:.1f}%, Val Loss={val_loss:.3f}, Val Acc={val_acc:.1f}%")
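The dropout argument of nn.GRU is worth checking directly, because PyTorch applies it only to the outputs of every recurrent layer except the last. With num_layers=1 it therefore does nothing, and PyTorch emits a UserWarning when the layer is constructed. The quick check below assumes CPU execution, where the GRU forward pass is deterministic: two training-mode passes over the same input give identical outputs, while an explicit nn.Dropout on the last time step does inject noise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 10, 5)  # (batch, seq_len, features), same layout as the example data

# Single-layer GRU with dropout=0.3: PyTorch warns at construction time,
# and no dropout is applied because there is no "next" layer to feed.
gru = nn.GRU(5, 32, num_layers=1, batch_first=True, dropout=0.3)
gru.train()
out1, _ = gru(x)
out2, _ = gru(x)
print(torch.equal(out1, out2))  # True: no randomness, so no dropout happened

# An explicit nn.Dropout on the last time step does regularize in train mode.
drop = nn.Dropout(0.3)
drop.train()
last = out1[:, -1, :].detach()
print(torch.equal(drop(last), drop(last)))  # False with overwhelming probability

# In eval mode nn.Dropout is the identity, so evaluation stays deterministic.
drop.eval()
print(torch.equal(drop(last), last))  # True
```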
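Added dropout with p=0.3 to reduce overfitting. nn.GRU applies its own dropout argument only between stacked layers, so with num_layers=1 that argument has no effect (PyTorch emits a warning); the regularization has to come from an nn.Dropout applied to the GRU output before the linear layer.
Reduced hidden_size from a larger value (e.g., 64 or 128) to 32 to lower the model's capacity.
Used a single GRU layer instead of a stacked GRU.
Kept the learning rate at 0.001 and limited training to 20 epochs to avoid overtraining.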
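The training loop records best_val_acc but never uses it. A common alternative to a fixed 20-epoch budget is early stopping: keep the weights from the epoch with the lowest validation loss and stop once it has not improved for a few epochs. The sketch below assumes that approach; evaluate, train_with_early_stopping, max_epochs, and patience are illustrative names and values, not part of the original solution. The usage lines at the bottom substitute a tiny Flatten + Linear model for GRUClassifier only to keep the example short, and reuse one loader for training and validation purely as a smoke test.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (mean loss, accuracy in %) over a loader, without gradients."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            outputs = model(inputs)
            total_loss += criterion(outputs, labels).item() * inputs.size(0)
            correct += (outputs.argmax(1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / total, 100 * correct / total


def train_with_early_stopping(model, train_loader, val_loader, criterion,
                              optimizer, max_epochs=50, patience=5):
    """Stop after `patience` epochs without a lower validation loss,
    then restore the weights from the best epoch.

    Returns (best validation loss, number of epochs actually run).
    """
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0
    epochs_run = 0
    for _ in range(max_epochs):
        epochs_run += 1
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
        val_loss, _ = evaluate(model, val_loader, criterion)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())  # snapshot best weights
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    model.load_state_dict(best_state)  # roll back to the best epoch
    return best_loss, epochs_run


# Smoke test with a hypothetical stand-in model and random data.
torch.manual_seed(0)
X, y = torch.randn(64, 10, 5), torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(X, y), batch_size=16)
model = nn.Sequential(nn.Flatten(), nn.Linear(50, 2))  # stand-in for GRUClassifier
opt = torch.optim.Adam(model.parameters(), lr=0.001)
best_loss, epochs_run = train_with_early_stopping(
    model, loader, loader, nn.CrossEntropyLoss(), opt, max_epochs=10, patience=3)
print(f"best val loss {best_loss:.3f} after {epochs_run} epochs")
```

Monitoring validation loss rather than accuracy is a design choice: loss changes smoothly from epoch to epoch, whereas accuracy on 100 validation samples moves in 1% steps and can plateau while the model is still getting worse.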
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.65

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.40

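Dropout and a smaller model both limit the network's capacity to memorize the training set. Training accuracy falls from 98% to 90%, but the train/validation gap shrinks from 23 to 3 percentage points and validation loss drops from 0.65 to 0.40, so the model generalizes better to unseen data.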
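To put "reducing model complexity" in numbers, the parameter count of this architecture (one unidirectional GRU layer, input_size=5, two-class linear head) follows from the weight shapes PyTorch documents for nn.GRU. The widths 128 and 64 are the example starting points from the solution notes; the original model itself is not shown. gru_classifier_params is a hypothetical helper.

```python
def gru_classifier_params(input_size: int, hidden_size: int, num_classes: int = 2) -> int:
    """Count trainable parameters of a 1-layer unidirectional GRU plus a linear head.

    nn.GRU stores weight_ih_l0 (3H x I), weight_hh_l0 (3H x H),
    bias_ih_l0 (3H) and bias_hh_l0 (3H); the factor 3 covers the
    reset gate, update gate and candidate ("new") state.
    """
    gru = 3 * hidden_size * (input_size + hidden_size) + 2 * 3 * hidden_size
    head = hidden_size * num_classes + num_classes  # nn.Linear weight + bias
    return gru + head


for h in (128, 64, 32):
    print(h, gru_classifier_params(input_size=5, hidden_size=h))
# 128 52098
# 64 13762
# 32 3810
```

Because the hidden-to-hidden matrix grows with the square of hidden_size, shrinking the width from 128 to 32 removes about 93% of the parameters (52,098 down to 3,810).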
Bonus Experiment
Try using bidirectional GRU layers and compare validation accuracy with the current model.
💡 Hint
Set bidirectional=True in nn.GRU and double the input size of the final linear layer (2 * hidden_size), since the forward and backward outputs are concatenated.
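One way to set up the bonus experiment is sketched below; BiGRUClassifier is a hypothetical name and the dropout handling mirrors the single-layer caveat above. It reads the final states from h_n rather than out[:, -1, :]: for a bidirectional GRU, out[:, -1, :] pairs the forward direction's final state with a backward state that has seen only the last time step. h_n is ordered layer by layer with the forward direction first, so the last layer's forward and backward states are h_n[-2] and h_n[-1]. h_n is not batch-first even when batch_first=True.

```python
import torch
import torch.nn as nn


class BiGRUClassifier(nn.Module):
    """Bidirectional variant of GRUClassifier (hypothetical bonus sketch)."""

    def __init__(self, input_size, hidden_size, num_layers=1, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True, bidirectional=True,
                          dropout=dropout if num_layers > 1 else 0.0)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward final states are concatenated: 2 * hidden_size
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, x):
        _, h_n = self.gru(x)  # h_n: (num_layers * 2, batch, hidden_size)
        # Last layer: h_n[-2] is the forward state, h_n[-1] the backward state
        last = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 2 * hidden_size)
        return self.fc(self.dropout(last))


model = BiGRUClassifier(input_size=5, hidden_size=32)
logits = model(torch.randn(8, 10, 5))
print(logits.shape)  # torch.Size([8, 2])
```

Each direction has its own weights, so the bidirectional GRU has twice the recurrent parameters of the unidirectional one; when comparing validation accuracy, watch whether the train/validation gap starts to widen again.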