nn.LSTM layer in PyTorch - ML Experiment: Train & Evaluate

Experiment - nn.LSTM layer

Problem: We want to predict the next number in a sequence using an LSTM model. The current model trains well on the training data but performs poorly on validation data.

Current Metrics: Training accuracy: 98%, Validation accuracy: 65%, Training loss: 0.05, Validation loss: 0.85

Issue: The model is overfitting: training accuracy is very high but validation accuracy is low.

Your Task

Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.

You can only modify the LSTM model architecture and training hyperparameters.

Do not change the dataset or data preprocessing.

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sample synthetic dataset
sequence_length = 5
input_size = 1
hidden_size = 32
num_layers = 1
output_size = 1

# Generate dummy data: sequences of numbers and next number as label
X = torch.linspace(0, 99, steps=100).view(-1, 1)
sequences = []
labels = []
for i in range(len(X) - sequence_length):
    sequences.append(X[i:i+sequence_length])
    labels.append(X[i+sequence_length])

X_seq = torch.stack(sequences)  # Shape: (samples, seq_len, input_size)
y_seq = torch.stack(labels)    # Shape: (samples, input_size)

# Split train and validation
train_size = int(0.8 * len(X_seq))
X_train, X_val = X_seq[:train_size], X_seq[train_size:]
y_train, y_val = y_seq[:train_size], y_seq[train_size:]

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

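class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, dropout=0.2):
        super().__init__()
        # nn.LSTM applies its dropout argument only between stacked layers,
        # so pass it only when num_layers > 1 (otherwise PyTorch warns and ignores it).
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True,
                            dropout=dropout if num_layers > 1 else 0.0)
        # Explicit dropout on the last hidden state, active for any num_layers.
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)  # out: (batch, seq_len, hidden_size)
        out = out[:, -1, :]    # Take last time step output
        out = self.dropout(out)
        out = self.fc(out)
        return out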

model = LSTMModel(input_size, hidden_size, num_layers, output_size, dropout=0.2)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)

# Training loop with early stopping
best_val_loss = float('inf')
epochs_no_improve = 0
max_epochs = 50
patience = 5

for epoch in range(max_epochs):
    model.train()
    train_losses = []
    for xb, yb in train_loader:
        optimizer.zero_grad()
        preds = model(xb)
        loss = criterion(preds, yb)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

    model.eval()
    val_losses = []
    with torch.no_grad():
        for xb, yb in val_loader:
            preds = model(xb)
            loss = criterion(preds, yb)
            val_losses.append(loss.item())

    avg_train_loss = sum(train_losses) / len(train_losses)
    avg_val_loss = sum(val_losses) / len(val_losses)

    if avg_val_loss < best_val_loss:
        best_val_loss = avg_val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1

    if epochs_no_improve >= patience:
        break

# Rough accuracy proxy for simplicity: 100 minus 100 times the last epoch's average MSE.
# This is not a true accuracy and can go negative when the loss exceeds 1.
train_accuracy = 100 - avg_train_loss * 100
val_accuracy = 100 - avg_val_loss * 100

print(f"Training accuracy: {train_accuracy:.2f}%")
print(f"Validation accuracy: {val_accuracy:.2f}%")

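Added dropout with p=0.2 to reduce overfitting. PyTorch applies the dropout argument of nn.LSTM only between stacked layers, so with num_layers=1 it needs a separate nn.Dropout on the LSTM output to have any effect.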

Reduced hidden size from 64 to 32 to simplify the model.

Used Adam optimizer with a moderate learning rate of 0.005.

Implemented early stopping with patience of 5 epochs to avoid overtraining.
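The early-stopping rule in the list above can be factored into a small helper. The sketch below is a hypothetical `EarlyStopping` class (the class name and its `step` method are assumptions, not part of PyTorch or the original solution). It applies the same patience rule and also records which epoch had the best validation loss. The loss values are made up for illustration.

```python
class EarlyStopping:
    """Hypothetical helper: stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_no_improve = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_no_improve = 0
        else:
            self.epochs_no_improve += 1
        return self.epochs_no_improve >= self.patience


# Illustrative validation losses (made up for this sketch, not measured results).
val_losses = [0.90, 0.60, 0.40, 0.35, 0.36, 0.37, 0.39, 0.41, 0.45, 0.50]

stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(epoch, val_loss):
        stopped_at = epoch
        break

print(f"best epoch: {stopper.best_epoch}, stopped at epoch: {stopped_at}")
```

In a real training loop, one option is to store `copy.deepcopy(model.state_dict())` whenever the best loss improves and call `model.load_state_dict(...)` after the loop, so the final model is the best checkpoint and not the last one.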

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 65%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.12, Validation loss 0.18

Adding dropout and reducing model complexity help reduce overfitting. Early stopping halts training once validation loss stops improving. Together, these changes raise validation accuracy from 65% to 82% while keeping training accuracy below the 90% target.
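Because next-number prediction is a regression task, "accuracy" needs an explicit definition. One option, sketched here as an assumption and not as the metric the solution code uses, is to count a prediction as correct when it falls within a fixed tolerance of the target. The function name `tolerance_accuracy`, the tolerance of 0.5, and the sample values are all hypothetical.

```python
def tolerance_accuracy(preds, targets, tol=0.5):
    """Percentage of predictions within `tol` of their targets (hypothetical metric)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(1 for p, t in zip(preds, targets) if abs(p - t) <= tol)
    return 100.0 * hits / len(preds)


# Made-up predictions for the targets 5..9, for illustration only.
targets = [5.0, 6.0, 7.0, 8.0, 9.0]
preds = [5.1, 6.4, 7.9, 8.0, 8.2]
acc = tolerance_accuracy(preds, targets)
print(f"accuracy within 0.5: {acc:.1f}%")
```

Unlike a loss-derived percentage, a metric of this kind always stays between 0% and 100%, which makes before/after comparisons easier to read.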

Bonus Experiment

Try using a two-layer LSTM with dropout and compare the results to the single-layer model.

💡 Hint

Increase num_layers to 2 and keep dropout. Watch for training time and validation accuracy changes.
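As a starting point for the bonus experiment, the sketch below builds a single-layer and a two-layer nn.LSTM with the same sizes as the solution and compares their output shapes and parameter counts. The helper `count_parameters` is a hypothetical convenience function, and the random input stands in for a real batch.

```python
import torch
import torch.nn as nn


def count_parameters(module):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Same sizes as the solution; only num_layers differs.
single = nn.LSTM(input_size=1, hidden_size=32, num_layers=1, batch_first=True)
# With num_layers=2 the dropout argument is applied to the output of the
# first layer before it is fed to the second layer.
stacked = nn.LSTM(input_size=1, hidden_size=32, num_layers=2, batch_first=True, dropout=0.2)

x = torch.randn(16, 5, 1)  # (batch, seq_len, input_size)
out, (h_n, c_n) = stacked(x)

print(out.shape)  # (batch, seq_len, hidden_size)
print(h_n.shape)  # (num_layers, batch, hidden_size)
print(count_parameters(single), count_parameters(stacked))
```

The second layer adds a substantial number of parameters, so on a small dataset it may overfit more easily than the single-layer model. That is why the hint suggests keeping dropout enabled.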

Practice

(1/5)

1. What is the primary purpose of the nn.LSTM layer in PyTorch?

easy

A. To process and remember information from sequences over time

B. To perform image classification using convolution

C. To reduce the dimensionality of data using PCA

D. To generate random numbers for initialization

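Solution

Step 1: Understand the role of LSTM

An LSTM is a recurrent layer that carries a hidden state and a cell state from one time step to the next, which lets it retain information across a sequence.

Step 2: Match purpose with options

Only option A describes sequence processing with memory. Convolution (B), PCA (C), and random number generation (D) are unrelated to what nn.LSTM does.

Final Answer: A. To process and remember information from sequences over time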
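The "memory" in option A can be observed directly. The sketch below, with arbitrary small sizes chosen for illustration, feeds a sequence to nn.LSTM in two halves. Passing the state from the first half into the second reproduces the full-sequence output, while starting the second half from a fresh state does not.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
x = torch.randn(1, 6, 1)  # one sequence of 6 time steps

with torch.no_grad():
    full_out, _ = lstm(x)
    # Process the first half, then pass its final (h, c) state into the second half.
    first_out, state = lstm(x[:, :3, :])
    second_out, _ = lstm(x[:, 3:, :], state)
    # Without the carried state the second half starts from zeros instead.
    fresh_out, _ = lstm(x[:, 3:, :])

carried_matches = torch.allclose(full_out[:, 3:, :], second_out, atol=1e-5)
fresh_matches = torch.allclose(full_out[:, 3:, :], fresh_out, atol=1e-5)
print(carried_matches, fresh_matches)
```

The hidden and cell states are the only channel through which earlier time steps influence later outputs, which is what distinguishes a recurrent layer from one that treats each step independently.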
