0
0
PyTorchml~20 mins

nn.LSTM layer in PyTorch - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - nn.LSTM layer
Problem:We want to predict the next number in a sequence using an LSTM model. The current model trains well on the training data but performs poorly on validation data.
Current Metrics:Training accuracy: 98%, Validation accuracy: 65%, Training loss: 0.05, Validation loss: 0.85
Issue:The model is overfitting: training accuracy is very high but validation accuracy is low.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.
You can only modify the LSTM model architecture and training hyperparameters.
Do not change the dataset or data preprocessing.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sample synthetic dataset
sequence_length = 5
input_size = 1
hidden_size = 32
num_layers = 1
output_size = 1

# Generate dummy data: sequences of numbers and next number as label
X = torch.linspace(0, 99, steps=100).view(-1, 1)
sequences = []
labels = []
for i in range(len(X) - sequence_length):
    sequences.append(X[i:i+sequence_length])
    labels.append(X[i+sequence_length])

X_seq = torch.stack(sequences)  # Shape: (samples, seq_len, input_size)
y_seq = torch.stack(labels)    # Shape: (samples, input_size)

# Split train and validation
train_size = int(0.8 * len(X_seq))
X_train, X_val = X_seq[:train_size], X_seq[train_size:]
y_train, y_val = y_seq[:train_size], y_seq[train_size:]

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = out[:, -1, :]  # Take last time step output
        out = self.fc(out)
        return out

model = LSTMModel(input_size, hidden_size, num_layers, output_size, dropout=0.2)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)

# Training loop with early stopping
best_val_loss = float('inf')
epochs_no_improve = 0
max_epochs = 50
patience = 5

for epoch in range(max_epochs):
    model.train()
    train_losses = []
    for xb, yb in train_loader:
        optimizer.zero_grad()
        preds = model(xb)
        loss = criterion(preds, yb)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

    model.eval()
    val_losses = []
    with torch.no_grad():
        for xb, yb in val_loader:
            preds = model(xb)
            loss = criterion(preds, yb)
            val_losses.append(loss.item())

    avg_train_loss = sum(train_losses) / len(train_losses)
    avg_val_loss = sum(val_losses) / len(val_losses)

    if avg_val_loss < best_val_loss:
        best_val_loss = avg_val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1

    if epochs_no_improve >= patience:
        break

# Calculate final training and validation accuracy as inverse of loss for simplicity
train_accuracy = 100 - avg_train_loss * 100
val_accuracy = 100 - avg_val_loss * 100

print(f"Training accuracy: {train_accuracy:.2f}%")
print(f"Validation accuracy: {val_accuracy:.2f}%")
Added dropout=0.2 inside the LSTM layer to reduce overfitting.
Reduced hidden size from 64 to 32 to simplify the model.
Used Adam optimizer with a moderate learning rate of 0.005.
Implemented early stopping with patience of 5 epochs to avoid overtraining.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 65%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.12, Validation loss 0.18

Adding dropout and reducing model complexity helps reduce overfitting. Early stopping prevents training too long. This improves validation accuracy while keeping training accuracy reasonable.
Bonus Experiment
Try using a two-layer LSTM with dropout and compare the results to the single-layer model.
💡 Hint
Increase num_layers to 2 and keep dropout. Watch for training time and validation accuracy changes.