nn.LSTM layer in PyTorch - ML Experiment: Train & Evaluate

Experiment - nn.LSTM layer
Problem: We want to predict the next number in a sequence using an LSTM model. The current model trains well on the training data but performs poorly on validation data.
Current Metrics: Training accuracy: 98%, Validation accuracy: 65%, Training loss: 0.05, Validation loss: 0.85
Issue: The model is overfitting: training accuracy is very high but validation accuracy is low.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.
You can only modify the LSTM model architecture and training hyperparameters.
Do not change the dataset or data preprocessing.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sample synthetic dataset
sequence_length = 5
input_size = 1
hidden_size = 32
num_layers = 1
output_size = 1

# Generate dummy data: sequences of numbers and next number as label
X = torch.linspace(0, 99, steps=100).view(-1, 1)
sequences = []
labels = []
for i in range(len(X) - sequence_length):
    sequences.append(X[i:i+sequence_length])
    labels.append(X[i+sequence_length])

X_seq = torch.stack(sequences)  # Shape: (samples, seq_len, input_size)
y_seq = torch.stack(labels)    # Shape: (samples, input_size)

# Split train and validation
train_size = int(0.8 * len(X_seq))
X_train, X_val = X_seq[:train_size], X_seq[train_size:]
y_train, y_val = y_seq[:train_size], y_seq[train_size:]

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

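class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, dropout=0.2):
        super().__init__()
        # nn.LSTM applies its dropout argument only between stacked layers,
        # so pass it only when num_layers > 1 (otherwise PyTorch warns and it has no effect).
        lstm_dropout = dropout if num_layers > 1 else 0.0
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=lstm_dropout)
        # Dropout on the last hidden state keeps regularization active with a single layer.
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = out[:, -1, :]  # Take last time step output
        out = self.fc(self.dropout(out))
        return out

model = LSTMModel(input_size, hidden_size, num_layers, output_size, dropout=0.2)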

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)

# Training loop with early stopping
best_val_loss = float('inf')
epochs_no_improve = 0
max_epochs = 50
patience = 5

for epoch in range(max_epochs):
    model.train()
    train_losses = []
    for xb, yb in train_loader:
        optimizer.zero_grad()
        preds = model(xb)
        loss = criterion(preds, yb)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

    model.eval()
    val_losses = []
    with torch.no_grad():
        for xb, yb in val_loader:
            preds = model(xb)
            loss = criterion(preds, yb)
            val_losses.append(loss.item())

    avg_train_loss = sum(train_losses) / len(train_losses)
    avg_val_loss = sum(val_losses) / len(val_losses)

    if avg_val_loss < best_val_loss:
        best_val_loss = avg_val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1

    if epochs_no_improve >= patience:
        break

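# Report an accuracy-like score derived from the last epoch's loss (100 - 100 * MSE).
# This is a rough proxy, not a true accuracy, and is only meaningful when the MSE is on a 0-1 scale.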
train_accuracy = 100 - avg_train_loss * 100
val_accuracy = 100 - avg_val_loss * 100

print(f"Training accuracy: {train_accuracy:.2f}%")
print(f"Validation accuracy: {val_accuracy:.2f}%")
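Added dropout (p=0.2) to reduce overfitting. Note that the dropout argument of nn.LSTM only acts between stacked layers, so with num_layers=1 it has no effect on its own.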
Reduced hidden size from 64 to 32 to simplify the model.
Used Adam optimizer with a moderate learning rate of 0.005.
Implemented early stopping with patience of 5 epochs to avoid overtraining.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 65%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.12, Validation loss 0.18

Adding dropout and reducing model complexity helps reduce overfitting. Early stopping prevents training too long. This improves validation accuracy while keeping training accuracy reasonable.
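The early-stopping idea above can be taken one step further by rolling the model back to the weights that gave the lowest validation loss. The sketch below is only an illustration under assumptions: the tiny nn.Linear model, the random tensors, and names such as best_state and restored_val_loss are hypothetical stand-ins, not part of the experiment's solution.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and data, used only to show the pattern.
model = nn.Linear(4, 1)
x_train, y_train = torch.randn(64, 4), torch.randn(64, 1)
x_val, y_val = torch.randn(16, 4), torch.randn(16, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

best_val_loss = float("inf")
best_state = None
epochs_no_improve = 0
patience = 5

for epoch in range(50):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())  # snapshot the best weights
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            break

# Roll back to the checkpoint with the lowest validation loss.
model.load_state_dict(best_state)
model.eval()
with torch.no_grad():
    restored_val_loss = criterion(model(x_val), y_val).item()
```

Without the rollback, the model that gets evaluated is the one from the last epoch, which by construction is up to `patience` epochs past the best checkpoint.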
Bonus Experiment
Try using a two-layer LSTM with dropout and compare the results to the single-layer model.
💡 Hint
Increase num_layers to 2 and keep dropout. Watch for training time and validation accuracy changes.
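As a starting point for the bonus experiment, the following sketch builds a single-layer and a two-layer LSTM with the sizes used in this experiment and compares their output shapes and parameter counts. The helper count_params and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Same sizes as the experiment: one input feature, 32 hidden units.
single = nn.LSTM(input_size=1, hidden_size=32, num_layers=1, batch_first=True)
# With num_layers=2, the dropout argument acts on the output of the first layer.
stacked = nn.LSTM(input_size=1, hidden_size=32, num_layers=2, batch_first=True, dropout=0.2)

def count_params(module):
    """Total number of trainable values in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())

x = torch.randn(16, 5, 1)  # (batch, seq_len, input_size)
out, (hn, cn) = stacked(x)

print(out.shape)  # torch.Size([16, 5, 32]), features come from the top layer only
print(hn.shape)   # torch.Size([2, 16, 32]), one final hidden state per layer
print(count_params(single), count_params(stacked))  # 4480 12928
```

The second layer nearly triples the parameter count, so on a dataset this small it is worth watching whether validation loss actually improves.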

Practice

1. What is the primary purpose of the nn.LSTM layer in PyTorch?
easy
A. To process and remember information from sequences over time
B. To perform image classification using convolution
C. To reduce the dimensionality of data using PCA
D. To generate random numbers for initialization

Solution

  1. Step 1: Understand the role of LSTM

    LSTM stands for Long Short-Term Memory, a type of recurrent neural network layer designed to handle sequence data and remember information over time.
  2. Step 2: Match purpose with options

    Among the options, only processing and remembering sequence information matches the LSTM's purpose.
  3. Final Answer:

    To process and remember information from sequences over time -> Option A
  4. Quick Check:

    LSTM purpose = sequence memory [OK]
Hint: LSTM = sequence memory layer, not image or random [OK]
Common Mistakes:
  • Confusing LSTM with convolutional layers
  • Thinking LSTM reduces data dimension like PCA
  • Assuming LSTM generates random numbers
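One way to see the "memory" in practice is to feed a sequence in two chunks and pass the state from the first chunk into the second. This is a minimal sketch with arbitrary sizes chosen for illustration; the variable names are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=2, hidden_size=4, batch_first=True)
x = torch.randn(1, 6, 2)  # one sequence of 6 steps, 2 features each

with torch.no_grad():
    full_out, _ = lstm(x)                      # whole sequence at once
    first_out, state = lstm(x[:, :3])          # first half, keep (h, c)
    second_out, _ = lstm(x[:, 3:], state)      # second half, resuming from that state
    fresh_out, _ = lstm(x[:, 3:])              # second half with a zero initial state

chunked_out = torch.cat([first_out, second_out], dim=1)
carried_matches = torch.allclose(full_out, chunked_out, atol=1e-5)
fresh_matches = torch.allclose(full_out[:, 3:], fresh_out, atol=1e-5)
print(carried_matches, fresh_matches)
```

Carrying the state reproduces the full-sequence output, while starting the second half from a zero state does not, because the information from the first three steps is lost.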
2. Which of the following is the correct way to create an LSTM layer in PyTorch with input size 10 and hidden size 20?
easy
A. nn.LSTM(input=10, hidden=20)
B. nn.LSTM(20, 10)
C. nn.LSTM(10, 20)
D. nn.LSTM(hidden_size=10, input_size=20)

Solution

  1. Step 1: Recall nn.LSTM constructor parameters

    The first argument is input_size (features per input), the second is hidden_size (features in hidden state).
  2. Step 2: Match correct syntax

    nn.LSTM(10, 20) passes both sizes positionally, which correctly sets input_size=10 and hidden_size=20.
  3. Final Answer:

    nn.LSTM(10, 20) -> Option C
  4. Quick Check:

    Constructor order = input_size, hidden_size [OK]
Hint: First arg input size, second hidden size in nn.LSTM() [OK]
Common Mistakes:
  • Swapping input_size and hidden_size
  • Using wrong keyword arguments
  • Confusing parameter names
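The argument order can be checked directly on a constructed layer. A small sketch (variable names are illustrative):

```python
import torch.nn as nn

lstm = nn.LSTM(10, 20)  # positional: input_size, then hidden_size
print(lstm.input_size, lstm.hidden_size)  # 10 20

# The input-to-hidden weight stacks the four gates, so its shape is (4 * hidden_size, input_size).
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10])

# Keyword arguments build an equivalent layer and make the intent explicit.
same = nn.LSTM(input_size=10, hidden_size=20)
```

Swapping the two numbers still constructs a layer without error, which is why this mistake usually surfaces later as a shape mismatch in the forward pass.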
3. Given the code below, what is the shape of output after running the LSTM?
import torch
import torch.nn as nn
lstm = nn.LSTM(input_size=5, hidden_size=3, num_layers=1)
inputs = torch.randn(4, 2, 5)  # seq_len=4, batch=2, input_size=5
output, (hn, cn) = lstm(inputs)
medium
A. (4, 2, 3)
B. (2, 4, 3)
C. (4, 3, 2)
D. (2, 3, 4)

Solution

  1. Step 1: Understand LSTM input and output shapes

    With the default batch_first=False, the input shape is (seq_len, batch, input_size) and the output shape is (seq_len, batch, hidden_size).
  2. Step 2: Apply given dimensions

    Input shape is (4, 2, 5), hidden_size=3, so output shape is (4, 2, 3).
  3. Final Answer:

    (4, 2, 3) -> Option A
  4. Quick Check:

    Output shape = (seq_len, batch, hidden_size) [OK]
Hint: Output shape matches (seq_len, batch, hidden_size) [OK]
Common Mistakes:
  • Mixing batch and sequence dimensions
  • Confusing input_size with hidden_size
  • Assuming output shape swaps batch and seq_len
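The shapes can be verified by running the snippet, and comparing against batch_first=True shows which dimensions move. The second layer, lstm_bf, is an addition for illustration and is not part of the question.

```python
import torch
import torch.nn as nn

# Default layout: (seq_len, batch, features)
lstm = nn.LSTM(input_size=5, hidden_size=3, num_layers=1)
inputs = torch.randn(4, 2, 5)
output, (hn, cn) = lstm(inputs)
print(output.shape)  # torch.Size([4, 2, 3])
print(hn.shape)      # torch.Size([1, 2, 3])

# For a single-layer, unidirectional LSTM the last output step equals the final hidden state.
last_step_is_hn = torch.allclose(output[-1], hn[0])

# batch_first=True swaps the first two dimensions of input and output only.
lstm_bf = nn.LSTM(input_size=5, hidden_size=3, batch_first=True)
output_bf, (hn_bf, cn_bf) = lstm_bf(torch.randn(2, 4, 5))
print(output_bf.shape)  # torch.Size([2, 4, 3])
print(hn_bf.shape)      # torch.Size([1, 2, 3]), unchanged by batch_first
```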
4. What is wrong with this code snippet that tries to create an LSTM layer?
import torch.nn as nn
lstm = nn.LSTM(10)
medium
A. The input size must be a tuple, not an integer
B. It misses the hidden_size argument, causing an error
C. LSTM requires a batch size argument at creation
D. The code is correct and runs without error

Solution

  1. Step 1: Check nn.LSTM constructor requirements

    nn.LSTM requires at least two arguments, input_size and hidden_size, passed positionally or by keyword.
  2. Step 2: Identify missing argument

    The code only provides input_size=10, missing hidden_size, so it will raise a TypeError.
  3. Final Answer:

    It misses the hidden_size argument, causing an error -> Option B
  4. Quick Check:

    nn.LSTM needs input_size and hidden_size [OK]
Hint: nn.LSTM needs two sizes: input and hidden [OK]
Common Mistakes:
  • Thinking batch size is needed at layer creation
  • Assuming input_size can be a tuple
  • Believing code runs without error
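The failure is easy to reproduce. This sketch catches the exception rather than printing its message, since the exact wording may differ between PyTorch versions; the variable error is an illustrative name.

```python
import torch.nn as nn

try:
    nn.LSTM(10)  # hidden_size is missing
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError

# Supplying both sizes fixes it.
lstm = nn.LSTM(10, 20)
```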
5. You want to build a model that processes sequences of length 6 with 8 features each. You want the LSTM to output a sequence with 12 features per time step. Which of the following LSTM layer initializations is correct to achieve this?
hard
A. nn.LSTM(input_size=12, hidden_size=8)
B. nn.LSTM(input_size=8, hidden_size=6)
C. nn.LSTM(input_size=6, hidden_size=8)
D. nn.LSTM(input_size=8, hidden_size=12)

Solution

  1. Step 1: Identify input_size and hidden_size meanings

    input_size is the number of features per time step in the input sequence. For a unidirectional LSTM without proj_size, hidden_size is the number of features in the output per time step.
  2. Step 2: Match given sequence and desired output

    Input sequences have 8 features, so input_size=8. Desired output features per time step is 12, so hidden_size=12.
  3. Final Answer:

    nn.LSTM(input_size=8, hidden_size=12) -> Option D
  4. Quick Check:

    Input features = 8, output features = 12 [OK]
Hint: Input size = input features, hidden size = output features [OK]
Common Mistakes:
  • Confusing sequence length with input_size
  • Swapping input_size and hidden_size
  • Using sequence length as hidden_size
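A quick check of this setup, assuming batch_first=True and a batch of 3 sequences (both are choices made for illustration). The bidirectional variant is included only to show when the output width stops being equal to hidden_size.

```python
import torch
import torch.nn as nn

x = torch.randn(3, 6, 8)  # (batch, seq_len=6, features=8)

lstm = nn.LSTM(input_size=8, hidden_size=12, batch_first=True)
out, _ = lstm(x)
print(out.shape)  # torch.Size([3, 6, 12])

# Sequence length is not a constructor argument: the same layer accepts other lengths.
out_long, _ = lstm(torch.randn(3, 10, 8))
print(out_long.shape)  # torch.Size([3, 10, 12])

# A bidirectional LSTM concatenates both directions, doubling the output width.
bi_lstm = nn.LSTM(input_size=8, hidden_size=12, batch_first=True, bidirectional=True)
out_bi, _ = bi_lstm(x)
print(out_bi.shape)  # torch.Size([3, 6, 24])
```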