Sequence classification in PyTorch - ML Experiment: Train & Evaluate

Experiment - Sequence classification
Problem: Classify sequences of numbers into two classes using a simple RNN model.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue: The model is overfitting: training accuracy is very high but validation accuracy is much lower.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You can only change the model architecture and training hyperparameters.
Do not change the dataset or data preprocessing.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Generate dummy dataset
torch.manual_seed(0)
sequence_length = 10
input_size = 5
hidden_size = 16
num_classes = 2
num_samples = 1000

X = torch.randn(num_samples, sequence_length, input_size)
y = (torch.sum(X, dim=(1,2)) > 0).long()

# Split dataset
train_size = int(0.8 * num_samples)
X_train, X_val = X[:train_size], X[train_size:]
y_train, y_val = y[:train_size], y[train_size:]

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

# Define model with dropout
class RNNClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout=0.3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = out[:, -1, :]
        out = self.dropout(out)
        out = self.fc(out)
        return out

model = RNNClassifier(input_size, hidden_size, num_classes, dropout=0.3)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

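# Training loop with early stopping
best_val_acc = 0
best_state = None  # weights from the epoch with the best validation accuracy
patience = 5
trigger_times = 0

for epoch in range(50):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        outputs = model(xb)
        loss = criterion(outputs, yb)
        loss.backward()
        optimizer.step()

    # Validation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            outputs = model(xb)
            _, predicted = torch.max(outputs, 1)
            total += yb.size(0)
            correct += (predicted == yb).sum().item()
    val_acc = correct / total

    # Early stopping check: keep a copy of the best weights seen so far
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
        trigger_times = 0
    else:
        trigger_times += 1
        if trigger_times >= patience:
            break

# Restore the best weights so the metrics below describe the same model as best_val_acc
if best_state is not None:
    model.load_state_dict(best_state)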

# Calculate final training accuracy
model.eval()
correct_train = 0
total_train = 0
with torch.no_grad():
    for xb, yb in train_loader:
        outputs = model(xb)
        _, predicted = torch.max(outputs, 1)
        total_train += yb.size(0)
        correct_train += (predicted == yb).sum().item()
train_acc = correct_train / total_train

# Calculate final validation loss
val_loss_total = 0
val_samples = 0
with torch.no_grad():
    for xb, yb in val_loader:
        outputs = model(xb)
        loss = criterion(outputs, yb)
        val_loss_total += loss.item() * yb.size(0)
        val_samples += yb.size(0)
val_loss = val_loss_total / val_samples

result = f"Training accuracy: {train_acc*100:.1f}%, Validation accuracy: {best_val_acc*100:.1f}%, Validation loss: {val_loss:.3f}"
print(result)
Added dropout layer with 0.3 dropout rate after RNN output to reduce overfitting.
Reduced hidden size from 32 to 16 to simplify the model.
Used Adam optimizer with learning rate 0.001 for stable training.
Implemented early stopping with patience of 5 epochs to prevent over-training.
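The dropout layer only regularizes while the model is in training mode, which is why the solution calls model.train() before each epoch and model.eval() before measuring accuracy. The following is a minimal sketch of that behaviour on a standalone nn.Dropout layer; the tensor of ones and the variable names are illustrative and not part of the solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.3)
x = torch.ones(1000)  # illustrative input, not the experiment's data

drop.train()  # training mode: each element is zeroed with probability p
y_train = drop(x)
zero_fraction = (y_train == 0).float().mean().item()
kept_value = y_train.max().item()  # survivors are scaled by 1 / (1 - p)

drop.eval()  # evaluation mode: dropout passes the input through unchanged
y_eval = drop(x)

print(f"fraction zeroed in train mode: {zero_fraction:.2f}")
print(f"value of surviving elements: {kept_value:.3f}")
print("unchanged in eval mode:", torch.equal(y_eval, x))
```

Forgetting model.eval() before validation would leave dropout active and make the validation accuracy noisier and typically lower than the model's real performance.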
Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 70%, Validation loss: 0.85

After: Training accuracy: 90.5%, Validation accuracy: 86.7%, Validation loss: 0.42

Adding dropout and simplifying the model helped reduce overfitting. Early stopping prevented the model from training too long. This improved validation accuracy and lowered validation loss, showing better generalization.
Bonus Experiment
Try using an LSTM instead of a simple RNN and compare the validation accuracy and loss.
💡 Hint
Replace nn.RNN with nn.LSTM in the model and keep other settings the same to see if the model learns better sequence patterns.
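One way to set up the bonus experiment is sketched below. It mirrors the RNNClassifier from the solution with nn.LSTM swapped in; the class name LSTMClassifier and the random smoke-test input are assumptions made for illustration. The only structural difference is that nn.LSTM returns its state as a tuple (h_n, c_n) rather than a single tensor.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Hypothetical LSTM variant of the solution's RNNClassifier."""

    def __init__(self, input_size, hidden_size, num_classes, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # nn.LSTM returns (output, (h_n, c_n)); only the output is needed here
        out, (h_n, c_n) = self.lstm(x)
        out = out[:, -1, :]        # hidden state at the last time step
        out = self.dropout(out)
        return self.fc(out)

# Smoke test with the same sizes as the experiment: 10 steps, 5 features
model = LSTMClassifier(input_size=5, hidden_size=16, num_classes=2)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 10, 5))
print(logits.shape)  # torch.Size([4, 2])
```

The rest of the training and evaluation code can stay the same, so any change in validation accuracy or loss can be attributed to the recurrent layer.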

Practice

1. What is the main goal of sequence classification in PyTorch?
easy
A. To assign a label to the entire input sequence
B. To predict the next item in the sequence
C. To label each item in the sequence separately
D. To generate a new sequence from the input

Solution

  1. Step 1: Understand sequence classification

    Sequence classification means giving one label to the whole sequence, not to individual items.
  2. Step 2: Compare options

    Only option A, "To assign a label to the entire input sequence", describes labeling the whole sequence, which matches the goal of sequence classification.
  3. Final Answer:

    To assign a label to the entire input sequence -> Option A
  4. Quick Check:

    Sequence classification = label whole sequence [OK]
Hint: Sequence classification labels the whole sequence, not parts [OK]
Common Mistakes:
  • Confusing sequence classification with sequence labeling
  • Thinking it predicts next sequence item
  • Assuming it generates new sequences
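The difference between sequence classification and sequence labeling shows up directly in the output shapes. The sketch below uses illustrative sizes (batch of 8, 10 steps, 5 features, 2 classes) and shares one linear head between both uses purely to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=5, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)

x = torch.randn(8, 10, 5)   # (batch, seq_len, features)
out, h_n = rnn(x)           # out: (8, 10, 16), one hidden state per step

# Sequence classification: one prediction per sequence
seq_logits = head(out[:, -1, :])
print(seq_logits.shape)     # torch.Size([8, 2])

# Sequence labeling: one prediction per time step
token_logits = head(out)
print(token_logits.shape)   # torch.Size([8, 10, 2])
```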
2. Which PyTorch module is commonly used to process sequences step-by-step for classification?
easy
A. torch.nn.Conv2d
B. torch.nn.Linear
C. torch.nn.RNN
D. torch.nn.BatchNorm1d

Solution

  1. Step 1: Identify sequence processing modules

    RNN (Recurrent Neural Network) modules process sequences step-by-step, capturing order.
  2. Step 2: Match options to sequence processing

    Only torch.nn.RNN is designed for sequential data; others serve different purposes.
  3. Final Answer:

    torch.nn.RNN -> Option C
  4. Quick Check:

    RNN processes sequences stepwise [OK]
Hint: RNN modules handle sequences stepwise in PyTorch [OK]
Common Mistakes:
  • Choosing Linear which is for fixed-size input
  • Selecting Conv2d meant for images
  • Picking BatchNorm which normalizes features
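To make "step-by-step" concrete, the sketch below replays what a single-layer nn.RNN with the default tanh nonlinearity computes, one time step at a time, using the layer's own weights. The sizes are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)
x = torch.randn(2, 5, 3)    # (batch, seq_len, features)

with torch.no_grad():
    out, h_n = rnn(x)

    # Manual recurrence: h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
    h = torch.zeros(2, 4)   # initial hidden state defaults to zeros
    for t in range(x.size(1)):
        h = torch.tanh(
            x[:, t, :] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )

matches = torch.allclose(h, h_n[0], atol=1e-5)
print("manual loop matches nn.RNN:", matches)
```

Each step reuses the hidden state from the previous one, which is how the layer carries order information that Linear, Conv2d, and BatchNorm1d do not track on their own.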
3. Given this PyTorch code snippet for sequence classification, what is the shape of final_output?
rnn = torch.nn.RNN(input_size=10, hidden_size=20, batch_first=True)
inputs = torch.randn(5, 7, 10)  # batch=5, seq_len=7, features=10
output, hn = rnn(inputs)
final_output = hn.squeeze(0)
medium
A. [5, 20]
B. [5, 7, 20]
C. [7, 20]
D. [5, 10]

Solution

  1. Step 1: Understand RNN output shapes

    Output shape is (batch, seq_len, hidden_size) = (5,7,20). hn shape is (num_layers, batch, hidden_size) = (1,5,20).
  2. Step 2: Analyze final_output shape

    hn.squeeze(0) removes the first dimension (num_layers), resulting in (5,20).
  3. Final Answer:

    [5, 20] -> Option A
  4. Quick Check:

    hn.squeeze(0) shape = [batch, hidden_size] = [5, 20] [OK]
Hint: Squeeze removes layer dim; output shape is batch x hidden size [OK]
Common Mistakes:
  • Confusing output and hn shapes
  • Not squeezing the layer dimension
  • Mixing sequence length with batch size
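The shapes in this question can be checked by running the snippet and printing them. The sketch below also compares hn with the last time step of output; for a single-layer, unidirectional RNN on unpadded input the two hold the same values.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=10, hidden_size=20, batch_first=True)
inputs = torch.randn(5, 7, 10)  # batch=5, seq_len=7, features=10

with torch.no_grad():
    output, hn = rnn(inputs)
final_output = hn.squeeze(0)

print(output.shape)        # torch.Size([5, 7, 20])
print(hn.shape)            # torch.Size([1, 5, 20])
print(final_output.shape)  # torch.Size([5, 20])

# Last time step of output vs. final hidden state
same = torch.allclose(output[:, -1, :], final_output, atol=1e-6)
print("output[:, -1, :] equals hn.squeeze(0):", same)
```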
4. Identify the error in this PyTorch sequence classification model code:
class SeqClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = torch.nn.RNN(10, 20, batch_first=True)
        self.fc = torch.nn.Linear(10, 2)
    def forward(self, x):
        out, hn = self.rnn(x)
        out = self.fc(hn.squeeze(0))
        return out
medium
A. The forward method should return hn, not out
B. The RNN input size should be 2, not 10
C. The squeeze(0) should be applied to out, not hn
D. The Linear layer input size should be 20, not 10

Solution

  1. Step 1: Check Linear layer input size

    The RNN hidden size is 20, so hn has shape (1, batch, 20) and hn.squeeze(0) has shape (batch, 20). The Linear layer is declared with input size 10, so it cannot accept this tensor.
  2. Step 2: Correct Linear input size

    Linear layer input size must match hidden size 20 to process hn correctly.
  3. Final Answer:

    The Linear layer input size should be 20, not 10 -> Option D
  4. Quick Check:

    Linear input size = hidden size = 20 [OK]
Hint: Linear input size must match RNN hidden size [OK]
Common Mistakes:
  • Mismatching Linear input size with hidden size
  • Applying squeeze to wrong tensor
  • Returning wrong tensor from forward
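The sketch below runs the model from the question next to a corrected version. The name BuggySeqClassifier is introduced here only to keep the two apart, and the batch of random inputs is illustrative.

```python
import torch
import torch.nn as nn

class BuggySeqClassifier(nn.Module):
    """The model from the question: Linear expects 10 features, hn has 20."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(10, 20, batch_first=True)
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        out, hn = self.rnn(x)
        return self.fc(hn.squeeze(0))

class SeqClassifier(nn.Module):
    """Corrected: Linear input size matches the RNN hidden size."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(10, 20, batch_first=True)
        self.fc = nn.Linear(20, 2)

    def forward(self, x):
        out, hn = self.rnn(x)
        return self.fc(hn.squeeze(0))

x = torch.randn(5, 7, 10)  # batch=5, seq_len=7, features=10

try:
    BuggySeqClassifier()(x)
    buggy_failed = False
except RuntimeError as err:
    buggy_failed = True
    print("buggy model raised", type(err).__name__)

logits = SeqClassifier()(x)
print(logits.shape)  # torch.Size([5, 2])
```

The mismatch only surfaces at the first forward pass, so a quick call with a dummy batch is a cheap way to catch it before training.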
5. You want to classify sequences of varying lengths using an RNN in PyTorch. Which approach correctly handles different sequence lengths during training?
hard
A. Truncate all sequences to the shortest length without padding
B. Pad sequences to the same length and use pack_padded_sequence before RNN
C. Feed sequences directly without padding or packing
D. Use a Linear layer instead of RNN to avoid sequence length issues

Solution

  1. Step 1: Understand variable-length sequence handling

    Sequences must be padded to the same length for batch processing, then packed to ignore padding during RNN.
  2. Step 2: Evaluate options

    Option B combines padding with pack_padded_sequence, which is the standard PyTorch way to batch sequences of different lengths without letting the RNN process the padded steps.
  3. Final Answer:

    Pad sequences to the same length and use pack_padded_sequence before RNN -> Option B
  4. Quick Check:

    Use padding + pack_padded_sequence for variable lengths [OK]
Hint: Pad then pack sequences to handle varying lengths in RNN [OK]
Common Mistakes:
  • Ignoring padding and feeding raw sequences
  • Truncating sequences losing data
  • Replacing RNN with Linear layer incorrectly
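A minimal sketch of the pad-then-pack approach follows, assuming three illustrative sequences of lengths 7, 4, and 2 with 10 features each. It uses pad_sequence and pack_padded_sequence from torch.nn.utils.rnn; enforce_sorted=False lets the batch stay in its original order.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
seqs = [torch.randn(7, 10), torch.randn(4, 10), torch.randn(2, 10)]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to a common length, then pack so the RNN skips the padded steps
padded = pad_sequence(seqs, batch_first=True)  # (3, 7, 10)
packed = pack_padded_sequence(
    padded, lengths, batch_first=True, enforce_sorted=False
)

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
fc = nn.Linear(20, 2)

with torch.no_grad():
    _, h_n = rnn(packed)   # h_n: (1, 3, 20), last real step of each sequence
    logits = fc(h_n[-1])   # (3, 2), one prediction per sequence

    # Sanity check: the packed result for the length-4 sequence matches
    # running the RNN on that sequence alone
    _, h_single = rnn(seqs[1].unsqueeze(0))
    consistent = torch.allclose(h_n[0, 1], h_single[0, 0], atol=1e-5)

print(padded.shape, logits.shape, consistent)
```

Using h_n here matters: with padded input, out[:, -1, :] would read the padded positions of the shorter sequences, while h_n holds the state at each sequence's true last step.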