
Sequence classification in PyTorch - Model Metrics & Evaluation

Which Metrics Matter for Sequence Classification and Why

For sequence classification, common metrics include accuracy, precision, recall, and F1 score. Accuracy is the fraction of all sequences that were labeled correctly. Precision is the fraction of predicted positive sequences that were actually positive. Recall is the fraction of actual positive sequences that the model found. F1 score is the harmonic mean of precision and recall, which makes it useful when classes are uneven or when both kinds of mistake matter.

Choosing the right metric depends on the task. For example, if missing a positive sequence is costly (as in disease detection), recall is more important. If false alarms are costly (as when a legitimate email is marked as spam), precision matters more.

Confusion Matrix for Sequence Classification
                      | Predicted Positive  | Predicted Negative  |
                      |---------------------|---------------------|
      Actual Positive | True Positive (TP)  | False Negative (FN) |
      Actual Negative | False Positive (FP) | True Negative (TN)  |

      Example:
      TP = 40, FP = 10, TN = 45, FN = 5

      Total samples = TP + FP + TN + FN = 40 + 10 + 45 + 5 = 100
    

From this matrix, we calculate:

  • Precision = TP / (TP + FP) = 40 / (40 + 10) = 0.8
  • Recall = TP / (TP + FN) = 40 / (40 + 5) ≈ 0.89
  • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.8 * 0.89) / (0.8 + 0.89) ≈ 0.84
  • Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
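
The same arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the helper name `classification_metrics` is made up here, and the zero-division guards are an added assumption about how empty denominators should be handled.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    # Guard against empty denominators (assumption: report 0.0 in that case).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts taken from the example above.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# precision: 0.80
# recall: 0.89
# f1: 0.84
# accuracy: 0.85
```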
Precision vs Recall Tradeoff with Examples

Imagine a model that classifies sequences as "spam" or "not spam" emails.

  • High Precision, Low Recall: The model flags only emails it is very sure are spam. There are few false alarms, but many spam emails get through. Good when wrongly flagging a legitimate email is costly.
  • High Recall, Low Precision: The model flags almost all spam emails but also flags many legitimate emails by mistake. Good when you want to catch all spam and can tolerate some false alarms.

Choosing depends on what is worse: missing spam or wrongly marking good emails as spam.
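
One common way to move along this tradeoff is to change the decision threshold applied to the model's predicted spam probability. The sketch below assumes a binary classifier that outputs one probability per email. The probabilities, labels, thresholds, and the helper name `precision_recall` are all made up for illustration.

```python
import torch


def precision_recall(probs: torch.Tensor, targets: torch.Tensor, threshold: float):
    """Precision and recall for the positive class at a given threshold."""
    preds = probs >= threshold
    targets = targets.bool()
    tp = (preds & targets).sum().item()
    fp = (preds & ~targets).sum().item()
    fn = (~preds & targets).sum().item()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up spam probabilities for 8 emails (1 = spam, 0 = not spam).
probs = torch.tensor([0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10])
targets = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0])

strict = precision_recall(probs, targets, threshold=0.9)    # flag only very sure cases
lenient = precision_recall(probs, targets, threshold=0.35)  # flag anything suspicious

print("strict  (precision, recall):", strict)   # (1.0, 0.25)
print("lenient (precision, recall):", lenient)  # (0.8, 1.0)
```

With the strict threshold nothing is flagged by mistake, but three of the four spam emails slip through. With the lenient threshold every spam email is caught at the cost of one false alarm.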

What "Good" vs "Bad" Metric Values Look Like

For sequence classification:

  • Good (a rough rule of thumb; acceptable values depend on the task and the class balance): Accuracy above 85%, Precision and Recall above 80%, F1 score above 0.8.
  • Bad: Accuracy near random chance (e.g., 50% for two balanced classes), very low precision or recall (below 50%), or very unbalanced metrics (e.g., 90% precision but 10% recall).

Balanced metrics mean the model is reliable both in finding positives and avoiding false alarms.

Common Pitfalls in Metrics for Sequence Classification
  • Accuracy Paradox: High accuracy can be misleading if classes are imbalanced. For example, if 95% of sequences are negative, a model that always predicts negative gets 95% accuracy but is useless.
  • Data Leakage: If training data leaks into test data, metrics look unrealistically high.
  • Overfitting Indicators: Very high training accuracy but low test accuracy means the model memorizes training sequences but fails on new ones.
  • Ignoring Class Imbalance: Not using precision, recall, or F1 when classes are uneven can hide poor performance on minority classes.
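
The accuracy paradox is easy to reproduce. The sketch below uses a hypothetical test set of 950 negative and 50 positive sequences, with a stand-in "model" that ignores its input and always predicts the negative class.

```python
import torch

# Hypothetical imbalanced test set: 950 negative and 50 positive sequences.
targets = torch.cat([torch.zeros(950, dtype=torch.long),
                     torch.ones(50, dtype=torch.long)])

# A "model" that always predicts the negative class.
preds = torch.zeros_like(targets)

accuracy = (preds == targets).float().mean().item()
tp = ((preds == 1) & (targets == 1)).sum().item()
fn = ((preds == 0) & (targets == 1)).sum().item()
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
# accuracy = 0.95, recall = 0.00
```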
Self-Check Question

Your sequence classification model has 98% accuracy but only 12% recall on the positive class (e.g., detecting spam). Is this model good for production? Why or why not?

Answer: No, it is not good. The high accuracy is likely because most sequences are negative, so the model predicts negative most of the time. The very low recall means it misses almost all positive sequences, which defeats the purpose of detecting them.
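
To see how such numbers can arise together, here is one hypothetical set of counts (invented for illustration, not taken from a real model) that yields exactly 98% accuracy and 12% recall:

```python
# Hypothetical counts: 10,000 sequences, of which 200 are positive (2%).
tp, fn = 24, 176     # only 24 of the 200 positives are found
fp, tn = 24, 9776    # almost all negatives are labeled correctly

accuracy = (tp + tn) / (tp + fp + tn + fn)
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
# accuracy = 98.00%, recall = 12.00%
```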

Key Result
For sequence classification, balanced precision, recall, and F1 score matter most to ensure the model finds positives without many false alarms.

Practice

1. What is the main goal of sequence classification in PyTorch?
easy
A. To assign a label to the entire input sequence
B. To predict the next item in the sequence
C. To label each item in the sequence separately
D. To generate a new sequence from the input

Solution

  1. Step 1: Understand sequence classification

    Sequence classification means giving one label to the whole sequence, not to individual items.
  2. Step 2: Compare options

    Only option A ("To assign a label to the entire input sequence") describes labeling the whole sequence, which matches the goal of sequence classification.
  3. Final Answer:

    To assign a label to the entire input sequence -> Option A
  4. Quick Check:

    Sequence classification = label whole sequence [OK]
Hint: Sequence classification labels the whole sequence, not parts
Common Mistakes:
  • Confusing sequence classification with sequence labeling
  • Thinking it predicts the next item in the sequence
  • Assuming it generates new sequences
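
The difference between sequence classification and sequence labeling shows up directly in the output shapes. The sketch below is illustrative only: the layer sizes, the number of classes, and the choice to reuse one linear head for both cases are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=10, hidden_size=20, batch_first=True)
head = torch.nn.Linear(20, 3)    # 3 classes, chosen arbitrarily

x = torch.randn(4, 6, 10)        # batch=4, seq_len=6, features=10
output, hn = rnn(x)

# Sequence classification: one prediction per sequence (from the final hidden state).
seq_logits = head(hn[-1])
print(seq_logits.shape)          # torch.Size([4, 3])

# Sequence labeling: one prediction per time step (from every step's output).
step_logits = head(output)
print(step_logits.shape)         # torch.Size([4, 6, 3])
```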
2. Which PyTorch module is commonly used to process sequences step-by-step for classification?
easy
A. torch.nn.Conv2d
B. torch.nn.Linear
C. torch.nn.RNN
D. torch.nn.BatchNorm1d

Solution

  1. Step 1: Identify sequence processing modules

    RNN (Recurrent Neural Network) modules process sequences step-by-step, capturing order.
  2. Step 2: Match options to sequence processing

    Only torch.nn.RNN is designed for sequential data; others serve different purposes.
  3. Final Answer:

    torch.nn.RNN -> Option C
  4. Quick Check:

    RNN processes sequences stepwise [OK]
Hint: RNN modules handle sequences stepwise in PyTorch
Common Mistakes:
  • Choosing Linear which is for fixed-size input
  • Selecting Conv2d meant for images
  • Picking BatchNorm which normalizes features
3. Given this PyTorch code snippet for sequence classification, what is the shape of the output tensor?
import torch

rnn = torch.nn.RNN(input_size=10, hidden_size=20, batch_first=True)
inputs = torch.randn(5, 7, 10)  # batch=5, seq_len=7, features=10
output, hn = rnn(inputs)
final_output = hn.squeeze(0)
medium
A. [5, 20]
B. [5, 7, 20]
C. [7, 20]
D. [5, 10]

Solution

  1. Step 1: Understand RNN output shapes

    Output shape is (batch, seq_len, hidden_size) = (5,7,20). hn shape is (num_layers, batch, hidden_size) = (1,5,20).
  2. Step 2: Analyze final_output shape

    hn.squeeze(0) removes the first dimension (num_layers), resulting in (5,20).
  3. Final Answer:

    [5, 20] -> Option A
  4. Quick Check:

    hn.squeeze(0) shape = [batch, hidden_size] = [5, 20] [OK]
Hint: Squeeze removes layer dim; output shape is batch x hidden size
Common Mistakes:
  • Confusing output and hn shapes
  • Not squeezing the layer dimension
  • Mixing sequence length with batch size
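
The shapes discussed in this solution can be checked by running the snippet and printing them. The last two lines add one extra observation: for a single-layer, unidirectional RNN such as this one, the final hidden state equals the last time step of `output`.

```python
import torch

rnn = torch.nn.RNN(input_size=10, hidden_size=20, batch_first=True)
inputs = torch.randn(5, 7, 10)   # batch=5, seq_len=7, features=10
output, hn = rnn(inputs)
final_output = hn.squeeze(0)

print(output.shape)        # torch.Size([5, 7, 20])
print(hn.shape)            # torch.Size([1, 5, 20])
print(final_output.shape)  # torch.Size([5, 20])

# Holds for one layer and one direction: last step of `output` matches hn.
same = torch.allclose(output[:, -1, :], final_output)
print(same)                # True
```

Note that `batch_first=True` affects `inputs` and `output` only; `hn` keeps the layout (num_layers, batch, hidden_size).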
4. Identify the error in this PyTorch sequence classification model code:
import torch

class SeqClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = torch.nn.RNN(10, 20, batch_first=True)
        self.fc = torch.nn.Linear(10, 2)
    def forward(self, x):
        out, hn = self.rnn(x)
        out = self.fc(hn.squeeze(0))
        return out
medium
A. The forward method should return hn, not out
B. The RNN input size should be 2, not 10
C. The squeeze(0) should be applied to out, not hn
D. The Linear layer input size should be 20, not 10

Solution

  1. Step 1: Check Linear layer input size

    The RNN hidden size is 20, so hn has shape (1, batch, 20) and hn.squeeze(0) has shape (batch, 20). The Linear layer is declared with input size 10, which does not match.
  2. Step 2: Correct Linear input size

    Linear layer input size must match hidden size 20 to process hn correctly.
  3. Final Answer:

    The Linear layer input size should be 20, not 10 -> Option D
  4. Quick Check:

    Linear input size = hidden size = 20 [OK]
Hint: Linear input size must match RNN hidden size
Common Mistakes:
  • Mismatching Linear input size with hidden size
  • Applying squeeze to wrong tensor
  • Returning wrong tensor from forward
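
A corrected version of the model might look like the sketch below. Passing the sizes as constructor arguments and using `hn[-1]` instead of `hn.squeeze(0)` are choices made here for illustration, not part of the original question. `hn[-1]` selects the last layer's hidden state and also works if more layers are added later.

```python
import torch


class SeqClassifier(torch.nn.Module):
    def __init__(self, input_size=10, hidden_size=20, num_classes=2):
        super().__init__()
        self.rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True)
        # in_features must equal the RNN hidden size.
        self.fc = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, hn = self.rnn(x)        # hn: (num_layers, batch, hidden_size)
        return self.fc(hn[-1])     # logits: (batch, num_classes)


model = SeqClassifier()
logits = model(torch.randn(5, 7, 10))   # batch=5, seq_len=7, features=10
print(logits.shape)                     # torch.Size([5, 2])
```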
5. You want to classify sequences of varying lengths using an RNN in PyTorch. Which approach correctly handles different sequence lengths during training?
hard
A. Truncate all sequences to the shortest length without padding
B. Pad sequences to the same length and use pack_padded_sequence before RNN
C. Feed sequences directly without padding or packing
D. Use a Linear layer instead of RNN to avoid sequence length issues

Solution

  1. Step 1: Understand variable-length sequence handling

    Sequences must be padded to the same length for batch processing, then packed so that the RNN skips the padded positions.
  2. Step 2: Evaluate options

    Option B pads the sequences and then applies pack_padded_sequence, which is the standard PyTorch way to handle varying lengths efficiently.
  3. Final Answer:

    Pad sequences to the same length and use pack_padded_sequence before RNN -> Option B
  4. Quick Check:

    Use padding + pack_padded_sequence for variable lengths [OK]
Hint: Pad then pack sequences to handle varying lengths in RNN
Common Mistakes:
  • Ignoring padding and feeding raw sequences
  • Truncating sequences losing data
  • Replacing RNN with Linear layer incorrectly
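
The pad-then-pack approach can be sketched as follows, using `pad_sequence` and `pack_padded_sequence` from `torch.nn.utils.rnn`. The three sequences and their lengths are made up, and `enforce_sorted=False` is used so the batch does not have to be sorted by length first.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
# Three made-up sequences of lengths 5, 3, and 2, each with 10 features per step.
sequences = [torch.randn(5, 10), torch.randn(3, 10), torch.randn(2, 10)]
lengths = torch.tensor([len(s) for s in sequences])

# Pad to a common length so the batch is one tensor: (3, 5, 10), zero-padded.
padded = pad_sequence(sequences, batch_first=True)

# Pack so the RNN stops at each sequence's true length.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

rnn = torch.nn.RNN(input_size=10, hidden_size=20, batch_first=True)
_, hn = rnn(packed)
final_hidden = hn[-1]            # one vector per sequence, in the original order

print(padded.shape)              # torch.Size([3, 5, 10])
print(final_hidden.shape)        # torch.Size([3, 20])
```

Each row of `final_hidden` is the hidden state at that sequence's last real time step, so it can be passed to a Linear layer for classification without the padding influencing the result.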