
Sequence classification in PyTorch - Model Metrics & Evaluation

Which metric matters for Sequence Classification and WHY

For sequence classification, common metrics include accuracy, precision, recall, and F1 score. Accuracy tells us how many sequences were correctly labeled overall. Precision shows how many predicted positive sequences were actually positive. Recall tells us how many actual positive sequences were found by the model. F1 score balances precision and recall, which is important when classes are uneven or mistakes have different costs.

Choosing the right metric depends on the task. For example, if missing a positive sequence is costly (like detecting spam or disease), recall is more important. If false alarms are costly, precision matters more.

Confusion Matrix for Sequence Classification
      |                 | Predicted Positive  | Predicted Negative  |
      |-----------------|---------------------|---------------------|
      | Actual Positive | True Positive (TP)  | False Negative (FN) |
      | Actual Negative | False Positive (FP) | True Negative (TN)  |

      Example:
      TP = 40, FP = 10, TN = 45, FN = 5

      Total samples = TP + FP + TN + FN = 40 + 10 + 45 + 5 = 100
    

From this matrix, we calculate:

  • Precision = TP / (TP + FP) = 40 / (40 + 10) = 0.8
  • Recall = TP / (TP + FN) = 40 / (40 + 5) ≈ 0.89
  • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.84
  • Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
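These calculations can be verified with a few lines of plain Python, using the example counts from the confusion matrix above:

```python
# Confusion-matrix counts from the example above.
tp, fp, tn, fn = 40, 10, 45, 5

precision = tp / (tp + fp)                          # 40 / 50 = 0.80
recall = tp / (tp + fn)                             # 40 / 45 ≈ 0.89
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.84
accuracy = (tp + tn) / (tp + fp + tn + fn)          # 85 / 100 = 0.85

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
# precision=0.80 recall=0.89 f1=0.84 accuracy=0.85
```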
Precision vs Recall Tradeoff with Examples

Imagine a model that classifies sequences as "spam" or "not spam" emails.

  • High Precision, Low Recall: The model flags only very sure spam emails. Few false alarms, but it misses many spam emails. Good if you hate false spam labels.
  • High Recall, Low Precision: The model flags almost all spam emails but also many good emails by mistake. Good if you want to catch all spam but can tolerate some false alarms.

Choosing depends on what is worse: missing spam or wrongly marking good emails as spam.
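The trade-off can be made concrete by moving the decision threshold on a toy set of spam scores (the scores and labels below are made up for illustration, not from a real model):

```python
# Toy spam scores (higher = more spam-like) and true labels (1 = spam).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

def precision_recall(threshold):
    """Flag everything scoring at or above the threshold as spam."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return round(precision, 2), round(recall, 2)

# Strict threshold: only very sure flags -> high precision, low recall.
print(precision_recall(0.85))  # (1.0, 0.5)
# Lenient threshold: flag almost everything -> lower precision, high recall.
print(precision_recall(0.35))  # (0.67, 1.0)
```

Raising the threshold trades recall for precision; lowering it does the opposite. No single threshold maximizes both.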

What "Good" vs "Bad" Metric Values Look Like

For sequence classification:

  • Good: Accuracy above 85%, Precision and Recall above 80%, balanced F1 score above 0.8.
  • Bad: Accuracy near random chance (e.g., 50% for two classes), very low precision or recall (below 50%), or very unbalanced metrics (e.g., 90% precision but 10% recall).

Balanced metrics mean the model is reliable both in finding positives and avoiding false alarms.

Common Pitfalls in Metrics for Sequence Classification
  • Accuracy Paradox: High accuracy can be misleading if classes are imbalanced. For example, if 95% of sequences are negative, a model that always predicts negative gets 95% accuracy but is useless.
  • Data Leakage: If training data leaks into test data, metrics look unrealistically high.
  • Overfitting Indicators: Very high training accuracy but low test accuracy means the model memorizes training sequences but fails on new ones.
  • Ignoring Class Imbalance: Not using precision, recall, or F1 when classes are uneven can hide poor performance on minority classes.
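The accuracy paradox from the first bullet is easy to reproduce (a sketch with made-up counts: 95% negative sequences and a "model" that always predicts negative):

```python
# 95 negative and 5 positive sequences; the model always predicts negative.
labels = [0] * 95 + [1] * 5
preds = [0] * 100

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.95 0.0 -- high accuracy, zero recall on positives
```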
Self-Check Question

Your sequence classification model has 98% accuracy but only 12% recall on the positive class (e.g., detecting spam). Is this model good for production? Why or why not?

Answer: No, it is not good. The high accuracy is likely because most sequences are negative, so the model predicts negative most of the time. The very low recall means it misses almost all positive sequences, which defeats the purpose of detecting them.
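One confusion matrix consistent with those numbers (hypothetical counts chosen to match 98% accuracy and 12% recall) makes the failure concrete:

```python
# Hypothetical counts: 10,000 sequences, only 200 (2%) are spam.
tp, fn = 24, 176   # finds just 24 of 200 spam emails -> recall = 0.12
tn, fp = 9776, 24  # almost all non-spam correctly ignored

accuracy = (tp + tn) / (tp + fp + tn + fn)
recall = tp / (tp + fn)
print(accuracy, recall)  # 0.98 0.12 -- accuracy hides that 88% of spam slips through
```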

Key Result
For sequence classification, balanced precision, recall, and F1 score matter most to ensure the model finds positives without many false alarms.