When you use one-vs-rest (OvR) or one-vs-one (OvO) strategies for multi-class classification, the key metrics are accuracy, precision, recall, and F1-score. Both strategies break a multi-class problem into several binary problems, so we need to measure how well each binary classifier performs and then combine the results. Macro-averaged precision, recall, and F1-score give every class equal weight, which is especially useful when classes are imbalanced.
One-vs-rest and one-vs-one strategies in ML Python - Model Metrics & Evaluation
For OvR, each class has its own binary confusion matrix. For example, with 3 classes (A, B, C), the OvR confusion matrix for class A looks like:
             | Predicted A | Predicted Not A
-------------+-------------+----------------
Actual A     | TP          | FN
Actual Not A | FP          | TN
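To make the OvR confusion matrix concrete, here is a minimal sketch that counts TP, FN, FP, and TN for each class from multi-class labels. The function name `ovr_confusion` and the sample labels are made up for illustration.

```python
def ovr_confusion(y_true, y_pred, positive):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class
    and every other class as 'not positive'."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1  # actual positive, predicted as something else
        elif p == positive:
            fp += 1  # actual other class, predicted as positive
        else:
            tn += 1
    return tp, fn, fp, tn


# Hypothetical labels, for illustration only.
y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "B", "A", "B", "C", "C", "A", "C"]

for cls in ["A", "B", "C"]:
    print(cls, ovr_confusion(y_true, y_pred, cls))
# A (2, 1, 1, 4)
# B (1, 1, 1, 5)
# C (2, 1, 1, 4)
```

Each class gets its own four counts, and the four counts always sum to the total number of samples.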
For OvO, each pair of classes has a binary confusion matrix. For classes A and B:
         | Predicted A | Predicted B
---------+-------------+------------
Actual A | TP          | FN
Actual B | FP          | TN
All these binary results combine to decide the final multi-class prediction.
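The text does not say how the binary results are combined, so the sketch below assumes two common rules: OvR picks the class with the highest score, and OvO picks the class that wins the most pairwise votes. The function names, scores, and pairwise winners are hypothetical.

```python
from collections import Counter


def ovr_predict(scores):
    """OvR (assumed rule): choose the class whose binary classifier
    gives the highest score for this sample."""
    return max(scores, key=scores.get)


def ovo_predict(pair_winners):
    """OvO (assumed rule): each pairwise classifier votes for one class;
    the class with the most votes wins. Ties are broken by insertion
    order here, which a real system should handle more carefully."""
    votes = Counter(pair_winners.values())
    return votes.most_common(1)[0][0]


# Hypothetical outputs for a single sample.
ovr_scores = {"A": 0.2, "B": 0.7, "C": 0.4}
ovo_winners = {("A", "B"): "B", ("A", "C"): "C", ("B", "C"): "B"}

print(ovr_predict(ovr_scores))   # B
print(ovo_predict(ovo_winners))  # B
```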
In OvR or OvO, each binary classifier faces a tradeoff between precision and recall:
- Precision: How many predicted positives are actually correct? Important if false alarms are costly.
- Recall: How many actual positives are found? Important if missing a class is costly.
Example: For a disease detection with multiple diseases (classes), using OvR:
- If you want to avoid wrongly labeling healthy people as sick (false positives), focus on high precision.
- If you want to catch as many sick people as possible (avoid false negatives), focus on high recall.
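The tradeoff can be seen by moving the decision threshold of one binary classifier. This is a minimal sketch with made-up scores and labels. `precision_recall` is a hypothetical helper, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall(y_true, scores, threshold):
    """Compute (precision, recall) when samples with score >= threshold
    are predicted positive."""
    tp = fp = fn = 0
    for t, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and t == 1:
            tp += 1
        elif predicted_positive and t == 0:
            fp += 1
        elif not predicted_positive and t == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical binary labels (1 = sick) and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

strict = precision_recall(y_true, scores, threshold=0.75)
lenient = precision_recall(y_true, scores, threshold=0.35)
print(strict)   # high precision, lower recall
print(lenient)  # lower precision, higher recall
```

With these numbers the strict threshold gives precision 1.0 and recall 0.5. The lenient threshold raises recall to 0.75 but lowers precision to 0.6.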
Choosing OvR or OvO affects how these tradeoffs appear. OvO compares classes pair by pair, which can help separate similar classes, but it increases complexity: for K classes it trains K*(K-1)/2 classifiers instead of the K that OvR needs.
Good metrics for OvR/OvO multi-class classification (rough rules of thumb; acceptable thresholds depend on the application):
- Accuracy: High (close to 1.0) means most samples are correctly classified.
- Macro F1-score: High (above 0.8) means balanced performance across all classes.
- Precision and Recall: Both should be reasonably high (above 0.7) for each class to avoid bias.
Bad metrics:
- High accuracy but low recall on some classes means the model misses many samples of those classes.
- High precision but low recall means the model is too strict and misses positives.
- Very low F1-score (below 0.5) indicates poor balance and unreliable classification.
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if one class dominates, predicting it always yields high accuracy but poor real performance.
- Data leakage: If test data leaks into training, metrics look unrealistically good.
- Overfitting: Very high training metrics but low test metrics show the model memorizes training data but fails to generalize.
- Ignoring class imbalance: Not using macro-averaged metrics can hide poor performance on minority classes.
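The accuracy paradox and the value of macro averaging can be checked with a small sketch. The data is made up: 95 samples of class A, 3 of B, 2 of C, and a model that always predicts A. The helper names are hypothetical, and a metric with a zero denominator is reported as 0.0 here by choice.

```python
def per_class_prf(y_true, y_pred, cls):
    """Return (precision, recall, F1) for one class, one-vs-rest style."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class (precision, recall, F1)."""
    classes = sorted(set(y_true))
    rows = [per_class_prf(y_true, y_pred, c) for c in classes]
    return tuple(sum(r[i] for r in rows) / len(classes) for i in range(3))


# Hypothetical imbalanced data and a model that always predicts "A".
y_true = ["A"] * 95 + ["B"] * 3 + ["C"] * 2
y_pred = ["A"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)

print(f"accuracy     = {accuracy:.2f}")
print(f"macro recall = {macro_r:.2f}")
print(f"macro F1     = {macro_f1:.2f}")
```

Accuracy is 0.95 even though the model never finds a single B or C sample. Macro recall drops to about 0.33 because each class counts equally.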
No, this model is not good for fraud detection. Although 98% accuracy sounds high, the recall of 12% means it only finds 12% of actual fraud cases. This is bad because missing fraud is costly. The model likely predicts most samples as non-fraud, inflating accuracy but failing its main goal. Improving recall is critical here.
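To see how 98% accuracy and 12% recall can occur together, here is a sketch with hypothetical confusion-matrix counts chosen to match those two figures. The counts are not from any real dataset.

```python
# Hypothetical counts: 10,000 transactions, 200 of them fraud.
tp, fn = 24, 176    # fraud cases found / fraud cases missed
fp, tn = 24, 9776   # false alarms / correctly passed non-fraud

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

# Baseline that labels every transaction as non-fraud:
# all 9,800 non-fraud cases are correct, all 200 fraud cases are missed.
baseline_accuracy = (fp + tn) / total

print(f"accuracy  = {accuracy:.2f}")
print(f"recall    = {recall:.2f}")
print(f"precision = {precision:.2f}")
print(f"always-non-fraud accuracy = {baseline_accuracy:.2f}")
```

With these counts, a model that never flags fraud reaches the same 98% accuracy, which is why accuracy alone says little here.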
Practice
one-vs-rest strategy in multi-class classification?
Solution
Step 1: Understand one-vs-rest approach
One-vs-rest means creating one model per class. Each model learns to separate its class from all other classes combined.
Step 2: Compare with other options
One-vs-one trains models for every pair, not per class. Single model for all classes is not one-vs-rest. Training only on frequent classes is unrelated.
Final Answer:
Train one model per class to separate that class from all others combined. -> Option A
Quick Check:
One-vs-rest = One model per class [OK]
- Confusing one-vs-rest with one-vs-one
- Thinking one-vs-rest uses one model for all classes
- Assuming one-vs-rest trains only on frequent classes
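"One model per class, each separating its class from all others" comes down to relabeling the targets once per class. This is a minimal sketch of that relabeling step. `ovr_targets` and the sample labels are hypothetical, and no model is actually trained.

```python
def ovr_targets(y, classes):
    """For each class, build binary targets: 1 for that class, 0 for the rest."""
    return {c: [1 if label == c else 0 for label in y] for c in classes}


# Hypothetical multi-class labels.
y = ["A", "B", "C", "A", "C"]
targets = ovr_targets(y, ["A", "B", "C"])

for cls, binary in targets.items():
    print(cls, binary)
# A [1, 0, 0, 1, 0]
# B [0, 1, 0, 0, 0]
# C [0, 0, 1, 0, 1]
```

Every binary problem uses all the samples; only the labels change.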
one-vs-one strategy for a problem with 4 classes?
Solution
Step 1: Calculate number of pairs for 4 classes
One-vs-one trains a model for every pair of classes. Number of pairs = 4 choose 2 = 4*3/2 = 6.
Step 2: Verify other options
4 models is one per class (one-vs-rest). 1 model is single multi-class. 8 models is incorrect count.
Final Answer:
6 models -> Option B
Quick Check:
Pairs for 4 classes = 6 [OK]
- Using number of classes instead of pairs
- Confusing one-vs-one with one-vs-rest counts
- Calculating pairs incorrectly
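The pair count can be verified by building the training subset that each one-vs-one model would see. This is a sketch under simple assumptions: `ovo_subsets` is a hypothetical helper, and the features and labels are made up.

```python
from itertools import combinations


def ovo_subsets(X, y):
    """Map each class pair to the (features, label) rows belonging to
    either class of that pair. One binary model would be trained per pair."""
    classes = sorted(set(y))
    subsets = {}
    for a, b in combinations(classes, 2):
        subsets[(a, b)] = [(x, label) for x, label in zip(X, y) if label in (a, b)]
    return subsets


# Hypothetical data with 4 classes, 2 samples each.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
y = ["A", "A", "B", "B", "C", "C", "D", "D"]

subsets = ovo_subsets(X, y)
print(len(subsets))                # 6
print(len(subsets[("A", "B")]))    # 4
```

Unlike one-vs-rest, each one-vs-one model sees only the samples of its two classes.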
Solution
Step 1: Count models in one-vs-rest for 3 classes
One-vs-rest trains one model per class, so 3 models total.
Step 2: Understand model learning in one-vs-rest
Each model learns to separate its class from all other classes combined (not just one other class).
Final Answer:
3 models; each separates one class from the other two combined. -> Option D
Quick Check:
One-vs-rest with 3 classes = 3 models [OK]
- Thinking one-vs-rest trains models per pair
- Assuming only one model is trained
- Confusing one-vs-rest with one-vs-one
Solution
Step 1: Calculate expected number of one-vs-one models for 5 classes
Number of pairs = 5 choose 2 = 5*4/2 = 10 models expected.
Step 2: Identify mistake from training only 4 models
With only 4 models trained, 6 of the 10 pairs are missing, most likely because not every pair of classes was trained.
Final Answer:
You forgot to train models for all pairs; should be 10 models. -> Option C
Quick Check:
One-vs-one for 5 classes = 10 models [OK]
- Counting models as number of classes
- Confusing one-vs-one with one-vs-rest
- Training incomplete pairs
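A quick way to catch incomplete pairs is to compare the pairs you trained against every possible pair. In this sketch the `trained` set is a made-up example of the mistake. It assumes each pair is stored as a tuple in sorted order.

```python
from itertools import combinations

classes = ["A", "B", "C", "D", "E"]

# Every unordered pair of classes that one-vs-one needs.
expected = set(combinations(classes, 2))

# Hypothetical mistake: only the pairs involving class "A" were trained.
trained = {("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")}

missing = sorted(expected - trained)
print(len(expected))  # 10
print(len(missing))   # 6
print(missing)
```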
Solution
Step 1: Understand imbalance effect on one-vs-rest
One-vs-rest models separate one class vs all others combined, which can cause imbalance if one class is small and others are large.
Step 2: Understand one-vs-one advantage
One-vs-one trains models on pairs of classes, so the imbalance within each model is often less severe, which can improve learning on minority classes.
Step 3: Evaluate other options
Single multi-class model may struggle with imbalance. Training only on largest class ignores others.
Final Answer:
One-vs-one, because training on pairs reduces imbalance impact between classes. -> Option A
Quick Check:
One-vs-one handles imbalance better [OK]
- Assuming one-vs-rest always better for imbalance
- Ignoring imbalance effects on combined classes
- Choosing single model ignoring class distribution
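The imbalance argument can be checked with simple arithmetic on class counts. The sketch below uses made-up counts and hypothetical helper names. It compares the share of minority samples in a one-vs-rest training set with the share in each one-vs-one training set.

```python
# Hypothetical class counts: C is a small minority class.
counts = {"A": 900, "B": 90, "C": 10}


def ovr_positive_fraction(counts, cls):
    """Share of samples that are positive when `cls` is trained against the rest."""
    return counts[cls] / sum(counts.values())


def ovo_minority_fraction(counts, a, b):
    """Share of the smaller class in the training set for the pair (a, b)."""
    return min(counts[a], counts[b]) / (counts[a] + counts[b])


print(f"OvR, C vs rest: {ovr_positive_fraction(counts, 'C'):.3f}")
print(f"OvO, B vs C:    {ovo_minority_fraction(counts, 'B', 'C'):.3f}")
print(f"OvO, A vs C:    {ovo_minority_fraction(counts, 'A', 'C'):.3f}")
```

With these counts, C makes up 1% of the one-vs-rest training set and 10% of the B-vs-C set. It is still only about 1.1% of the A-vs-C set, so one-vs-one reduces the imbalance for some pairs but does not remove it against a dominant class.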
