
Multi-class classification in ML Python - Model Metrics & Evaluation

Which metric matters for Multi-class classification and WHY

In multi-class classification, the model predicts one label out of many possible classes. We want to know how often it picks the right class. Accuracy is a simple metric that shows the percentage of correct predictions overall.

However, accuracy alone can hide problems when some classes appear much more often than others. So we also look at precision, recall, and F1-score for each class. These tell us how well the model finds each class without mixing it up with the others.

For example, if the model often confuses class A with class B, precision and recall for those classes will be low. This helps us understand where the model struggles.
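As a sketch of how these numbers come about, the per-class counts can be computed by hand (in practice scikit-learn's `classification_report` does this for you; the labels and predictions below are made up for illustration):

```python
def per_class_metrics(y_true, y_pred):
    """Compute overall accuracy plus per-class precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    metrics = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return accuracy, metrics

# Toy data: one true A is called B, one true C is called A.
y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "A", "C"]
acc, per_class = per_class_metrics(y_true, y_pred)
```

Here overall accuracy is 6/8 = 0.75, but class A's precision and recall are each only 2/3, which the accuracy number alone would not reveal.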

Confusion matrix for Multi-class classification

A confusion matrix shows how predictions match true classes. For 3 classes (A, B, C), it looks like this:

              Predicted
               A    B    C
    True  A   50    2    3
          B    4   45    1
          C    5    3   40

Here, the model correctly predicted A 50 times when the true class was A (the true positives for class A). The 2 and 3 are mistakes where the true class was A but the model predicted B or C.

We use these numbers to calculate precision, recall, and F1 for each class.
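As a sketch, these per-class numbers can be read straight off the matrix above: precision divides the diagonal cell by its column total (everything predicted as that class), and recall divides it by its row total (all true examples of that class):

```python
# Confusion matrix from the table above: rows are true classes, columns predictions.
labels = ["A", "B", "C"]
cm = [
    [50, 2, 3],   # true A
    [4, 45, 1],   # true B
    [5, 3, 40],   # true C
]

results = {}
for i, label in enumerate(labels):
    tp = cm[i][i]                          # diagonal: correct predictions
    predicted = sum(row[i] for row in cm)  # column total: predicted as this class
    actual = sum(cm[i])                    # row total: true examples of this class
    results[label] = (tp / predicted, tp / actual)

for label, (precision, recall) in results.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
```

For class A this gives precision 50/59 ≈ 0.85 and recall 50/55 ≈ 0.91.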

Precision vs Recall tradeoff with examples

In multi-class tasks, precision and recall help us understand errors better:

  • Precision for a class tells us: When the model says this class, how often is it right?
  • Recall for a class tells us: Of all true examples of this class, how many did the model find?

Example: Imagine a model classifying animals: cat, dog, rabbit.

  • If the model has high precision but low recall for "rabbit", it means when it says "rabbit" it is usually correct, but it misses many rabbits (calls them other animals).
  • If it has high recall but low precision for "rabbit", it finds most rabbits but often mistakes other animals as rabbits.

Depending on the goal, you might want to improve precision or recall for certain classes.
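To put rough numbers on the tradeoff, here is a sketch using made-up error counts for the "rabbit" class from two hypothetical models:

```python
def rabbit_metrics(tp, fp, fn):
    """Precision and recall for one class from its raw error counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Conservative model: rarely says "rabbit", but is right when it does.
p1, r1 = rabbit_metrics(tp=3, fp=0, fn=7)    # precision 1.00, recall 0.30
# Eager model: catches most rabbits, but mislabels many cats and dogs as rabbits.
p2, r2 = rabbit_metrics(tp=9, fp=11, fn=1)   # precision 0.45, recall 0.90
```

Neither model is simply "better"; which tradeoff you want depends on the cost of missing a rabbit versus the cost of a false alarm.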

What "good" vs "bad" metric values look like for Multi-class classification

Good metrics:

  • Accuracy well above chance level (e.g., above 80% on a reasonably balanced problem) usually means the model predicts well overall.
  • Precision and recall above 75% for each class show balanced performance.
  • An F1-score close to both precision and recall means the model is consistent.

Bad metrics:

  • Accuracy near random chance (e.g., 33% for 3 classes) means poor learning.
  • Very low precision or recall for some classes means the model confuses those classes badly.
  • Big gaps between precision and recall suggest the model is biased or missing examples.

Common pitfalls in Multi-class classification metrics

  • Accuracy paradox: High accuracy can hide poor performance on rare classes.
  • Ignoring class imbalance: If some classes are rare, metrics should be checked per class, not just overall.
  • Data leakage: If test data leaks into training, metrics look too good but model fails in real life.
  • Overfitting: Very high training accuracy but low test accuracy means the model memorizes the training data rather than generalizing.
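
The accuracy paradox above is easy to see with a small made-up example: on an imbalanced set, a model that never predicts the rare class can still score high accuracy:

```python
# Hypothetical imbalanced labels: class C has only 5 of 100 examples.
y_true = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
y_pred = ["A"] * 80 + ["B"] * 15 + ["A"] * 5   # every rare "C" is missed

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_c = (
    sum(t == "C" and p == "C" for t, p in zip(y_true, y_pred))
    / sum(t == "C" for t in y_true)
)
print(f"accuracy={accuracy:.2f}, recall for C={recall_c:.2f}")  # 0.95 vs 0.00
```

Accuracy is 95%, yet the model finds zero examples of class C, which is exactly why per-class metrics matter.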

Self-check question

Your multi-class model has 98% accuracy but only 12% recall on one important class. Is it good for production?

Answer: No. The model misses most examples of that class, which can be critical depending on the use case. High overall accuracy hides this problem. You should improve recall for that class before using the model.

Key Result
In multi-class classification, accuracy shows overall correctness, but precision, recall, and F1 per class reveal detailed strengths and weaknesses.