How to Use a Confusion Matrix in sklearn with Python
Use confusion_matrix from sklearn.metrics by passing the true labels and the predicted labels as arguments. It returns a matrix showing the counts of correct and incorrect predictions for each class.
Syntax
The confusion_matrix function computes a confusion matrix, which is used to evaluate the accuracy of a classifier.
- y_true: The true labels of your data.
- y_pred: The predicted labels from your model.
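- labels (optional): List of labels to index the matrix. Use it to control the class order or to select a subset of classes; if omitted, the labels that appear in y_true or y_pred are used in sorted order.
- normalize (optional): One of None, 'true', 'pred', or 'all'. Normalizes over the true labels (rows), the predicted labels (columns), or all entries; None returns raw counts.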
```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred, labels=None, normalize=None)
```
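To make the normalize options more concrete, here is a minimal sketch that computes the same matrix as raw counts, normalized by row, and normalized over all entries. The label lists are made up for illustration and do not come from a real model.

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels (assumed for this sketch, not real model output)
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

# Raw counts: rows are true classes, columns are predicted classes
counts = confusion_matrix(y_true, y_pred)

# normalize='true' divides each row by its sum, so every row sums to 1
by_true = confusion_matrix(y_true, y_pred, normalize='true')

# normalize='all' divides every cell by the total number of samples
by_all = confusion_matrix(y_true, y_pred, normalize='all')

print(counts)
print(by_true)
print(by_all)
```

With normalize='true', the diagonal holds the fraction of each true class that was predicted correctly, which is often easier to compare than raw counts when class sizes differ.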
Example
This example shows how to create a confusion matrix for a simple classification problem with true and predicted labels.
```python
from sklearn.metrics import confusion_matrix

# True labels
y_true = [0, 1, 2, 2, 0, 1]
# Predicted labels
y_pred = [0, 2, 2, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
```
Output
[[2 0 0]
 [1 0 1]
 [0 0 2]]
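One way to read the matrix above is to pull out the diagonal (correct predictions) and the row sums (number of true samples per class). The following is a sketch using the same labels as the example; the variable names are illustrative, and NumPy is assumed to be available since sklearn depends on it.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 2, 2, 0, 0]
cm = confusion_matrix(y_true, y_pred)

# Diagonal entries count the samples predicted correctly for each class
correct_per_class = np.diag(cm)

# Overall accuracy: correct predictions divided by all predictions (4 of 6 here)
accuracy = np.trace(cm) / cm.sum()

# Row sums give the number of true samples in each class
support = cm.sum(axis=1)

# Fraction of each true class that was predicted correctly (per-class recall)
recall_per_class = correct_per_class / support

print(accuracy)
print(recall_per_class)
```

Here class 1 is never predicted correctly: one of its samples was predicted as class 0 and the other as class 2, which is exactly what the middle row of the matrix shows.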
Common Pitfalls
Common mistakes when using confusion_matrix include:
- Mixing up y_true and y_pred, which transposes the matrix and leads to incorrect interpretation.
- Not specifying labels when you need a particular class order or when some classes are missing from the data, which causes an unexpected matrix order or shape.
- Ignoring normalization when comparing models evaluated on data with different class distributions.
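```python
from sklearn.metrics import confusion_matrix

# Same labels as in the example above
y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 2, 2, 0, 0]

# Wrong: swapped arguments (rows become predicted classes)
cm_wrong = confusion_matrix(y_pred, y_true)
print('Wrong matrix:\n', cm_wrong)

# Right: true labels first, then predicted labels
cm_right = confusion_matrix(y_true, y_pred)
print('Right matrix:\n', cm_right)
```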
Output
Wrong matrix:
 [[2 1 0]
 [0 0 0]
 [0 1 2]]
Right matrix:
 [[2 0 0]
 [1 0 1]
 [0 0 2]]
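The labels pitfall can be illustrated in a similar way. This is a small sketch with hypothetical string labels in which one class ('bird') never appears in the data; the class names and the all_classes list are assumptions made for the example.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical data: the class 'bird' exists but never appears here
y_true = ['cat', 'dog', 'cat', 'dog']
y_pred = ['cat', 'cat', 'cat', 'dog']

# Without labels: only observed classes, in sorted order ('cat', 'dog')
cm_default = confusion_matrix(y_true, y_pred)
print(cm_default.shape)

# With labels: fixed order, and the absent class gets a row and column of zeros
all_classes = ['dog', 'cat', 'bird']
cm_full = confusion_matrix(y_true, y_pred, labels=all_classes)
print(cm_full.shape)
print(cm_full)
```

Passing the same labels list for every evaluation keeps the matrices the same shape and in the same order, so they can be compared cell by cell.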
Quick Reference
| Parameter | Description |
|---|---|
| y_true | Array of true class labels |
| y_pred | Array of predicted class labels |
| labels | List of labels to index matrix (optional) |
| normalize | 'true', 'pred', 'all' or None for normalization |
Key Takeaways
Always pass true labels first, then predicted labels to confusion_matrix.
Use the labels parameter to control class order and include all classes.
Normalize the confusion matrix to compare models fairly across class imbalances.
Interpret the matrix rows as true classes and columns as predicted classes.
A confusion matrix helps identify which types of classification errors a model makes.
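For a binary problem, the error types can be read off directly by flattening the 2x2 matrix. The sketch below uses made-up labels and assumes class 1 is the positive class; labels=[0, 1] is passed explicitly so the order of the four cells is fixed.

```python
from sklearn.metrics import confusion_matrix

# Illustrative binary labels (assumed for this sketch); 1 is the positive class
y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are true classes, columns are predicted classes, so flattening
# row by row yields: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print('True negatives:', tn)
print('False positives:', fp)
print('False negatives:', fn)
print('True positives:', tp)
```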