TensorFlow · ~8 mins

Confusion matrix visualization in TensorFlow - Model Metrics & Evaluation

Which metric matters, and why

When we want to see how well a model predicts categories, the confusion matrix gives us the details. It counts, for each category, how many times the model got things right or wrong. Those counts let us pick the right metric — accuracy, precision, or recall — depending on what matters most for our problem.

Confusion matrix visualization
          Predicted
          0     1
Actual 0 | 50 | 10 |
       1 |  5 | 35 |

Where:
- 50 = True Negative (TN)
- 10 = False Positive (FP)
- 5  = False Negative (FN)
- 35 = True Positive (TP)

Total samples = 50 + 10 + 5 + 35 = 100

This table shows how many times the model guessed each class correctly or incorrectly.
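The matrix above can be reproduced with TensorFlow's built-in `tf.math.confusion_matrix`. The labels and predictions below are made-up values arranged to yield exactly the 50/10/5/35 counts shown:

```python
import tensorflow as tf

# Hypothetical data: 60 actual negatives, 40 actual positives,
# ordered so the counts match the table above.
y_true = [0] * 60 + [1] * 40
y_pred = [0] * 50 + [1] * 10 + [0] * 5 + [1] * 35

# Rows are actual classes, columns are predicted classes.
cm = tf.math.confusion_matrix(y_true, y_pred, num_classes=2)
print(cm.numpy())
# [[50 10]
#  [ 5 35]]
```

Note the convention: rows index the actual class and columns the predicted class, matching the table layout above.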

Precision vs Recall tradeoff with examples

Precision tells us how many of the items the model said were positive actually are positive. For example, in spam detection, high precision means few good emails are wrongly marked as spam.

Recall tells us how many of the actual positive items the model found. For example, in cancer detection, high recall means the model finds most cancer cases, even if it sometimes makes mistakes.

Improving one often lowers the other, so we choose based on what is more important: avoiding false alarms or missing real cases.
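One common way this tradeoff shows up is in the decision threshold: raising it makes the model pickier (precision up, recall down), lowering it makes it greedier (recall up, precision down). A minimal sketch with invented scores and labels:

```python
# Made-up model scores and true labels, chosen to illustrate the tradeoff.
scores = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
labels = [0,   0,   1,   0,   0,   1  ]

def precision_recall(threshold):
    """Classify as positive when score >= threshold, then compute both metrics."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(precision_recall(0.8))  # strict threshold: (1.0, 0.5) -- precise, misses cases
print(precision_recall(0.3))  # loose threshold:  (0.5, 1.0) -- catches all, noisy
```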

What good vs bad metric values look like

For the confusion matrix above:

  • Precision = TP / (TP + FP) = 35 / (35 + 10) = 0.78 (78%)
  • Recall = TP / (TP + FN) = 35 / (35 + 5) = 0.88 (88%)
  • Accuracy = (TP + TN) / Total = (35 + 50) / 100 = 0.85 (85%)

Good: Precision and recall both above 80% mean the model is reliable at finding positives without raising many false alarms.

Bad: Precision or recall below 50% means the model often makes wrong guesses or misses many positives.
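The three formulas above are easy to verify directly from the four matrix cells:

```python
# The four cells from the confusion matrix above.
tp, tn, fp, fn = 35, 50, 10, 5
total = tp + tn + fp + fn  # 100 samples

precision = tp / (tp + fp)       # 35 / 45
recall = tp / (tp + fn)          # 35 / 40
accuracy = (tp + tn) / total     # 85 / 100

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
# precision=0.78 recall=0.88 accuracy=0.85
```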

Common pitfalls in metrics
  • Accuracy paradox: High accuracy can be misleading if classes are unbalanced. For example, if 95% of data is negative, a model that always guesses negative has 95% accuracy but is useless.
  • Data leakage: When the model accidentally learns from future or test data, metrics look better but the model fails in real use.
  • Overfitting: Very high training metrics but poor test metrics mean the model memorizes training data and won't generalize.
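The accuracy paradox from the first bullet is easy to demonstrate: with 95% negatives, a model that always guesses "negative" scores 95% accuracy yet finds zero positives.

```python
# Accuracy paradox sketch: imbalanced data, useless model.
labels = [0] * 95 + [1] * 5   # 95% negative class
preds = [0] * 100             # model always predicts "negative"

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
recall = tp / (tp + fn)

print(accuracy)  # 0.95 -- looks great
print(recall)    # 0.0  -- catches no positives at all
```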
Self-check question

Your model has 98% accuracy but only 12% recall on fraud cases. Is it good for production?

Answer: No. The model misses 88% of fraud cases (low recall), which is dangerous. Even with high accuracy, it fails to catch most frauds, so it is not good for production.

Key Result
The confusion matrix visualizes true and false positives and negatives, letting you choose the right metric — such as precision or recall — based on the needs of your problem.