
Naive Bayes for text in NLP - Model Metrics & Evaluation

Which metric matters for Naive Bayes text classification and WHY

For text classification with Naive Bayes, accuracy, precision, and recall all matter. Accuracy measures the overall fraction of correct predictions. Precision tells us how many of the texts predicted positive are truly positive. Recall tells us how many of the actual positive texts were found. The right metric depends on the task: for spam detection, high precision keeps good emails out of the spam folder; for harmful-content detection, high recall keeps dangerous texts from slipping through.

Confusion matrix example
      Actual \ Predicted | Positive | Negative
      -------------------|----------|---------
      Positive           |    80    |   20
      Negative           |    10    |   90
    

Here, TP=80, FN=20, FP=10, TN=90. Total samples = 200.

Precision = 80 / (80 + 10) = 0.89

Recall = 80 / (80 + 20) = 0.80

Accuracy = (80 + 90) / 200 = 0.85
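The calculations above can be checked with a few lines of Python, using the counts straight from the confusion matrix:

```python
# Metrics from the confusion matrix above (TP=80, FN=20, FP=10, TN=90).
tp, fn, fp, tn = 80, 20, 10, 90

precision = tp / (tp + fp)                   # 80 / 90
recall = tp / (tp + fn)                      # 80 / 100
accuracy = (tp + tn) / (tp + fn + fp + tn)   # 170 / 200

print(f"Precision: {precision:.2f}")  # 0.89
print(f"Recall:    {recall:.2f}")     # 0.80
print(f"Accuracy:  {accuracy:.2f}")   # 0.85
```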

Precision vs Recall tradeoff with examples

Imagine a spam filter:

  • High precision means most emails marked as spam really are spam. This avoids losing good emails.
  • High recall means catching most spam emails, but might mark some good emails wrongly.

For harmful content detection:

  • High recall is key to catch all harmful texts, even if some good texts are flagged.
  • High precision reduces false alarms but might miss some harmful texts.

Naive Bayes can be tuned to balance these by adjusting thresholds.
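A minimal sketch of that thresholding idea, using hypothetical posterior probabilities of the kind a trained Naive Bayes model would output (the probability and label values below are made up for illustration):

```python
# Hypothetical P(spam | text) posteriors from a Naive Bayes model,
# paired with the true labels (1 = spam, 0 = not spam).
probs  = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

def precision_recall(threshold):
    """Classify as spam when the posterior meets the threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# A high threshold favours precision; a low one favours recall.
for t in (0.8, 0.5, 0.2):
    prec, rec = precision_recall(t)
    print(f"threshold={t}: precision={prec:.2f}, recall={rec:.2f}")
```

Raising the threshold makes the filter pickier (fewer false positives, more misses); lowering it makes the filter greedier (fewer misses, more false alarms).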

What good vs bad metric values look like

Good metrics for Naive Bayes text classification:

  • Accuracy above 80% on balanced data
  • Precision and recall both above 75%
  • F1 score (balance of precision and recall) above 0.75

Bad metrics:

  • Accuracy near random guess (e.g., 50% for two classes)
  • Precision very low (e.g., 30%) means many false positives
  • Recall very low (e.g., 20%) means many missed positives
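The F1 score mentioned above is the harmonic mean of precision and recall; for the worked confusion matrix it lands comfortably above the 0.75 guideline:

```python
# F1 for the worked example: precision ≈ 0.89, recall = 0.80.
precision = 80 / 90
recall = 80 / 100

# Harmonic mean punishes imbalance between the two metrics.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.2f}")  # 0.84
```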

Common pitfalls in metrics for Naive Bayes text classification

  • Accuracy paradox: High accuracy can be misleading if classes are imbalanced (e.g., 95% accuracy but model always predicts majority class).
  • Data leakage: If test data leaks into training, metrics look unrealistically good.
  • Overfitting: Very high training accuracy but low test accuracy means model memorizes training data, not generalizing.
  • Ignoring class imbalance: Metrics like accuracy alone don't show if minority class is well detected.
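The accuracy paradox from the first bullet is easy to demonstrate with a hypothetical imbalanced test set (95 negatives, 5 positives) and a "classifier" that only ever predicts the majority class:

```python
# Hypothetical imbalanced test set: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
preds = [0] * 100  # degenerate model: always predicts the majority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
recall = tp / (tp + fn)

# 95% accuracy, yet every single positive case is missed.
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.95, recall=0.00
```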

Self-check question

Your Naive Bayes text classifier has 98% accuracy but only 12% recall on the positive class (e.g., spam). Is it good for production? Why or why not?

Answer: No. The low recall means it misses most positive cases (spam); the high accuracy comes from the model almost always predicting the negative class, which dominates the data. That is unacceptable when catching positives is the whole point.

Key Result
Precision and recall are key to evaluate Naive Bayes text classifiers, balancing false alarms and missed detections.