
Logistic regression for text in NLP - Model Metrics & Evaluation

Which Metric Matters for Logistic Regression on Text, and Why

When using logistic regression to classify text, the key metrics are Precision, Recall, and F1-score. These metrics help us understand how well the model identifies the correct categories.

Precision tells us how many of the texts labeled as positive are actually positive. This is important when false alarms are costly.

Recall tells us how many of the actual positive texts the model found. This matters when missing a positive case is bad.

F1-score balances precision and recall, giving a single number to compare models.

Accuracy alone can be misleading when the text classes are imbalanced (one class is much larger than the other).
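A minimal sketch (with hypothetical counts) makes this concrete: a "model" that always predicts the majority class scores high accuracy while finding none of the positives.

```python
# Hypothetical imbalanced dataset: 2 spam texts, 98 non-spam texts.
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100  # a degenerate model that always predicts non-spam

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)

print(accuracy)  # 0.98 -- looks great
print(recall)    # 0.0  -- catches zero spam
```

Despite 98% accuracy, recall is zero, which is why precision and recall are reported alongside accuracy.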

Confusion Matrix Example

                   Predicted Pos   Predicted Neg
    Actual Pos          80              20
    Actual Neg          10              90

Here,

  • TP (True Positive) = 80 (correct positive predictions)
  • FN (False Negative) = 20 (missed positives)
  • FP (False Positive) = 10 (wrongly labeled positive)
  • TN (True Negative) = 90 (correct negative predictions)

From this, we calculate:

  • Precision = 80 / (80 + 10) ≈ 0.89
  • Recall = 80 / (80 + 20) = 0.80
  • F1-score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
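The calculations above can be reproduced in a few lines of plain Python, starting from the four confusion-matrix counts:

```python
# Counts from the confusion matrix above.
tp, fn, fp, tn = 80, 20, 10, 90

precision = tp / (tp + fp)  # 80 / 90
recall = tp / (tp + fn)     # 80 / 100
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))
```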

Precision vs Recall Tradeoff with Text Examples

Imagine a spam filter using logistic regression on emails:

  • High Precision: Few good emails are marked as spam. This avoids losing important messages.
  • High Recall: Most spam emails are caught, but some good emails might be wrongly blocked.

Depending on what matters more, you adjust the model threshold to favor precision or recall.

For example, if missing spam is worse, prioritize recall. If blocking good emails is worse, prioritize precision.
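The threshold adjustment can be sketched as follows. The probabilities and labels here are hypothetical stand-ins for what a logistic regression model would produce on a batch of emails:

```python
# Hypothetical predicted spam probabilities and true labels (1 = spam).
probs  = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]

def precision_recall(threshold):
    """Classify as spam when probability >= threshold, then score."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, labels))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, labels))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

print(precision_recall(0.7))   # high threshold: precision 1.0, recall ~0.67
print(precision_recall(0.25))  # low threshold: precision 0.6, recall 1.0
```

Raising the threshold makes the filter conservative (fewer false alarms, higher precision); lowering it catches more spam (higher recall) at the cost of blocking some good emails.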

Good vs Bad Metric Values for Logistic Regression on Text

Good:

  • Precision and recall both above 0.80, showing balanced and reliable predictions.
  • F1-score close to or above 0.80, indicating good overall performance.
  • Confusion matrix numbers consistent and balanced.

Bad:

  • High accuracy but very low recall (e.g., 98% accuracy but 10% recall) means the model misses most positive texts.
  • Precision very low (e.g., 0.3) means many false alarms.
  • Confusion matrix counts that don't sum to the dataset size, or predictions concentrated almost entirely in one class.

Common Pitfalls in Metrics for Logistic Regression on Text

  • Accuracy Paradox: High accuracy can hide poor performance if classes are imbalanced.
  • Data Leakage: If test data leaks into training, metrics look unrealistically good.
  • Overfitting: Very high training metrics but poor test metrics show the model memorizes instead of learning.
  • Ignoring Class Imbalance: Not using precision and recall when one class is rare leads to wrong conclusions.
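One common mitigation for class imbalance is reweighting the rare class during training. A minimal sketch, assuming scikit-learn is available (the example texts and labels are made up for illustration):

```python
# Sketch: logistic regression on text with class reweighting.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["win money now", "free prize click", "meeting at noon",
         "lunch tomorrow", "project update attached", "see you monday"]
labels = [1, 1, 0, 0, 0, 0]  # 1 = spam, the rare class here

X = CountVectorizer().fit_transform(texts)
# class_weight="balanced" upweights the minority class so the model
# is not dominated by the majority class.
model = LogisticRegression(class_weight="balanced").fit(X, labels)
print(model.predict(X))
```

Even with reweighting, precision and recall on a held-out test set remain the metrics to watch, not training accuracy.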

Self Check

Your logistic regression model for spam detection has 98% accuracy but only 12% recall on spam emails. Is it good for production? Why or why not?

Answer: No, it is not good. The model misses 88% of spam emails (low recall), so many spam messages get through. High accuracy is misleading because most emails are not spam, so the model just predicts non-spam most of the time.

Key Result
Precision, recall, and F1-score are the key metrics for evaluating logistic regression on text, especially with imbalanced classes.