Logistic regression for text in NLP - Model Metrics & Evaluation

Metrics & Evaluation - Logistic regression for text
Which metrics matter for logistic regression on text, and why

When using logistic regression to classify text, the key metrics are Precision, Recall, and F1-score. These metrics help us understand how well the model identifies the correct categories.

Precision tells us how many of the texts labeled as positive are actually positive. This is important when false alarms are costly.

Recall tells us how many of the actual positive texts the model found. This matters when missing a positive case is bad.

F1-score balances precision and recall, giving a single number to compare models.

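Accuracy alone can be misleading if the text classes are imbalanced (one class much larger than the other): a model that always predicts the majority class can score high accuracy while finding no positive texts at all.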

Confusion Matrix Example
                     Predicted Pos   Predicted Neg
    Actual Positive       80              20
    Actual Negative       10              90


Here,

  • TP (True Positive) = 80 (correct positive predictions)
  • FN (False Negative) = 20 (missed positives)
  • FP (False Positive) = 10 (wrongly labeled positive)
  • TN (True Negative) = 90 (correct negative predictions)

From this, we calculate:

  • Precision = 80 / (80 + 10) = 0.89
  • Recall = 80 / (80 + 20) = 0.80
  • F1-score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
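
The arithmetic above can be reproduced in a few lines. This is a minimal sketch that hard-codes the four counts from the confusion matrix; the variable names are chosen here for illustration. It also computes accuracy for comparison.

```python
# Counts taken from the confusion matrix in this section.
tp, fn, fp, tn = 80, 20, 10, 90

precision = tp / (tp + fp)                           # of predicted positives, how many are right
recall = tp / (tp + fn)                              # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / (tp + fn + fp + tn)           # all correct predictions over all examples

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```
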
Precision vs Recall Tradeoff with Text Examples

Imagine a spam filter using logistic regression on emails:

  • High Precision: Few good emails are marked as spam. This avoids losing important messages.
  • High Recall: Most spam emails are caught, but some good emails might be wrongly blocked.

Depending on what matters more, you adjust the decision threshold applied to the predicted spam probability (commonly 0.5 by default): raising it favors precision, lowering it favors recall.

For example, if missing spam is worse, prioritize recall. If blocking good emails is worse, prioritize precision.
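
The effect of moving the threshold can be sketched without training a model. The scores and labels below are made up for illustration (they stand in for spam probabilities from a trained classifier), and `precision_recall` is a hypothetical helper, not a library function.

```python
# Hypothetical spam probabilities and true labels (1 = spam, 0 = good email).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

def precision_recall(threshold):
    """Mark an email as spam when its score is at or above the threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A low threshold catches more spam; a high threshold blocks fewer good emails.
for t in (0.25, 0.5, 0.8):
    p, r = precision_recall(t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, the lowest threshold reaches full recall at the cost of precision, and the highest threshold does the reverse.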

Good vs Bad Metric Values for Logistic Regression on Text

Good:

  • Precision and recall both above 0.80, showing balanced and reliable predictions.
  • F1-score close to or above 0.80, indicating good overall performance.
  • Confusion matrix errors spread evenly, with neither false positives nor false negatives dominating.

Bad:

  • High accuracy but very low recall (e.g., 98% accuracy but 10% recall) means the model misses most positive texts.
  • Precision very low (e.g., 0.3) means many false alarms.
  • A confusion matrix whose cells do not sum to the number of test examples (a sign of an evaluation bug), or whose errors are heavily skewed toward one type.
Common Pitfalls in Metrics for Logistic Regression on Text
  • Accuracy Paradox: High accuracy can hide poor performance if classes are imbalanced.
  • Data Leakage: If test data leaks into training, metrics look unrealistically good.
  • Overfitting: Very high training metrics but poor test metrics show the model memorizes the training texts instead of generalizing to new ones.
  • Ignoring Class Imbalance: Not using precision and recall when one class is rare leads to wrong conclusions.
Self Check

Your logistic regression model for spam detection has 98% accuracy but only 12% recall on spam emails. Is it good for production? Why or why not?

Answer: No, it is not good. The model misses 88% of spam emails (low recall), so many spam messages get through. High accuracy is misleading because most emails are not spam, so the model just predicts non-spam most of the time.
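
To see how 98% accuracy and 12% recall can coexist, here is a small sketch with hypothetical counts (10,000 emails, 2% of them spam). The numbers are invented to match the scenario, not taken from a real dataset.

```python
# Hypothetical counts chosen to reproduce the 98% accuracy / 12% recall scenario.
total_emails = 10_000
spam_emails = 200                       # only 2% of emails are spam

tp = 24                                 # spam correctly caught
fn = spam_emails - tp                   # spam missed
fp = 24                                 # good emails wrongly flagged
tn = total_emails - spam_emails - fp    # good emails correctly passed

accuracy = (tp + tn) / total_emails
recall = tp / (tp + fn)

# Baseline that never predicts spam: every good email is "correct", all spam is missed.
baseline_accuracy = (total_emails - spam_emails) / total_emails

print(f"accuracy={accuracy:.2f} recall={recall:.2f} baseline_accuracy={baseline_accuracy:.2f}")
```

With these counts, a model that never flags anything reaches the same accuracy, which is why accuracy says little here.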

Key Result
Precision, recall, and F1-score are key to evaluating logistic regression on text, especially with imbalanced classes.

Practice

1. What is the main purpose of logistic regression when applied to text data?
easy
A. To count the number of words in a text
B. To generate new text sentences
C. To classify text into categories like positive or negative
D. To translate text from one language to another

Solution

  1. Step 1: Understand logistic regression's role in text

    Logistic regression is a method used to classify data into categories based on input features.
  2. Step 2: Apply to text classification

    When applied to text, logistic regression predicts categories like positive or negative sentiment.
  3. Final Answer:

    To classify text into categories like positive or negative -> Option C
  4. Quick Check:

    Logistic regression classifies text [OK]
Hint: Logistic regression predicts categories; it does not generate text [OK]
Common Mistakes:
  • Confusing classification with text generation
  • Thinking logistic regression translates languages
  • Assuming it only counts words
2. Which Python tool is commonly used to convert text into numbers before applying logistic regression?
easy
A. CountVectorizer
B. matplotlib
C. pandas
D. seaborn

Solution

  1. Step 1: Identify text to number conversion tools

    CountVectorizer is a scikit-learn class that converts text into a matrix of token counts, suitable for models.
  2. Step 2: Match with logistic regression preprocessing

    Before logistic regression, text must be numeric; CountVectorizer is commonly used for this.
  3. Final Answer:

    CountVectorizer -> Option A
  4. Quick Check:

    Text to numbers = CountVectorizer [OK]
Hint: CountVectorizer turns words into numbers for models [OK]
Common Mistakes:
  • Choosing plotting libraries like matplotlib
  • Confusing data frame libraries like pandas
  • Selecting visualization tools like seaborn
3. What will be the output of this code snippet?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(X, labels)
pred = model.predict(vectorizer.transform(['good movie']))
print(pred)
medium
A. [0]
B. [1]
C. [1, 0]
D. Error: model not trained

Solution

  1. Step 1: Understand training data and labels

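    The text 'good movie' is labeled 1 (positive) and 'bad movie' is labeled 0 (negative).
  2. Step 2: Predict on 'good movie'

    The word 'good' appears only in the positive example, so the trained model gives it a positive weight and predicts 1 for 'good movie'. predict returns an array with one label per input, which prints as [1].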
  3. Final Answer:

    [1] -> Option B
  4. Quick Check:

    Prediction for 'good movie' = 1 [OK]
Hint: Model predicts label matching training example [OK]
Common Mistakes:
  • Assuming prediction returns multiple labels
  • Thinking model is untrained causing error
  • Confusing label 0 and 1
4. Identify the error in this code snippet for logistic regression on text:
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer

texts = ['happy', 'sad']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(texts, labels)
medium
A. model.fit should use numeric features, not raw texts
B. CountVectorizer is not imported
C. fit_transform should be called on labels
D. Labels should be strings, not integers

Solution

  1. Step 1: Check input to model.fit

    Model expects numeric features, but code passes raw text strings.
  2. Step 2: Correct usage of vectorized data

    Must pass X (vectorized text) to model.fit, not original texts.
  3. Final Answer:

    model.fit should use numeric features, not raw texts -> Option A
  4. Quick Check:

    Model needs numbers, not raw text [OK]
Hint: Pass vectorized text, not raw strings, to model.fit [OK]
Common Mistakes:
  • Passing raw text instead of vectorized data
  • Confusing labels data type requirements
  • Ignoring import statements
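
For reference, a corrected version of the snippet from question 4 might look like the sketch below. The only change to the training code is passing the vectorized matrix X to model.fit; the final prediction line is added here for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["happy", "sad"]
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)   # numeric word-count features

model = LogisticRegression()
model.fit(X, labels)                  # fixed: train on X, not on the raw strings

# New text must go through the same fitted vectorizer before prediction.
pred = model.predict(vectorizer.transform(["happy"]))
print(pred)
```
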
5. You trained a logistic regression model on text data using CountVectorizer. When testing on new sentences, the model predicts only one class for all inputs. What is the best way to improve the model's performance?
hard
A. Change logistic regression to linear regression
B. Remove CountVectorizer and use raw text directly
C. Use fewer training examples to avoid overfitting
D. Increase the number of training examples and use n-grams in CountVectorizer

Solution

  1. Step 1: Understand cause of single-class prediction

    Model may be underfitting due to limited data or simple features.
  2. Step 2: Improve feature richness and data size

    Adding more training examples and using n-grams captures more context, improving model learning.
  3. Final Answer:

    Increase the number of training examples and use n-grams in CountVectorizer -> Option D
  4. Quick Check:

    More data + better features = better model [OK]
Hint: More data and richer features improve classification [OK]
Common Mistakes:
  • Removing vectorizer loses numeric input
  • Reducing data worsens model
  • Confusing regression types