
Lemmatization in spaCy in NLP - Model Metrics & Evaluation

Which metric matters for lemmatization in spaCy, and why

Lemmatization finds the base form (lemma) of each word. The key metric here is accuracy: the proportion of processed words whose predicted lemma matches the correct base form. This matters because correct base forms improve many downstream tasks, such as search, translation, and language understanding.
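As a minimal sketch, word-level accuracy can be computed by comparing predicted lemmas against a gold standard. The word lists below are invented for illustration; in practice the predicted lemmas would come from a spaCy pipeline, e.g. `[t.lemma_ for t in nlp(text)]`.

```python
# Toy sketch: word-level lemmatization accuracy against a gold standard.
# Both lists here are hypothetical, not real spaCy output.
gold      = ["run", "be", "mouse", "study", "leaf"]
predicted = ["run", "be", "mouses", "study", "leaf"]  # one wrong lemma

def lemma_accuracy(predicted, gold):
    """Fraction of tokens whose predicted lemma matches the gold lemma."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

print(lemma_accuracy(predicted, gold))  # 0.8
```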

Confusion matrix for Lemmatization

For lemmatization, a confusion matrix can show how many words were correctly lemmatized (True Positives) versus incorrectly lemmatized (False Positives and False Negatives). For example:

                     | Predicted Correct | Predicted Incorrect
    -----------------|-------------------|--------------------
    Actual Correct   |      TP = 85      |      FN = 15
    Actual Incorrect |      FP = 10      |      TN = 90


Here, TP counts words correctly lemmatized, FP counts words given a wrong lemma that was accepted as correct, FN counts correct lemmatizations that were missed, and TN counts words correctly identified as not needing a change.
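From these four counts (TP = 85, FN = 15, FP = 10, TN = 90, so 200 words in total), the standard metrics follow directly:

```python
# Metrics derived from the confusion matrix above.
TP, FN, FP, TN = 85, 15, 10, 90

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # (85 + 90) / 200
precision = TP / (TP + FP)                   # 85 / 95
recall    = TP / (TP + FN)                   # 85 / 100

print(round(accuracy, 3), round(precision, 3), round(recall, 3))
# 0.875 0.895 0.85
```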

Tradeoff: Precision vs Recall in Lemmatization

Precision tells us how many of the words we labeled as correct base forms really are correct. Recall tells us how many of the actual base forms we found.

For example, if we want to avoid wrong base forms (high precision), we might miss some correct ones (lower recall). If we want to find all base forms (high recall), we might include some wrong ones (lower precision).

In lemmatization, high precision is usually preferred to avoid distorting word meanings, but recall should not drop so low that the lemmatizer misses too many words to stay useful.
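One common way to summarize this balance is the F1 score, the harmonic mean of precision and recall (a standard metric, though not named above). It stays high only when both values are high, so a lopsided system is penalized:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; high only when both are high."""
    return 2 * precision * recall / (precision + recall)

# Balanced: both reasonably high
print(round(f1(0.90, 0.85), 3))  # 0.874
# Lopsided: high precision but very low recall drags F1 down
print(round(f1(0.95, 0.12), 3))  # 0.213
```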

Good vs Bad metric values for Lemmatization

Good: Accuracy above 90%, Precision and Recall balanced above 85%. This means most words are correctly lemmatized and few mistakes happen.

Bad: Accuracy below 70%, Precision or Recall very low (below 50%). This means many words are wrongly lemmatized or many base forms are missed, hurting downstream tasks.

Common pitfalls in Lemmatization metrics
  • Ignoring context: Some words need sentence context to lemmatize correctly. Metrics may look good on simple words but fail on complex sentences.
  • Data leakage: Testing on words seen during training inflates accuracy.
  • Overfitting: Model memorizes common words but fails on new words, causing poor real-world performance.
  • Accuracy paradox: High accuracy can happen if many words don't need lemmatization, hiding poor performance on actual changes.
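The accuracy-paradox pitfall above can be demonstrated with a toy example (the word lists are invented): a trivial "lemmatizer" that never changes any word still scores high accuracy when most words are already in base form.

```python
# Accuracy paradox sketch: an identity "lemmatizer" that never changes a word.
# Toy corpus: 9 tokens already in base form, 1 that needs changing.
tokens = ["dog", "cat", "run", "walk", "book", "tree", "car", "sun", "moon", "mice"]
gold   = ["dog", "cat", "run", "walk", "book", "tree", "car", "sun", "moon", "mouse"]

identity_pred = tokens  # "predict" every word unchanged
accuracy = sum(p == g for p, g in zip(identity_pred, gold)) / len(gold)
print(accuracy)  # 0.9 -- looks high, yet every word needing a change was missed
```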
Self-check question

Your lemmatization model has 98% accuracy but only 12% recall on rare verb forms. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy likely comes from many words that don't change, but the very low recall on rare verbs means the model misses most of these important cases. This hurts tasks relying on correct base forms of verbs.

Key Result
Accuracy is key for lemmatization, but balanced precision and recall ensure correct and complete base form detection.