Lemmatization in spaCy in NLP - Model Pipeline Trace

Model Pipeline - Lemmatization in spaCy

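This pipeline shows how spaCy processes text to find the base form of each word, called its lemma. It starts with raw text, splits it into tokens, assigns each token a part-of-speech (POS) tag, and then uses each token together with its tag to find the lemma. This lets inflected forms such as "cats", "running" and "are" be treated as their base words "cat", "run" and "be".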

Data Flow - 4 Stages
Stage 1: Raw Text Input
Input: 1 sentence (string) | Operation: read the raw sentence as text | Output: 1 sentence (string)
"The cats are running quickly."
Stage 2: Tokenization
Input: 1 sentence (string) | Operation: split the sentence into words (tokens) | Output: 6 tokens
["The", "cats", "are", "running", "quickly", "."]
Stage 3: Part-of-Speech Tagging
Input: 6 tokens | Operation: assign word types (noun, verb, etc.) | Output: 6 tokens with POS tags
[('The', 'DET'), ('cats', 'NOUN'), ('are', 'AUX'), ('running', 'VERB'), ('quickly', 'ADV'), ('.', 'PUNCT')]
Stage 4: Lemmatization
Input: 6 tokens with POS tags | Operation: find the base form (lemma) of each token | Output: 6 lemmas
["the", "cat", "be", "run", "quickly", "."]
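The four stages can be traced end to end with a toy, standard-library-only sketch. It is not spaCy's implementation: the POS lexicon (TOY_POS), the irregular-form table (EXCEPTIONS) and the naive plural rule are hypothetical stand-ins for spaCy's statistical tagger and its lemmatizer, chosen only so that this one sentence reproduces the outputs listed above. With spaCy and its en_core_web_sm model installed, the same three views come from iterating over nlp(text) and reading token.text, token.pos_ and token.lemma_.

```python
import re

# Hypothetical toy lexicon standing in for a trained POS tagger.
# spaCy predicts these tags with a statistical model instead.
TOY_POS = {
    "the": "DET", "cats": "NOUN", "are": "AUX",
    "running": "VERB", "quickly": "ADV", ".": "PUNCT",
}

# Hypothetical exception table for irregular forms, keyed on (form, POS).
EXCEPTIONS = {("are", "AUX"): "be", ("running", "VERB"): "run"}


def tokenize(text):
    """Stage 2: split into word tokens and separate punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Stage 3: pair each token with a POS tag (toy lookup, 'X' if unknown)."""
    return [(tok, TOY_POS.get(tok.lower(), "X")) for tok in tokens]


def lemmatize(tagged):
    """Stage 4: choose a lemma from the token's form and its POS tag."""
    lemmas = []
    for tok, pos in tagged:
        form = tok.lower()
        if (form, pos) in EXCEPTIONS:
            lemmas.append(EXCEPTIONS[(form, pos)])
        elif pos == "NOUN" and form.endswith("s") and len(form) > 3:
            lemmas.append(form[:-1])  # naive plural rule: cats -> cat
        else:
            lemmas.append(form)  # default: lowercased form is its own lemma
    return lemmas


text = "The cats are running quickly."  # Stage 1: raw text input
tokens = tokenize(text)
tagged = tag(tokens)
lemmas = lemmatize(tagged)
print(tokens)
print(tagged)
print(lemmas)
```

Keying the exceptions on (form, POS) rather than on the form alone is what lets the tag from stage 3 steer the lemma chosen in stage 4.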
Training Trace - Epoch by Epoch
[Chart: Loss vs. Epochs (1 to 5), falling each epoch]
Epoch | Loss ↓ | Accuracy ↑ | Observation
1 | 0.45 | 0.70 | Initial training with moderate loss and accuracy.
2 | 0.30 | 0.82 | Loss decreased and accuracy improved as the model learned.
3 | 0.20 | 0.90 | The model shows good convergence, with higher accuracy.
4 | 0.15 | 0.93 | Further improvement; loss keeps falling steadily.
5 | 0.12 | 0.95 | Training converged with high accuracy and low loss.
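The observations in the table can be checked with a few lines of arithmetic. This sketch only re-reads the five rows above (it trains nothing): it confirms that loss falls and accuracy rises at every epoch, computes the overall change from epoch 1 to epoch 5, and lists the per-epoch loss drops, which shrink as training converges.

```python
# Per-epoch values copied from the training-trace table.
loss = [0.45, 0.30, 0.20, 0.15, 0.12]
accuracy = [0.70, 0.82, 0.90, 0.93, 0.95]

# Check that loss falls and accuracy rises at every epoch.
loss_decreasing = all(b < a for a, b in zip(loss, loss[1:]))
acc_increasing = all(b > a for a, b in zip(accuracy, accuracy[1:]))

# Relative loss reduction and absolute accuracy gain from epoch 1 to epoch 5.
loss_reduction = (loss[0] - loss[-1]) / loss[0]
acc_gain = accuracy[-1] - accuracy[0]

# Loss drop between consecutive epochs; shrinking drops indicate convergence.
loss_drops = [round(a - b, 2) for a, b in zip(loss, loss[1:])]

print(loss_decreasing, acc_increasing)
print(f"loss reduced by {loss_reduction:.0%}, accuracy +{acc_gain:.2f}")
print(loss_drops)
```

Loss falls by about 73% relative to epoch 1 and accuracy gains 0.25, while the per-epoch loss drops (0.15, 0.1, 0.05, 0.03) get smaller each time.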
Prediction Trace - 3 Layers
Layer 1: Tokenization
Layer 2: POS Tagging
Layer 3: Lemmatization
Model Quiz - 3 Questions
Test your understanding
What is the main purpose of lemmatization in spaCy?
A. To find the base form of words
B. To split sentences into words
C. To assign part-of-speech tags
D. To translate text into another language
Key Insight
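Lemmatization reduces different forms of a word to a common base form, so related forms are treated as the same word in later processing. The POS tags guide the choice of lemma, because the same surface form can have different lemmas depending on how it is used in the sentence. Over the five epochs, loss falls from 0.45 to 0.12 and accuracy rises from 0.70 to 0.95, showing that the model steadily learns to lemmatize more accurately.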
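Why POS tags matter can be shown with words whose lemma depends on their part of speech: "saw" is its own lemma as a noun but lemmatizes to "see" as a verb. The sketch below is a hypothetical (form, POS) lookup table written only for illustration; it is not spaCy's rule set, but it captures the idea that the same form maps to different lemmas once the tag is known.

```python
# Hypothetical (form, POS) -> lemma table for a few ambiguous English words.
# The same surface form maps to different lemmas depending on its POS tag.
LEMMA_TABLE = {
    ("saw", "NOUN"): "saw",
    ("saw", "VERB"): "see",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("leaves", "NOUN"): "leaf",
    ("leaves", "VERB"): "leave",
}


def lemma_for(form, pos):
    """Look up the lemma for a (form, POS) pair; fall back to the lowercased form."""
    return LEMMA_TABLE.get((form.lower(), pos), form.lower())


for word in ("saw", "meeting", "leaves"):
    print(word, "->", lemma_for(word, "NOUN"), "(NOUN) /", lemma_for(word, "VERB"), "(VERB)")
```

Without the tag, a lemmatizer would have to guess between "saw" and "see"; with it, the choice is a direct lookup.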