POS Tagging with spaCy in NLP: Simple Guide and Example
To do POS tagging with spaCy, load a language model such as en_core_web_sm, process your text with nlp(), and read the pos_ attribute of each token to get its part-of-speech tag. This gives you a grammatical tag for every token in your text.
Syntax
Use spacy.load() to load a language model, then call nlp(text) to process your text. Each word and punctuation mark becomes a Token object. Access token.pos_ to get the token's POS tag as a readable string.
spacy.load('en_core_web_sm'): loads the English model
nlp(text): processes text into tokens
token.pos_: gets the POS tag of a token
python
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('This is a sentence.')
for token in doc:
    print(token.text, token.pos_)
Output
This DET
is AUX
a DET
sentence NOUN
. PUNCT
Example
This example shows how to load spaCy's English model, process a sentence, and print each word with its POS tag.
python
import spacy

# Load the small English model
nlp = spacy.load('en_core_web_sm')

# Text to analyze
text = 'SpaCy is a great library for NLP tasks.'

# Process the text
doc = nlp(text)

# Print each token with its POS tag
for token in doc:
    print(f'{token.text}: {token.pos_}')
Output
SpaCy: PROPN
is: AUX
a: DET
great: ADJ
library: NOUN
for: ADP
NLP: PROPN
tasks: NOUN
.: PUNCT
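
Once every token carries a tag, a common next step is to summarize or filter by tag. The sketch below counts tags and picks out the nouns. To stay runnable without a downloaded model, it builds the Doc by hand from the words and tags in the output above. This assumes spaCy v3, whose Doc constructor accepts a pos argument. The helper name pos_counts is made up for illustration. In practice you would pass nlp(text) instead of the hand-built Doc.

```python
from collections import Counter

from spacy.tokens import Doc
from spacy.vocab import Vocab


def pos_counts(doc):
    """Count how often each coarse POS tag occurs in a Doc (hypothetical helper)."""
    return Counter(token.pos_ for token in doc)


# Hand-annotated Doc (assumption: spaCy v3, where Doc accepts `pos`).
# The tags are copied from the example output above, not predicted here.
words = ["SpaCy", "is", "a", "great", "library", "for", "NLP", "tasks", "."]
pos = ["PROPN", "AUX", "DET", "ADJ", "NOUN", "ADP", "PROPN", "NOUN", "PUNCT"]
doc = Doc(Vocab(), words=words, pos=pos)

# Summarize the tags, most frequent first
counts = pos_counts(doc)
for tag, count in counts.most_common():
    print(tag, count)

# Keep only the tokens tagged as common nouns
nouns = [token.text for token in doc if token.pos_ == "NOUN"]
print(nouns)
```

With a loaded pipeline, pos_counts(nlp(text)) works the same way, because the helper only reads token.pos_.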
Common Pitfalls
Common mistakes include calling nlp() before a model has been loaded, which raises an error, and confusing pos_ with tag_. The pos_ attribute gives coarse-grained Universal POS tags (like NOUN, VERB), while tag_ gives fine-grained, language-specific tags (like NN, VBD).
Also, if the model has not been installed with python -m spacy download en_core_web_sm, spacy.load() fails with an OSError saying it cannot find the model.
python
import spacy

# Wrong: calling nlp before a model has been loaded
# nlp = None
# doc = nlp('Test sentence')  # TypeError: 'NoneType' object is not callable

# Right: load the model first, then process the text
nlp = spacy.load('en_core_web_sm')
doc = nlp('Test sentence')
for token in doc:
    print(token.text, token.pos_)
Output
Test NOUN
sentence NOUN
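
The missing-model pitfall can be handled explicitly. Here is a minimal sketch, assuming you would rather print an install hint than let the script crash. The load_model helper is made up for illustration and is not part of spaCy. It relies on spacy.load() raising OSError when the model cannot be found.

```python
import spacy


def load_model(name="en_core_web_sm"):
    """Return the loaded pipeline, or None if the model is not installed.

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load() raises OSError when it cannot find the model
        print(f"Model '{name}' not found. "
              f"Install it with: python -m spacy download {name}")
        return None


nlp = load_model()
if nlp is not None:
    # Only tag text when the model actually loaded
    for token in nlp("Test sentence"):
        print(token.text, token.pos_)
```

Returning None keeps the sketch short. In a larger program, re-raising with a clearer message may be a better choice.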
Quick Reference
| Code | Description | Example Value |
|---|---|---|
| nlp = spacy.load('en_core_web_sm') | Load English model | Language object |
| doc = nlp(text) | Process text into tokens | Doc object |
| token.text | Original word text | 'SpaCy' |
| token.pos_ | Coarse POS tag | 'NOUN', 'VERB', 'ADJ' |
| token.tag_ | Fine-grained POS tag | 'NN', 'VBD', 'JJ' |
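
The tag values in the table are short labels. If a label is unfamiliar, spaCy's spacy.explain() function returns a brief description of it, and it needs no loaded model. The sketch below looks up the coarse and fine-grained labels listed in the table. The exact wording of each description comes from spaCy's built-in glossary and may differ between versions.

```python
import spacy

# Coarse (pos_) and fine-grained (tag_) labels taken from the table above
coarse_labels = ["NOUN", "VERB", "ADJ"]
fine_labels = ["NN", "VBD", "JJ"]

# Map each label to its glossary description
descriptions = {
    label: spacy.explain(label) for label in coarse_labels + fine_labels
}

for label, description in descriptions.items():
    print(f"{label}: {description}")
```

The same call works on live tokens, for example spacy.explain(token.tag_) inside the loop over a processed doc.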
Key Takeaways
Always load a spaCy language model before processing text for POS tagging.
Use token.pos_ to get easy-to-understand part-of-speech tags for each word.
Install models with 'python -m spacy download en_core_web_sm' if not already installed.
Remember pos_ gives coarse tags; tag_ gives more detailed POS tags.
Process text with nlp(text) to get tokens for POS tagging.
