POS Tagging in Python for NLP: Simple Guide with Examples
You can do POS tagging in Python using the nltk library by first tokenizing text with word_tokenize and then applying pos_tag to get word tags. This process labels each word with its part of speech, such as noun, verb, or adjective.
Syntax
POS tagging in Python with NLTK involves two main steps:
- word_tokenize(text): splits the text into words (tokens).
- pos_tag(tokens): assigns a POS tag to each token.
The output is a list of tuples where each tuple contains a word and its POS tag.
python
from nltk import word_tokenize, pos_tag
text = "I love learning NLP."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print(pos_tags)
Output
[('I', 'PRP'), ('love', 'VBP'), ('learning', 'VBG'), ('NLP', 'NNP'), ('.', '.')]
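Because the result is an ordinary list of (word, tag) tuples, it can be processed with plain Python. The following is a minimal sketch that reuses the output above as a hard-coded list, so it runs without NLTK installed. The helper name words_with_tag_prefix is made up for this illustration and is not part of NLTK.

```python
# Tagged output copied from the example above (hard-coded, so no NLTK data is needed).
pos_tags = [('I', 'PRP'), ('love', 'VBP'), ('learning', 'VBG'), ('NLP', 'NNP'), ('.', '.')]


def words_with_tag_prefix(tagged, prefix):
    """Hypothetical helper: return the words whose POS tag starts with prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Noun tags begin with "NN" and verb tags begin with "VB".
nouns = words_with_tag_prefix(pos_tags, "NN")
verbs = words_with_tag_prefix(pos_tags, "VB")

print(nouns)  # ['NLP']
print(verbs)  # ['love', 'learning']
```

Matching on a prefix rather than an exact tag is a design choice: it catches every noun or verb variant (NN, NNS, NNP, VBD, VBG, and so on) in one pass.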
Example
This example shows how to tokenize a sentence and get POS tags for each word using NLTK.
python
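import nltk
from nltk import word_tokenize, pos_tag
# One-time downloads of the tokenizer and tagger models.
# Recent NLTK releases may ask for 'punkt_tab' and
# 'averaged_perceptron_tagger_eng' instead; download whichever
# resource the LookupError message names.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
sentence = "Python is great for natural language processing."
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
print(pos_tags)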
Output
[('Python', 'NNP'), ('is', 'VBZ'), ('great', 'JJ'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]
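Once a sentence is tagged, a common next step is to summarize the tags. The sketch below counts how often each tag occurs and groups words by tag, using only the standard library. It assumes the tagged list is exactly the output shown above, hard-coded here so the snippet runs without NLTK.

```python
from collections import Counter, defaultdict

# Tagged output copied from the example above.
tagged = [('Python', 'NNP'), ('is', 'VBZ'), ('great', 'JJ'), ('for', 'IN'),
          ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]

# Count how many times each POS tag appears.
tag_counts = Counter(tag for _, tag in tagged)

# Group the words under their POS tag, keeping sentence order.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

print(tag_counts["NN"])    # 2
print(words_by_tag["JJ"])  # ['great', 'natural']
```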
Common Pitfalls
Common mistakes when doing POS tagging include:
- Passing raw text to pos_tag without tokenizing it first. Recent NLTK versions raise a TypeError when given a plain string.
- Forgetting to download required NLTK data packages like punkt and averaged_perceptron_tagger.
- Assuming POS tags are full words instead of short codes (e.g., NN means noun).
python
import nltk
from nltk import word_tokenize, pos_tag
# Wrong: tagging raw text without tokenizing
try:
    print(nltk.pos_tag("This is wrong"))
except Exception as e:
    print(f"Error: {e}")
# Right: tokenize first
text = "This is correct"
tokens = word_tokenize(text)
print(pos_tag(tokens))
Output
Error: tokens: expected a list of strings, got a string
[('This', 'DT'), ('is', 'VBZ'), ('correct', 'JJ')]
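One way to catch the first pitfall early is to validate the input before handing it to the tagger. The function below is a hypothetical guard written for this guide, not an NLTK API. It is a sketch that assumes tokens should arrive as a list or tuple of strings, and it uses only the standard library.

```python
def ensure_token_list(tokens):
    """Hypothetical guard: return tokens as a list, or raise if the input is not tokenized."""
    # A raw string means the caller skipped tokenization.
    if isinstance(tokens, str):
        raise TypeError("expected a list of tokens, got a raw string; tokenize it first")
    tokens = list(tokens)
    # Every item must itself be a string token.
    if not all(isinstance(tok, str) for tok in tokens):
        raise TypeError("every token must be a string")
    return tokens


print(ensure_token_list(["This", "is", "correct"]))  # ['This', 'is', 'correct']

try:
    ensure_token_list("This is wrong")
except TypeError as exc:
    print(f"Error: {exc}")
```

The validated list can then be passed to pos_tag as in the example above.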
Quick Reference
Common POS tags from the Penn Treebank tagset, which NLTK's default English tagger uses:
| POS Tag | Meaning |
|---|---|
| NN | Noun, singular |
| NNS | Noun, plural |
| VB | Verb, base form |
| VBD | Verb, past tense |
| JJ | Adjective |
| RB | Adverb |
| PRP | Personal pronoun |
| IN | Preposition or subordinating conjunction |
| . | Punctuation |
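
The table can also be kept in code as a lookup, which helps when printing tags in a readable form. This is a minimal sketch: the dictionary covers only the tags listed in the table, and both TAG_MEANINGS and describe_tags are names invented for this illustration.

```python
# Meanings copied from the quick reference table (not the full tagset).
TAG_MEANINGS = {
    "NN": "Noun, singular",
    "NNS": "Noun, plural",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "JJ": "Adjective",
    "RB": "Adverb",
    "PRP": "Personal pronoun",
    "IN": "Preposition or subordinating conjunction",
    ".": "Punctuation",
}


def describe_tags(tagged):
    """Hypothetical helper: attach a readable meaning to each (word, tag) pair."""
    return [(word, tag, TAG_MEANINGS.get(tag, "not listed in the table"))
            for word, tag in tagged]


# Tagged output copied from the pitfalls example.
described = describe_tags([('This', 'DT'), ('is', 'VBZ'), ('correct', 'JJ')])
for word, tag, meaning in described:
    print(f"{word}: {tag} ({meaning})")
```

Tags missing from the dictionary, such as DT and VBZ here, fall back to a placeholder instead of raising a KeyError.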
Key Takeaways
- Always tokenize text before POS tagging using word_tokenize.
- Use nltk.pos_tag to get part-of-speech tags for each token.
- Download required NLTK data packages before running POS tagging.
- POS tags are short codes representing word types, not full words.
- Common errors come from skipping tokenization or missing downloads.
