How to Use NLTK Concordance in NLP: Simple Guide
Use the concordance() method from NLTK's Text class to find occurrences of a word and see its surrounding context in a text. First, tokenize your text, create an NLTK Text object, then call concordance('word') to display the matches with their context.

Syntax
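The concordance() method is called on an NLTK Text object. It takes the word you want to search for, plus two optional arguments: width, the line width in characters (79 by default), and lines, the maximum number of matches to show (25 by default). It prints the matches with their surrounding words rather than returning them.
Text.concordance(word, width=79, lines=25): Finds the occurrences of word in the text and prints each one with its context.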
```python
from nltk.text import Text

text = Text(['this', 'is', 'a', 'sample', 'text', 'with', 'sample', 'words'])
text.concordance('sample')
```
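Output
```
Displaying 2 of 2 matches:
this is a sample text with sample words
this is a sample text with sample words
```
Each line corresponds to one occurrence of 'sample'. The text is short enough that both lines show all eight tokens.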
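To make the idea behind concordance() concrete, here is a minimal keyword-in-context (KWIC) sketch in plain Python. It is an illustration under stated assumptions, not NLTK's implementation: the function name kwic and its context parameter are hypothetical, and the context window is counted in words rather than characters. Like concordance(), it compares words case-insensitively.

```python
def kwic(tokens, word, context=3):
    """Hypothetical helper: return (left, match, right) for each occurrence of word.

    Matching is case-insensitive; context is the number of words kept on each side.
    """
    target = word.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            hits.append((left, token, right))
    return hits


tokens = ['this', 'is', 'a', 'sample', 'text', 'with', 'sample', 'words']
# Right-align the left context so the matched word lines up in one column
for left, match, right in kwic(tokens, 'sample'):
    print(f"{left:>20} {match} {right}")
```

Unlike concordance(), this sketch returns the matches instead of printing them, which makes them easy to filter or count.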
Example
This example shows how to tokenize a sample sentence, create an NLTK Text object, and use concordance() to find the word 'sample' with its context.
```python
import nltk
from nltk.text import Text

# word_tokenize() needs the Punkt models; download them once with
# nltk.download('punkt_tab') (older NLTK releases use 'punkt')

# Sample sentence
sentence = 'This is a sample sentence to demonstrate NLTK concordance functionality.'

# Tokenize the sentence
tokens = nltk.word_tokenize(sentence)

# Create Text object
text_obj = Text(tokens)

# Use concordance to find 'sample'
text_obj.concordance('sample')
```
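Output
```
Displaying 1 of 1 matches:
This is a sample sentence to demonstrate NLTK concor
```
The right-hand context stops mid-word because concordance() trims the context on each side of the match to fit the line width (the width argument, 79 characters by default).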
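Because concordance() only prints its results (it returns None), recent NLTK 3.x releases also provide Text.concordance_list(), which takes the same word, width and lines arguments and returns the matches as ConcordanceLine named tuples. The sketch below assumes such a release; the tokens are typed out by hand so no tokenizer download is needed.

```python
from nltk.text import Text

# Tokens typed out by hand, matching what word_tokenize() produces for the example sentence
tokens = ['This', 'is', 'a', 'sample', 'sentence', 'to', 'demonstrate',
          'NLTK', 'concordance', 'functionality', '.']
text_obj = Text(tokens)

# concordance_list() returns ConcordanceLine objects instead of printing them
matches = text_obj.concordance_list('sample')
for match in matches:
    # offset is the token index of the match; left and right hold the context tokens
    print(match.offset, match.query, match.left, match.right)
```

Working with the returned objects lets you sort, filter, or export matches instead of reading them off the console.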
Common Pitfalls
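1. Passing a raw string to Text() instead of a list of tokens does not raise an error, but Text then treats each character as a token, so word searches print "no matches".
2. Python strings have no concordance() method, so calling it on a raw string instead of an NLTK Text object raises an AttributeError.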
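3. Case does not affect concordance(): it looks up lowercased tokens, so searching for 'Sample' or 'sample' finds both forms, and each match is printed in its original case. Exact-match methods such as Text.count() are case-sensitive, though.
To make every method ignore case, convert the tokens to lowercase before creating the Text object.
```python
import nltk
from nltk.text import Text

sentence = 'Sample text with sample words.'
tokens = nltk.word_tokenize(sentence)

# concordance() compares lowercased tokens, so both searches find both occurrences
text_obj = Text(tokens)
text_obj.concordance('Sample')
text_obj.concordance('sample')

# count() compares tokens exactly, so it only sees the lowercase form
print(text_obj.count('sample'))

# Lowercase the tokens when every method should ignore case
tokens_lower = [t.lower() for t in tokens]
text_obj_lower = Text(tokens_lower)
print(text_obj_lower.count('sample'))
```
Output
```
Displaying 2 of 2 matches:
Sample text with sample words .
Sample text with sample words .
Displaying 2 of 2 matches:
Sample text with sample words .
Sample text with sample words .
1
2
```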
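The first two pitfalls come down to what Text is given to index. The plain-Python sketch below shows the difference: iterating over a string yields single characters, while a token list holds whole words. The regular expression is only a rough stand-in for nltk.word_tokenize(), assumed here so the sketch runs without NLTK; it is not how NLTK tokenizes.

```python
import re

sentence = 'Sample text with sample words.'

# What Text(sentence) would index: iterating over a str yields characters
char_tokens = list(sentence)

# Rough stand-in for word_tokenize(): runs of word characters, or single punctuation marks
word_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(char_tokens[:6])   # ['S', 'a', 'm', 'p', 'l', 'e']
print(word_tokens)       # ['Sample', 'text', 'with', 'sample', 'words', '.']
print('sample' in char_tokens, 'sample' in word_tokens)  # False True
```

Only the word-level list contains 'sample' as a token, which is why concordance needs tokenized input.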
Quick Reference
| Method | Description |
|---|---|
| Text.concordance(word, width=79, lines=25) | Print up to lines occurrences of word (25 by default) with context |
| Text.similar(word) | Show words used in similar contexts |
| Text.common_contexts([words]) | Show common contexts for given words |
| Text.tokens | Access the list of tokens in the text |
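similar() and common_contexts() both rely on a notion of context: by default, roughly the word immediately before and the word immediately after a given word. The sketch below re-creates that idea for common_contexts() in plain Python. The names contexts_of and shared_contexts and the <s> and </s> boundary markers are hypothetical, chosen for illustration; NLTK's own implementation differs in details such as filtering and how results are reported.

```python
from collections import defaultdict


def contexts_of(tokens):
    """Map each lowercased word to the set of (left, right) neighbour pairs it occurs in."""
    # Placeholder markers stand in for the missing neighbour at either end of the text
    padded = ['<s>'] + [t.lower() for t in tokens] + ['</s>']
    index = defaultdict(set)
    for i in range(1, len(padded) - 1):
        index[padded[i]].add((padded[i - 1], padded[i + 1]))
    return index


def shared_contexts(tokens, words):
    """Return the (left, right) contexts in which every word in words appears."""
    index = contexts_of(tokens)
    context_sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*context_sets) if context_sets else set()


tokens = 'a big dog and a small dog saw a big cat'.split()
print(sorted(shared_contexts(tokens, ['big', 'small'])))  # [('a', 'dog')]
```

'big' and 'small' share the context "a _ dog", which is the kind of overlap common_contexts() reports.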
Key Takeaways
Use NLTK's Text.concordance(word) to see each occurrence of a word in tokenized text with its surrounding context.
Always tokenize your text before creating the Text object for concordance to work.
Concordance matching ignores case; lowercase the tokens yourself only when exact-match methods such as count() should ignore case too.
Concordance helps explore how words are used in sentences by showing surrounding words.
NLTK's Text class also offers methods such as similar() and common_contexts() for context analysis.
