
How to Use NLTK Concordance in NLP: Simple Guide

Use the concordance() method from NLTK's Text class to find all occurrences of a word and see its surrounding context in a text. First, tokenize your text, create an NLTK Text object, then call concordance('word') to display matches with context.

Syntax

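The concordance() method is called on an NLTK Text object. Its first argument is the word to search for. The optional width argument sets the characters per output line (default 79), and lines sets the maximum number of matches to print (default 25). It prints a header such as "Displaying 2 of 2 matches:" and then one line per match, with the search word in the middle. The context is cut to fit the line width, so words at the edges can be truncated. The method returns None, and the search ignores case.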

  • Text.concordance(word, width=79, lines=25): Prints up to lines matches of word, each shown with its surrounding context.
python
from nltk.text import Text

text = Text(['this', 'is', 'a', 'sample', 'text', 'with', 'sample', 'words'])
text.concordance('sample')
Output
Displaying 2 of 2 matches:
this is a sample text with sample words
this is a sample text with sample words
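The optional width and lines arguments control the layout. The following sketch reuses the toy token list above and assumes a recent NLTK release. The values 40 and 1 are arbitrary choices for illustration. The lines argument limits how many matches are printed, but the header still reports the total number found.

```python
from nltk.text import Text

# Same toy token list as in the syntax example
text = Text(['this', 'is', 'a', 'sample', 'text', 'with', 'sample', 'words'])

# Narrow window (about 40 characters per line) and only the first match;
# the header still reports that 2 matches exist
text.concordance('sample', width=40, lines=1)
```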

Example

This example shows how to tokenize a sample sentence, create an NLTK Text object, and use concordance() to find the word 'sample' with its context.

python
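import nltk
from nltk.text import Text

# One-time download of the Punkt models used by word_tokenize()
# (newer NLTK releases read 'punkt_tab'; older ones read 'punkt')
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

# Sample sentence
sentence = 'This is a sample sentence to demonstrate NLTK concordance functionality.'

# Tokenize the sentence
tokens = nltk.word_tokenize(sentence)

# Create Text object
text_obj = Text(tokens)

# Use concordance to find 'sample'
text_obj.concordance('sample')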
Output
Displaying 1 of 1 matches:
This is a sample sentence to demonstrate NLTK concor
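Sometimes you need the matches as data rather than as printed text. For that case, recent NLTK releases also provide Text.concordance_list(), which returns the matches as ConcordanceLine named tuples. The sketch below assumes your installed NLTK version includes this method. It uses str.split() instead of word_tokenize() so that no tokenizer models need to be downloaded.

```python
from nltk.text import Text

# Pre-split tokens so no tokenizer download is needed
tokens = 'the cat sat on the mat'.split()
text_obj = Text(tokens)

# concordance_list() returns ConcordanceLine tuples instead of printing
for hit in text_obj.concordance_list('the'):
    # offset: token index of the match; left/right: lists of context tokens
    print(hit.offset, hit.left, hit.query, hit.right)
```

The search is case-insensitive here too, so concordance_list('THE') returns the same two matches.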

Common Pitfalls

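1. Passing an untokenized string to Text() makes each character a separate token, so searching for a word reports no matches.
2. Calling concordance() on a raw string raises AttributeError, because str has no concordance() method. Wrap the tokens in a Text object first.
3. concordance() is case-insensitive. It lowercases both the tokens and the search word, so 'Sample' and 'sample' return the same matches. Other Text methods, such as count(), compare tokens exactly and are case-sensitive.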
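The first two pitfalls are easy to confirm with a short sketch. The string 'a sample string' is an arbitrary example, and str.split() stands in for a real tokenizer.

```python
from nltk.text import Text

raw = 'a sample string'

# Pitfall 1: Text() iterates whatever it is given, so a raw string becomes characters
char_text = Text(raw)
print(char_text.tokens[:4])      # ['a', ' ', 's', 'a']
char_text.concordance('sample')  # reports that there are no matches

# Pitfall 2: plain strings have no concordance() method
print(hasattr(raw, 'concordance'))  # False

# Fix: split (or word_tokenize) first, then wrap the tokens in Text
Text(raw.split()).concordance('sample')
```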

If you want counts that agree with what concordance() shows, convert the tokens to lowercase before creating the Text object.

python
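import nltk
from nltk.text import Text

# One-time download of the tokenizer models used by word_tokenize()
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

sentence = 'Sample text with sample words.'
tokens = nltk.word_tokenize(sentence)
text_obj = Text(tokens)

# concordance() ignores case: both calls find the same 2 matches
text_obj.concordance('Sample')
text_obj.concordance('sample')

# count() compares tokens exactly, so only the lowercase 'sample' is counted
print(text_obj.count('sample'))

# Lowercase the tokens so count() agrees with concordance()
text_obj_lower = Text([t.lower() for t in tokens])
print(text_obj_lower.count('sample'))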
Output
Displaying 2 of 2 matches:
Sample text with sample words .
Sample text with sample words .
Displaying 2 of 2 matches:
Sample text with sample words .
Sample text with sample words .
1
2
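If you do need a case-sensitive search, one option is to build an nltk.text.ConcordanceIndex yourself. Text.concordance() uses this class internally with a lowercasing key, while the class's own default key leaves tokens unchanged. The following sketch assumes the ConcordanceIndex(tokens, key=...) constructor and the offsets() and print_concordance() methods found in recent NLTK versions.

```python
from nltk.text import ConcordanceIndex

tokens = ['Sample', 'text', 'with', 'sample', 'words', '.']

# With the default key, tokens are compared exactly (case-sensitive)
index = ConcordanceIndex(tokens)
print(index.offsets('Sample'))  # [0]
print(index.offsets('sample'))  # [3]

# key=str.lower mimics the case-insensitive lookup Text.concordance() performs
ci_index = ConcordanceIndex(tokens, key=str.lower)
print(ci_index.offsets('Sample'))  # [0, 3]

# print_concordance() produces the same style of display, but case-sensitively
index.print_concordance('Sample')
```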

Quick Reference

Method | Description
Text.concordance(word) | Print every occurrence of word with its surrounding context
Text.similar(word) | Print words that appear in similar contexts
Text.common_contexts([words]) | Print the contexts shared by all the given words
Text.tokens | The list of tokens in the text (an attribute, not a method)
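The following toy sketch exercises the methods in the table. It uses a hand-built token list, so no downloads are needed. In this text, 'cat' and 'dog' both appear between 'the' and 'sat'. As a result, similar('cat') should print dog and common_contexts() should print the_sat. Output on real corpora will be much longer.

```python
from nltk.text import Text

tokens = 'the cat sat on the mat the dog sat on the rug'.split()
text = Text(tokens)

text.concordance('sat')               # both occurrences of 'sat', with context
text.similar('cat')                   # words sharing a context with 'cat': dog
text.common_contexts(['cat', 'dog'])  # shared contexts as left_right pairs: the_sat
print(text.tokens[:3])                # ['the', 'cat', 'sat']
```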

Key Takeaways

  • Use NLTK's Text.concordance(word) to print every occurrence of a word with its surrounding context (up to 25 matches by default).
  • Tokenize your text before creating the Text object; a raw string is split into single characters.
  • Concordance search ignores case, but methods such as count() do not; lowercase the tokens if you need consistent counts.
  • Concordance helps you explore how a word is used by showing the words around it.
  • The Text class also offers similar() and common_contexts() for context analysis.