
Stemming (Porter, Snowball) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of Porter Stemmer on a word list
What is the output list after applying the Porter Stemmer to the words ['running', 'jumps', 'easily', 'fairly']?
from nltk.stem import PorterStemmer
ps = PorterStemmer()
words = ['running', 'jumps', 'easily', 'fairly']
stemmed = [ps.stem(word) for word in words]
print(stemmed)
A. ['run', 'jump', 'easily', 'fairly']
B. ['running', 'jump', 'easily', 'fairly']
C. ['run', 'jumps', 'easili', 'fairly']
D. ['run', 'jump', 'easili', 'fairli']
💡 Hint
Porter Stemmer often removes suffixes like 'ing', 's', and changes 'y' endings.
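The rules the hint mentions can be checked on a few other words. This is a minimal sketch, assuming NLTK is installed and PorterStemmer runs in its default mode (which adds some NLTK-specific tweaks to Porter's original rules); the word list is illustrative and deliberately avoids the challenge's own words.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()  # default NLTK mode

# Illustrative words, each triggering a different Porter rule.
examples = {
    "cats": "plural 's' removed",
    "hopping": "'ing' removed, then the doubled 'pp' reduced to one 'p'",
    "hoping": "'ing' removed, then 'e' restored on a short consonant-vowel-consonant stem",
    "happy": "final 'y' after a consonant becomes 'i'",
}

for word, rule in examples.items():
    print(f"{word:>8} -> {ps.stem(word):<6} ({rule})")
```

Note that the output for "happy" is "happi": a stem does not have to be a dictionary word.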
🧠 Conceptual
intermediate
Difference between Porter and Snowball Stemmer
Which statement correctly describes a key difference between the Porter Stemmer and the Snowball Stemmer?
A. Snowball Stemmer always produces longer stems than Porter Stemmer.
B. Porter Stemmer supports multiple languages while Snowball only supports English.
C. Snowball Stemmer is a newer, more readable and flexible version of Porter Stemmer supporting multiple languages.
D. Porter Stemmer uses machine learning while Snowball uses rule-based stemming.
💡 Hint
Think about language support and code design improvements.
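Both differences can be observed directly. A minimal sketch, assuming NLTK is installed: it lists the languages NLTK's SnowballStemmer accepts and compares the two English algorithms on a word where they disagree.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # Snowball's English algorithm ("Porter2")

# Snowball ships stemmers for many languages; Porter covers English only.
print(sorted(SnowballStemmer.languages))

# The two English algorithms sometimes produce different stems.
for word in ["generously", "running"]:
    print(f"{word}: porter={porter.stem(word)!r} snowball={snowball.stem(word)!r}")
```

Porter cuts "generously" down to "gener", while Snowball stops at "generous", one of the refinements the newer algorithm introduced.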
Metrics
advanced
Evaluating stemming impact on text classification accuracy
You train a text classifier on raw text and then on stemmed text using Porter Stemmer. The accuracy on test data changes from 82% to 79%. What is the most likely explanation?
A. Stemming always improves accuracy, so this must be a bug in the code.
B. Stemming reduced vocabulary size but also removed useful distinctions, slightly lowering accuracy.
C. The test data was stemmed but training data was not, causing mismatch and accuracy drop.
D. Porter Stemmer introduced spelling errors that confused the classifier.
💡 Hint
Consider how stemming affects word forms and model learning.
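Stemming's effect on a classifier's features can be inspected before training. A minimal sketch, assuming NLTK is installed; the token list is illustrative, not taken from any real dataset. It measures how much the vocabulary shrinks and shows which distinct words collapse into one feature.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Illustrative tokens: inflections of one word plus three unrelated words.
tokens = ["connect", "connected", "connecting", "connection", "connections",
          "universe", "university", "universal"]
stems = [ps.stem(t) for t in tokens]

print("vocabulary:", len(set(tokens)), "->", len(set(stems)))

# Group the original words by the stem (feature) they now share.
merged = defaultdict(list)
for token, stem in zip(tokens, stems):
    merged[stem].append(token)
for stem, forms in merged.items():
    print(stem, forms)
```

Merging the "connect" forms is usually helpful, but "universe", "university" and "universal" also collapse into "univers", so a model can no longer tell them apart.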
🔧 Debug
advanced
Identifying error in Snowball Stemmer usage
What error will this code raise?
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('english')
print(stemmer.stem(123))
A. TypeError because stem() expects a string, not an integer
B. AttributeError because integers have no lower() method
C. ValueError because 'english' is not a valid language
D. No error, outputs '123'
💡 Hint
Check the input type expected by the stem() method.
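When a token stream may contain non-string values (numbers parsed from a CSV, for example), converting them before stemming avoids the failure. A minimal sketch, assuming NLTK is installed; `stem_token` is a hypothetical helper, not part of NLTK.

```python
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

def stem_token(token):
    """Hypothetical helper: stem a token, converting non-strings to str first.

    stem() is written for strings, so passing an int fails; str() avoids that.
    """
    if not isinstance(token, str):
        token = str(token)
    return stemmer.stem(token)

mixed = ["running", 123, "jumps"]
print([stem_token(t) for t in mixed])
```

Converting keeps numbers in the output as text; filtering them out instead is an equally valid design if numbers carry no meaning for the task.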
Model Choice
expert
Choosing stemming method for multilingual text preprocessing
You have a dataset with English, Spanish, and French texts. Which stemming approach is best to preprocess this data before training a model?
A. Use Snowball Stemmer specifying the language for each text before stemming
B. Use Porter Stemmer on all texts regardless of language
C. Use a custom rule-based stemmer designed only for English
D. Use no stemming and rely on raw text for all languages
💡 Hint
Consider language support in stemming tools.
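Per-language stemming needs a language tag for every text. A minimal sketch, assuming NLTK is installed and each document already carries a tag (from metadata or a separate language detector, which is out of scope here). The codes "en", "es", "fr" and the helper `stem_text` are hypothetical; only the Snowball language names come from NLTK.

```python
from nltk.stem import SnowballStemmer

# Hypothetical language codes mapped to NLTK's Snowball language names.
LANG_NAMES = {"en": "english", "es": "spanish", "fr": "french"}
STEMMERS = {code: SnowballStemmer(name) for code, name in LANG_NAMES.items()}

def stem_text(text, lang):
    """Lowercase, split on whitespace, and stem with the stemmer for `lang`."""
    stemmer = STEMMERS[lang]
    return [stemmer.stem(tok) for tok in text.lower().split()]

docs = [("en", "The dogs were running"),
        ("es", "Los perros estaban corriendo"),
        ("fr", "Les chiens couraient")]
for lang, text in docs:
    print(lang, stem_text(text, lang))
```

Building each stemmer once up front avoids re-creating it for every document.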

Practice

1. What is the main purpose of stemming in Natural Language Processing?
easy
A. To reduce words to their base or root form
B. To translate text into another language
C. To count the number of words in a sentence
D. To generate synonyms for words

Solution

  1. Step 1: Understand stemming concept

    Stemming strips suffixes with heuristic rules so that related word forms share a common stem; the stem need not be a dictionary word (Porter turns 'happy' into 'happi').
  2. Step 2: Compare options with stemming goal

    Only option A describes reducing words to a base form, which is the goal of stemming; the other options describe translation, word counting, and synonym generation.
  3. Final Answer:

    To reduce words to their base or root form -> Option A
  4. Quick Check:

    Stemming = base form reduction [OK]
Hint: Stemming cuts word endings to find the root [OK]
Common Mistakes:
  • Confusing stemming with translation
  • Thinking stemming counts words
  • Mixing stemming with synonym generation
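The effect of stemming is easiest to see by grouping words under the stem they reduce to. A minimal sketch, assuming NLTK is installed; the word list is illustrative.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["play", "plays", "played", "playing", "jump", "jumps", "jumped", "jumping"]

# Map each stem to the surface forms that reduce to it.
groups = defaultdict(list)
for w in words:
    groups[ps.stem(w)].append(w)

for stem, forms in groups.items():
    print(stem, forms)
```

Eight surface forms collapse into two stems, which is why stemming is used to shrink vocabularies before counting or indexing words.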
2. Which of the following is the correct way to import the Porter Stemmer from NLTK in Python?
easy
A. from nltk.stem import PorterStemmer
B. import nltk.PorterStemmer
C. from nltk import PorterStemmer
D. import PorterStemmer from nltk.stem

Solution

  1. Step 1: Recall correct import syntax in Python

    Python imports use 'from module import class' format for specific classes.
  2. Step 2: Match with NLTK Porter Stemmer import

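    'from nltk.stem import PorterStemmer' imports the class from the nltk.stem module, which is the standard import path. (NLTK's top-level package also re-exports its stemmers, so option C happens to work as well, but option A is the conventional form.)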
  3. Final Answer:

    from nltk.stem import PorterStemmer -> Option A
  4. Quick Check:

    Correct import uses 'from nltk.stem import PorterStemmer' [OK]
Hint: Use 'from module import class' for specific imports [OK]
Common Mistakes:
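  • Writing 'import nltk.PorterStemmer', which fails because a dotted import must name a module, not a class
  • Relying on the top-level nltk namespace instead of the nltk.stem module
  • Using 'import X from Y' ordering, which is a SyntaxError in Python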
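Each option's behavior can be checked programmatically. A minimal sketch, assuming NLTK is installed; it uses `importlib.import_module` to reproduce option B's import and `compile()` to check option D's syntax without executing it.

```python
import importlib

from nltk.stem import PorterStemmer                       # option A
from nltk.stem.porter import PorterStemmer as FromPorterModule

# nltk.stem re-exports the class defined in nltk.stem.porter: same object.
print(PorterStemmer is FromPorterModule)

# Option B: a dotted import must name a module, and PorterStemmer is a class.
b_fails = False
try:
    importlib.import_module("nltk.PorterStemmer")
except ModuleNotFoundError:
    b_fails = True
print("option B fails:", b_fails)

# Option D: 'import X from Y' is not valid Python syntax.
d_fails = False
try:
    compile("import PorterStemmer from nltk.stem", "<option D>", "exec")
except SyntaxError:
    d_fails = True
print("option D fails:", d_fails)
```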
3. What is the output of the following Python code using Porter Stemmer?
from nltk.stem import PorterStemmer
ps = PorterStemmer()
words = ['running', 'runs', 'runner']
stemmed = [ps.stem(word) for word in words]
print(stemmed)
medium
A. ['run', 'run', 'run']
B. ['running', 'runs', 'runner']
C. ['run', 'run', 'runner']
D. ['runn', 'run', 'runn']

Solution

  1. Step 1: Apply Porter Stemmer to each word

    Porter reduces 'running' (remove '-ing', then undouble 'nn') and 'runs' (remove '-s') to 'run'. 'runner' stays unchanged: Porter removes '-er' only when the remaining stem has measure m > 1, and 'runn' has m = 1.
  2. Step 2: List the stemmed results

    The list becomes ['run', 'run', 'runner'] after stemming.
  3. Final Answer:

    ['run', 'run', 'runner'] -> Option C
  4. Quick Check:

    Porter stems 'running' and 'runs' to 'run' [OK]
Hint: Porter strips inflections like '-ing' and '-s' readily, but removes '-er' only from longer stems [OK]
Common Mistakes:
  • Assuming all words stem to the same root
  • Confusing stemmed output with original words
  • Expecting 'runner' to stem to 'run'
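The "measure" condition that keeps 'runner' intact can be computed by hand. This is a simplified sketch of Porter's definitions (letters classified as consonant or vowel, runs collapsed, then 'VC' pairs counted); it checks one step-4 rule in isolation and is not NLTK's implementation. The helper names are hypothetical.

```python
VOWELS = "aeiou"

def is_consonant(word: str, i: int) -> bool:
    """Porter's rule: not a/e/i/o/u, and 'y' is a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem: str) -> int:
    """Count m in the stem's [C](VC)^m[V] pattern."""
    pattern = ""
    for i in range(len(stem)):
        kind = "c" if is_consonant(stem, i) else "v"
        if not pattern or pattern[-1] != kind:  # collapse runs such as 'cc' -> 'c'
            pattern += kind
    return pattern.count("vc")

def step4_removes_er(word: str) -> bool:
    """Step 4 deletes '-er' only when the stem left behind has m > 1."""
    return word.endswith("er") and measure(word[:-2]) > 1

for word in ["runner", "computer"]:
    print(word, measure(word[:-2]), step4_removes_er(word))
```

'runn' collapses to the pattern c-v-c (m = 1), so '-er' stays; 'comput' collapses to c-v-c-v-c (m = 2), so '-er' is removed.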
4. Identify the error in this Snowball Stemmer usage code snippet:
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('english')
words = ['happiness', 'happier', 'happy']
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
medium
A. The stem method should be called as stemmer.stem_word(word)
B. No error; code runs correctly and prints stemmed words
C. SnowballStemmer requires language name in uppercase
D. SnowballStemmer must be imported from nltk.stem.snowball

Solution

  1. Step 1: Check SnowballStemmer import and usage

    Importing from nltk.stem and initializing with 'english' is correct; NLTK expects the language name in lowercase.
  2. Step 2: Verify method call and output

    The stem method is correctly called as stemmer.stem(word), and the code prints stemmed words without error.
  3. Final Answer:

    No error; code runs correctly and prints stemmed words -> Option B
  4. Quick Check:

    SnowballStemmer usage is correct as shown [OK]
Hint: SnowballStemmer language is lowercase string, stem() method used [OK]
Common Mistakes:
  • Using uppercase language name incorrectly
  • Calling non-existent stem_word method
  • Believing nltk.stem is the wrong import path (SnowballStemmer can be imported from nltk.stem or nltk.stem.snowball)
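Because NLTK lists its Snowball language names in lowercase, a small wrapper can normalize user-supplied names before constructing the stemmer. A minimal sketch, assuming NLTK is installed; `make_snowball` is a hypothetical helper, not an NLTK function.

```python
from nltk.stem import SnowballStemmer

def make_snowball(language):
    """Hypothetical helper: build a SnowballStemmer from a loosely formatted name.

    SnowballStemmer.languages holds lowercase names, so 'English' or
    ' english ' is normalized before the lookup.
    """
    name = language.strip().lower()
    if name not in SnowballStemmer.languages:
        raise ValueError(f"unsupported Snowball language: {language!r}")
    return SnowballStemmer(name)

stemmer = make_snowball("English")
print([stemmer.stem(w) for w in ["happiness", "happier", "happy"]])
```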
5. You want to preprocess text data by stemming words but keep the original word if it is shorter than 4 characters. Which Python code snippet using Porter Stemmer correctly implements this?
hard
A. stemmed = [ps.stem(word) for word in words if len(word) >= 4]
B. stemmed = [ps.stem(word) if len(word) < 4 else word for word in words]
C. stemmed = [word for word in words if len(word) < 4 else ps.stem(word)]
D. stemmed = [word if len(word) < 4 else ps.stem(word) for word in words]

Solution

  1. Step 1: Understand the condition for stemming

    Words shorter than 4 characters should remain unchanged; others should be stemmed.
  2. Step 2: Check list comprehension syntax

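    Option D puts the conditional expression before the for clause ('word if len(word) < 4 else ps.stem(word)'), so every word produces exactly one output. Option A drops short words instead of keeping them, option B applies the condition backwards, and option C is a SyntaxError.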
  3. Final Answer:

    stemmed = [word if len(word) < 4 else ps.stem(word) for word in words] -> Option D
  4. Quick Check:

    Keep short words, stem others with if-else [OK]
Hint: Use 'word if condition else stem(word)' in list comprehension [OK]
Common Mistakes:
  • Swapping the branches so short words get stemmed and long words are kept
  • Using a trailing filter 'if', which drops short words instead of keeping them
  • Placing 'if ... else' after the 'for' clause, which is a SyntaxError
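A complete, runnable version of option D, assuming NLTK is installed; the word list is illustrative.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["the", "cat", "was", "running", "jumps"]

# Conditional expression before 'for': keep words under 4 characters,
# stem everything else. Every input word yields exactly one output word.
stemmed = [word if len(word) < 4 else ps.stem(word) for word in words]
print(stemmed)
```

The length guard matters for short function words: without it, Porter would turn "was" into "wa".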