Stemming (Porter, Snowball) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate, time limit 2:00)
Output of Porter Stemmer on a word list
What is the output list after applying the Porter Stemmer to the words ['running', 'jumps', 'easily', 'fairly']?
from nltk.stem import PorterStemmer
ps = PorterStemmer()
words = ['running', 'jumps', 'easily', 'fairly']
stemmed = [ps.stem(word) for word in words]
print(stemmed)
A. ['run', 'jump', 'easily', 'fairly']
B. ['running', 'jump', 'easily', 'fairly']
C. ['run', 'jumps', 'easili', 'fairly']
D. ['run', 'jump', 'easili', 'fairli']
💡 Hint
The Porter stemmer strips suffixes such as 'ing' and 's', and may rewrite a final 'y' as 'i'.
🧠 Conceptual (intermediate, time limit 1:30)
Difference between Porter and Snowball Stemmer
Which statement correctly describes a key difference between the Porter Stemmer and the Snowball Stemmer?
A. Snowball Stemmer always produces longer stems than Porter Stemmer.
B. Porter Stemmer supports multiple languages while Snowball only supports English.
C. Snowball Stemmer is a newer, more readable and flexible version of Porter Stemmer supporting multiple languages.
D. Porter Stemmer uses machine learning while Snowball uses rule-based stemming.
💡 Hint
Think about language support and code design improvements.
Metrics (advanced, time limit 2:00)
Evaluating stemming impact on text classification accuracy
You train a text classifier on raw text and then on stemmed text using Porter Stemmer. The accuracy on test data changes from 82% to 79%. What is the most likely explanation?
A. Stemming always improves accuracy, so this must be a bug in the code.
B. Stemming reduced vocabulary size but also removed useful distinctions, slightly lowering accuracy.
C. The test data was stemmed but training data was not, causing mismatch and accuracy drop.
D. Porter Stemmer introduced spelling errors that confused the classifier.
💡 Hint
Consider how stemming affects word forms and model learning.
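One way to explore the effect this question asks about is to measure it directly. The following is a minimal sketch that compares vocabulary size before and after stemming and lists which surface forms collapse onto the same stem. The helper names and `toy_stem` are hypothetical; `toy_stem` is a deliberately crude stand-in and not the Porter algorithm. With NLTK installed, `PorterStemmer().stem` could be passed in its place.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple


def vocabulary_sizes(tokens: Iterable[str], stem: Callable[[str], str]) -> Tuple[int, int]:
    """Return (raw vocabulary size, stemmed vocabulary size)."""
    raw_vocab = set(tokens)
    stemmed_vocab = {stem(token) for token in raw_vocab}
    return len(raw_vocab), len(stemmed_vocab)


def conflated_groups(tokens: Iterable[str], stem: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Map each stem to the surface forms merged onto it (groups of 2+ only)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for token in set(tokens):
        groups[stem(token)].add(token)
    return {s: forms for s, forms in groups.items() if len(forms) > 1}


def toy_stem(word: str) -> str:
    """Hypothetical stand-in (NOT Porter): drop a trailing 's' from words longer than 3 letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


tokens = ["new", "news", "cat", "cats", "bus"]
print(vocabulary_sizes(tokens, toy_stem))  # (5, 3)
print(conflated_groups(tokens, toy_stem))  # 'new'/'news' and 'cat'/'cats' are merged
```

Merging 'cat' and 'cats' is usually harmless, while merging 'new' and 'news' discards a distinction a classifier might have relied on.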
🔧 Debug (advanced, time limit 1:30)
Identifying error in Snowball Stemmer usage
What error will this code raise?
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('english')
print(stemmer.stem(123))
A. TypeError because stem() expects a string, not an integer
B. AttributeError because integers have no lower() method
C. ValueError because 'english' is not a valid language
D. No error, outputs '123'
💡 Hint
Check the input type that the stem() method expects.
Model Choice (expert, time limit 2:30)
Choosing stemming method for multilingual text preprocessing
You have a dataset with English, Spanish, and French texts. Which stemming approach is best to preprocess this data before training a model?
A. Use Snowball Stemmer specifying the language for each text before stemming
B. Use Porter Stemmer on all texts regardless of language
C. Use a custom rule-based stemmer designed only for English
D. Use no stemming and rely on raw text for all languages
💡 Hint
Consider language support in stemming tools.
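To make the idea of language-aware preprocessing concrete, here is a minimal sketch of routing each document to a stemmer chosen by its language tag. It assumes every document already carries a language label. `stem_corpus`, `build_snowball_stemmers`, and the toy stemmers are hypothetical names introduced for illustration. The toy stemmers are crude stand-ins so the sketch runs without NLTK; `build_snowball_stemmers` shows how NLTK's `SnowballStemmer` could be wired in instead.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Stemmer = Callable[[str], str]


def build_snowball_stemmers(languages: Iterable[str]) -> Dict[str, Stemmer]:
    """Build one Snowball stemmer per language (assumes NLTK is installed).

    The import is local so the rest of this sketch runs without NLTK.
    """
    from nltk.stem import SnowballStemmer
    return {lang: SnowballStemmer(lang).stem for lang in languages}


def stem_corpus(
    docs: Iterable[Tuple[str, List[str]]], stemmers: Dict[str, Stemmer]
) -> List[Tuple[str, List[str]]]:
    """Stem each (language, tokens) pair with the stemmer for that language.

    Documents whose language has no stemmer are passed through unchanged.
    """
    result = []
    for lang, tokens in docs:
        stem = stemmers.get(lang)
        stemmed = [stem(t) for t in tokens] if stem else list(tokens)
        result.append((lang, stemmed))
    return result


# Toy stand-ins (NOT real stemmers); French is omitted on purpose
# to show the pass-through branch.
toy_stemmers: Dict[str, Stemmer] = {
    "english": lambda w: w[:-3] if w.endswith("ing") else w,
    "spanish": lambda w: w[:-4] if w.endswith("ando") else w,
}

docs = [
    ("english", ["jumping", "cat"]),
    ("spanish", ["saltando"]),
    ("french", ["sautant"]),
]
print(stem_corpus(docs, toy_stemmers))
```

With NLTK available, `build_snowball_stemmers(["english", "spanish", "french"])` could replace `toy_stemmers`, so that each text is stemmed with rules written for its own language.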