Lowercasing and normalization in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is lowercasing in text preprocessing?
Lowercasing means converting all letters in text to lowercase. It helps treat words like 'Apple' and 'apple' as the same word.
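As a minimal sketch of the idea, the snippet below lowercases a small, made-up word list with Python's built-in `str.lower()` and shows that the three spellings collapse into one. The word list is illustrative only; `str.casefold()` is shown as a stricter alternative for caseless matching, not something the card itself requires.

```python
# Illustrative input: three surface forms of the same word.
words = ["Apple", "apple", "APPLE"]

# str.lower() converts every cased character to lowercase.
lowered = [w.lower() for w in words]

# After lowercasing, all three forms are identical.
unique_forms = set(lowered)
print(unique_forms)

# str.casefold() is a more aggressive variant meant for caseless comparison;
# for example it maps the German sharp s to "ss", which lower() does not.
print("Straße".lower())     # keeps the sharp s
print("Straße".casefold())  # folds it to "ss"
```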
beginner
Why do we normalize text in NLP?
Normalization makes text consistent by removing surface variations in accents, punctuation, or spacing. A model then sees one form of each word instead of several variants that mean the same thing.
intermediate
Give an example of text normalization besides lowercasing.
Removing accents (e.g., changing 'café' to 'cafe') or replacing multiple spaces with a single space are examples of normalization.
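The two examples above can be sketched with the Python standard library. The function names `strip_accents`, `collapse_spaces`, and `normalize` are hypothetical helpers chosen for this illustration. The accent step assumes that decomposing to Unicode NFD and dropping combining marks is acceptable for the task; it leaves letters that have no decomposition (such as 'ø') unchanged.

```python
import re
import unicodedata


def strip_accents(text):
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_spaces(text):
    # Replace any run of whitespace with one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    # One possible ordering: accents, then spacing, then case.
    return collapse_spaces(strip_accents(text)).lower()


print(strip_accents("café"))
print(collapse_spaces("a   b  c"))
print(normalize("  Café   au  Lait "))
```

Whether to strip accents at all depends on the language: in some languages the accent distinguishes two different words, so this step is a choice rather than a default.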
intermediate
How does lowercasing affect model vocabulary size?
Lowercasing reduces vocabulary size by merging words that differ only in case. With fewer distinct tokens, the model has fewer entries to learn, and each remaining word appears more often in the training data.
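A small sketch makes the effect concrete. It assumes a toy sentence and naive whitespace tokenization, both chosen only for illustration; real tokenizers and corpora will give different numbers.

```python
# Toy corpus with the same words in different casings.
corpus = "The cat saw the Cat and THE dog"

# Naive whitespace tokenization, enough for this illustration.
tokens = corpus.split()

# Vocabulary when case is preserved.
vocab_cased = set(tokens)

# Vocabulary after lowercasing every token.
vocab_lower = {t.lower() for t in tokens}

print(len(vocab_cased), sorted(vocab_cased))
print(len(vocab_lower), sorted(vocab_lower))
```

Here the cased vocabulary has 8 entries and the lowercased one has 5, because 'The', 'the', and 'THE' merge, as do 'cat' and 'Cat'.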
advanced
What is a potential downside of lowercasing?
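Lowercasing can discard information that case carries, such as the capitalization that marks proper nouns or acronyms. For example, 'US' (the country) and 'us' (the pronoun) become the same token, which can matter in tasks like named entity recognition.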
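The loss can be shown with a few hand-picked word pairs. The pairs below are illustrative examples, not taken from any dataset: each pair has two different meanings that are distinguishable only by case.

```python
# Each pair: (capitalized form with one meaning, lowercase form with another).
pairs = [
    ("US", "us"),        # country vs. pronoun
    ("Apple", "apple"),  # company vs. fruit
    ("Bill", "bill"),    # person's name vs. invoice
]

# A pair "collides" when the forms differ as written but match once lowercased.
collisions = [(a, b) for a, b in pairs if a != b and a.lower() == b.lower()]

for a, b in collisions:
    print(f"{a!r} and {b!r} both become {a.lower()!r}")
```

If a task depends on such distinctions, one option is to keep the original casing, or to keep it as a separate feature alongside the lowercased text.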
What does lowercasing do to the word 'Hello'?
A. Removes the word
B. Converts it to 'HELLO'
C. Converts it to 'hello'
D. Adds punctuation
Which of these is NOT a normalization step?
A. Adding random characters
B. Lowercasing
C. Removing accents
D. Replacing multiple spaces with one
Why normalize text before training an NLP model?
A. To increase text length
B. To make text consistent and easier to understand
C. To add noise to data
D. To remove all vowels
What is a common effect of lowercasing on vocabulary size?
A. Vocabulary size increases
B. Vocabulary size doubles
C. Vocabulary size stays the same
D. Vocabulary size decreases
Which is a risk of lowercasing text?
A. Losing important case information
B. Making text longer
C. Adding accents
D. Removing stopwords
Explain why lowercasing and normalization are important in preparing text for machine learning models.
Think about how text variations affect model learning.
Describe some common normalization techniques used in NLP besides lowercasing.
Consider how text can be made consistent.