Recall & Review
beginner
What is lowercasing in text preprocessing?
Lowercasing means converting all letters in text to lowercase. It helps treat words like 'Apple' and 'apple' as the same word.
Click to reveal answer
beginner
Why do we normalize text in NLP?
Normalization makes text consistent by fixing variations like accents, punctuation, or spacing. This helps models understand text better.
Click to reveal answer
intermediate
Give an example of text normalization besides lowercasing.
Removing accents (e.g., changing 'café' to 'cafe') or replacing multiple spaces with a single space are examples of normalization.
Click to reveal answer
intermediate
How does lowercasing affect model vocabulary size?
Lowercasing reduces vocabulary size by merging words that differ only in case, making the model simpler and faster.
Click to reveal answer
advanced
What is a potential downside of lowercasing?
Lowercasing can lose information, like proper nouns or acronyms, which might be important in some tasks.
Click to reveal answer
What does lowercasing do to the word 'Hello'?
✗ Incorrect
Lowercasing changes all letters to lowercase, so 'Hello' becomes 'hello'.
Which of these is NOT a normalization step?
✗ Incorrect
Adding random characters is not normalization; normalization cleans and standardizes text.
Why normalize text before training an NLP model?
✗ Incorrect
Normalization makes text consistent, helping the model learn better.
What is a common effect of lowercasing on vocabulary size?
✗ Incorrect
Lowercasing merges words differing only by case, reducing vocabulary size.
Which is a risk of lowercasing text?
✗ Incorrect
Lowercasing can lose case information like proper nouns or acronyms.
Explain why lowercasing and normalization are important in preparing text for machine learning models.
Think about how text variations affect model learning.
You got /4 concepts.
Describe some common normalization techniques used in NLP besides lowercasing.
Consider how text can be made consistent.
You got /4 concepts.