Recall & Review
beginner
What is lowercasing in text preprocessing?
Lowercasing means converting all letters in text to lowercase. It helps treat words like 'Apple' and 'apple' as the same word.
beginner
Why do we normalize text in NLP?
Normalization makes text consistent by removing surface variations such as accents, inconsistent punctuation, or extra spacing. A model then sees one form of each word instead of several.
intermediate
Give an example of text normalization besides lowercasing.
Removing accents (e.g., changing 'café' to 'cafe') or replacing multiple spaces with a single space are examples of normalization.
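The whitespace case can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard `re` module; the helper name `collapse_whitespace` is hypothetical and not part of any library.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one space and trim both ends.

    Hypothetical helper, written only for this illustration.
    """
    return re.sub(r"\s+", " ", text).strip()

raw = "  Natural   language \t processing\n"
cleaned = collapse_whitespace(raw)
print(repr(cleaned))  # 'Natural language processing'
```

Note that `\s` also matches tabs and newlines, so this version flattens line breaks too. If line structure matters for your task, a narrower pattern may be a better fit.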
intermediate
How does lowercasing affect model vocabulary size?
Lowercasing reduces vocabulary size by merging words that differ only in case, so the model has fewer distinct tokens to learn.
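A small sketch of this effect, assuming a toy sentence and naive whitespace tokenization (real tokenizers behave differently):

```python
# Toy example: count distinct tokens before and after lowercasing.
tokens = "The cat saw the Cat and THE dog".split()

cased_vocab = set(tokens)
lowercased_vocab = {token.lower() for token in tokens}

print(len(cased_vocab))       # 8
print(len(lowercased_vocab))  # 5
```

Here 'The', 'the', and 'THE' collapse into one entry, as do 'cat' and 'Cat'.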
advanced
What is a potential downside of lowercasing?
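Lowercasing discards case information that distinguishes proper nouns and acronyms from ordinary words (for example, the company 'Apple' versus the fruit 'apple'). That information can be important in tasks such as named entity recognition.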
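To see the loss concretely, here is a minimal sketch with two made-up sentences and naive whitespace tokenization. The acronym 'US' and the pronoun 'us' are distinct tokens until the text is lowercased:

```python
sentence_a = "The US economy grew"
sentence_b = "Tell us about it"

# Tokens shared by both sentences, with and without lowercasing.
cased_overlap = set(sentence_a.split()) & set(sentence_b.split())
lowercased_overlap = set(sentence_a.lower().split()) & set(sentence_b.lower().split())

print(cased_overlap)       # set()
print(lowercased_overlap)  # {'us'}
```

After lowercasing, a model can no longer tell the country from the pronoun by the token alone.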
What does lowercasing do to the word 'Hello'?
A. Removes the word
B. Converts it to 'HELLO'
C. Converts it to 'hello'
D. Adds punctuation
Lowercasing changes all letters to lowercase, so 'Hello' becomes 'hello'.
Which of these is NOT a normalization step?
A. Adding random characters
B. Lowercasing
C. Removing accents
D. Replacing multiple spaces with one
Adding random characters is not normalization; normalization cleans and standardizes text.
Why normalize text before training an NLP model?
A. To increase text length
B. To make text consistent and easier to understand
C. To add noise to data
D. To remove all vowels
Normalization makes text consistent, helping the model learn better.
What is a common effect of lowercasing on vocabulary size?
A. Vocabulary size increases
B. Vocabulary size doubles
C. Vocabulary size stays the same
D. Vocabulary size decreases
Lowercasing merges words differing only by case, reducing vocabulary size.
Which is a risk of lowercasing text?
A. Losing important case information
B. Making text longer
C. Adding accents
D. Removing stopwords
Lowercasing can lose case information like proper nouns or acronyms.
Explain why lowercasing and normalization are important in preparing text for machine learning models.
Think about how text variations affect model learning.
Describe some common normalization techniques used in NLP besides lowercasing.
Consider how text can be made consistent.
Practice
1. What is the main purpose of lowercasing text in Natural Language Processing?
easy
A. To translate text into another language
B. To make all letters small so words like 'Apple' and 'apple' are treated the same
C. To remove all punctuation marks from the text
D. To split sentences into words
Solution
Step 1: Understand what lowercasing does
Lowercasing changes all letters in text to small letters.
Step 2: Understand why lowercasing is used
This helps treat words like 'Apple' and 'apple' as the same word, improving consistency.
Final Answer:
To make all letters small so words like 'Apple' and 'apple' are treated the same -> Option B
Quick Check:
Lowercasing = uniform word form
Hint: Lowercase so that case variants of a word are treated as the same word
Common Mistakes:
Confusing lowercasing with removing punctuation
Thinking lowercasing translates text
Believing lowercasing splits sentences
2. Which of the following Python code snippets correctly converts a string named text to lowercase?
easy
A. text.lowercase()
B. lower(text)
C. text.toLowerCase()
D. text.lower()
Solution
Step 1: Recall Python string method for lowercasing
Python strings have a method called lower() to convert text to lowercase.
Step 2: Check each option
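Option D, text.lower(), calls the built-in string method and is correct. Option B, lower(text), fails because Python has no built-in lower() function. Option C, text.toLowerCase(), is the JavaScript method name. Option A, text.lowercase(), is not a method of Python strings.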
Final Answer:
text.lower() -> Option D
Quick Check:
Python lowercase method = lower()
Hint: Python lowercase method is .lower()
Common Mistakes:
Using JavaScript syntax in Python
Calling non-existent methods like lowercase()
Trying to call a standalone function named lower() instead of the string method
3. What will be the output of this Python code?
text = 'Café'
normalized = text.lower()
print(normalized)
medium
A. 'café'
B. 'cafe'
C. 'CAFÉ'
D. 'Cafe'
Solution
Step 1: Apply lower() method on the string 'Café'
The lower() method converts all uppercase letters to lowercase but does not remove accents.
Step 2: Understand effect on accented characters
The accented 'é' remains unchanged because lower() does not normalize accents.
Final Answer:
'café' -> Option A
Quick Check:
lower() keeps accents, just lowers letters
Hint: lower() changes case but keeps accents
Common Mistakes:
Assuming accents are removed by lower()
Expecting uppercase output
Confusing normalization with lowercasing
4. The following code is meant to lowercase text and remove accents, but the printed result still contains the accent:
import unicodedata
text = 'Café'
normalized = unicodedata.normalize('NFKD', text).lower()
print(normalized)
What is the problem, and how do you fix it?
medium
A. normalize returns a string with accents separated; fix by removing accents after normalization
B. Calling lower() before normalize; fix by swapping the calls
C. lower() returns a string; normalize expects bytes, fix by encoding first
D. No error; code works correctly
Solution
Step 1: Understand what normalize('NFKD') does
It decomposes accented characters into base character plus accent marks.
Step 2: Check the code behavior
After normalization, each accent is a separate combining character that follows its base letter. lower() runs without error, but the accents are still in the string. To remove them, you must filter out the combining marks after normalization.
Final Answer:
normalize returns a string with accents separated; fix by removing accents after normalization -> Option A
Quick Check:
Normalization decomposes accents; remove them explicitly
Hint: Normalize then remove accents explicitly
Common Mistakes:
Thinking lower() removes accents
Swapping normalize and lower() calls incorrectly
Assuming no extra step needed to remove accents
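To make the decomposition visible, the sketch below inspects the string that the code in this question produces. It assumes the input 'Café' is stored in composed form (a single code point for 'é'), which is written here as the escape `\u00e9` to be explicit.

```python
import unicodedata

text = "Caf\u00e9"  # 'Café' with a single composed code point for 'é'
decomposed = unicodedata.normalize("NFKD", text).lower()

# The composed input has 4 code points; the decomposed result has 5.
print(len(text), len(decomposed))  # 4 5

# Print each character's Unicode name and combining class.
# A non-zero combining class marks a combining character such as an accent.
for char in decomposed:
    print(unicodedata.name(char), unicodedata.combining(char))
```

The last character is COMBINING ACUTE ACCENT with a non-zero combining class, which is what a filter based on `unicodedata.combining` removes.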
5. You want to preprocess text data by lowercasing and removing accents for a machine learning model. Which Python code snippet correctly does this?
hard
A. import unicodedata
text = 'Café'
text = unicodedata.normalize('NFKD', text)
print(text)
B. text = 'Café'
text = text.lower()
print(text)
C. import unicodedata
text = 'Café'
text = text.lower()
text = ''.join(c for c in unicodedata.normalize('NFKD', text) if not unicodedata.combining(c))
print(text)
D. text = 'Café'
text = text.upper()
print(text)
Solution
Step 1: Lowercase the text
Use text.lower() to convert all letters to lowercase.
Step 2: Normalize and remove accents
Use unicodedata.normalize('NFKD', text) to decompose accents, then remove combining characters to strip accents.
Step 3: Combine steps correctly
import unicodedata
text = 'Café'
text = text.lower()
text = ''.join(c for c in unicodedata.normalize('NFKD', text) if not unicodedata.combining(c))
print(text)
This snippet does both steps properly: lowercasing first, then normalization and accent removal.
Final Answer:
Lowercase with text.lower(), then drop the combining characters from unicodedata.normalize('NFKD', text) -> Option C
Quick Check:
Lowercase + normalize + remove accents = clean text
Hint: Lowercase first, then normalize and remove accents
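For reuse across a dataset, the same steps can be wrapped in one function. This is a sketch under assumptions, not a definitive pipeline: the name `normalize_text` is hypothetical, the sample strings are made up, and the whitespace-collapsing step is an addition taken from the normalization examples earlier on this page.

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace.

    Hypothetical helper that combines the steps from this exercise.
    """
    text = text.lower()
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    text = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace each run of whitespace with one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

# Accented characters are written as escapes to guarantee composed input.
samples = ["Caf\u00e9", "  R\u00c9SUM\u00c9   Tips ", "Na\u00efve  Bayes"]
results = [normalize_text(s) for s in samples]
print(results)  # ['cafe', 'resume tips', 'naive bayes']
```

Be aware that 'NFKD' also applies compatibility mappings (for example, it expands some ligatures), so it can change more than accents. Whether that is desirable depends on the task.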