Challenge - 5 Problems
Normalization Master
❓ Predict Output
intermediate
What is the output of this normalization code?
Consider the following Python code, which normalizes text by lowercasing it and removing punctuation. What does it print?
NLP
import string
text = "Hello, World! Let's normalize this text."
normalized = ''.join(ch for ch in text.lower() if ch not in string.punctuation)
print(normalized)
💡 Hint
Think about what lowercasing and removing punctuation does to the original text.
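Explanation
The code converts all letters to lowercase and removes every character found in string.punctuation: here the comma, the exclamation mark, the apostrophe, and the period. Spaces are kept, so it prints: hello world lets normalize this text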
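The same result can be reached with a translation table instead of a character-by-character filter. The sketch below is one possible variant, not part of the original challenge; the names PUNCT_TABLE and normalize are made up for illustration.

```python
import string

# Hypothetical helper names, chosen for this sketch only.
# The table maps every ASCII punctuation character to None, which deletes it.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text: str) -> str:
    """Lowercase the text, then delete ASCII punctuation."""
    return text.lower().translate(PUNCT_TABLE)

print(normalize("Hello, World! Let's normalize this text."))
# hello world lets normalize this text
```

Note that string.punctuation covers ASCII punctuation only, so characters such as curly quotes would pass through either version unchanged.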
🧠 Conceptual
intermediate
Why is lowercasing important in text normalization?
Which of the following best explains why lowercasing is a common step in text normalization for machine learning?
💡 Hint
Think about how case differences affect word counts.
Explanation
Lowercasing reduces vocabulary size by treating words that differ only in case (for example, "The" and "the") as the same token, so their counts are pooled and the model has fewer distinct features to learn.
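To make the effect on word counts concrete, here is a small sketch using a made-up sentence and whitespace tokenization (both are assumptions for illustration, not part of the challenge).

```python
from collections import Counter

# Toy sentence invented for this example.
tokens = "The cat saw the Cat and THE dog".split()

raw_counts = Counter(tokens)                       # case-sensitive counts
lower_counts = Counter(t.lower() for t in tokens)  # case-insensitive counts

print(len(raw_counts))      # 8 distinct tokens
print(len(lower_counts))    # 5 distinct tokens
print(lower_counts["the"])  # 3
```

Without lowercasing, "The", "the", and "THE" each count once as separate entries; after lowercasing they pool into a single entry with a count of 3.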
❓ Metrics
advanced
How does normalization affect model accuracy?
You train two text classification models: Model A uses raw text, Model B uses normalized text (lowercased, punctuation removed). Which outcome is most likely?
💡 Hint
Consider how noise and vocabulary size affect learning.
Explanation
Normalization reduces noise and vocabulary size, which helps the model generalize and often, though not always, improves accuracy.
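The vocabulary reduction can be measured directly. The sketch below uses an invented five-document corpus and whitespace tokenization; it shows only the change in vocabulary size and does not train or evaluate any model, so it says nothing by itself about accuracy.

```python
import string

# Toy corpus invented for this example.
corpus = [
    "Great movie!",
    "great movie",
    "GREAT movie...",
    "Terrible plot.",
    "terrible plot!",
]

def normalize(text):
    # Same normalization as Model B: lowercase, then drop ASCII punctuation.
    return "".join(ch for ch in text.lower() if ch not in string.punctuation)

raw_vocab = {tok for doc in corpus for tok in doc.split()}
norm_vocab = {tok for doc in corpus for tok in normalize(doc).split()}

print(len(raw_vocab))      # 10
print(sorted(norm_vocab))  # ['great', 'movie', 'plot', 'terrible']
```

In the raw text every surface variant ("movie", "movie!", "movie...") is its own feature; after normalization the ten variants collapse to four words.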
🔧 Debug
advanced
Identify the error in this normalization code
What error does this code raise when run?
import string
text = "Normalize THIS!"
normalized = ''.join(ch for ch in text.lower() if ch != string.punctuation)
print(normalized)
💡 Hint
Look carefully at how punctuation is checked in the condition.
Explanation
No runtime error is raised; the bug is logical. The condition 'ch != string.punctuation' always evaluates to True because a single character never equals the entire 32-character 'string.punctuation' string. As a result no punctuation is removed, and the code prints 'normalize this!'. Use 'ch not in string.punctuation' instead.
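Running the buggy condition and the corrected one side by side shows the difference. The variable names buggy and fixed are introduced here for illustration only.

```python
import string

text = "Normalize THIS!"

# Buggy: compares one character to the whole punctuation string,
# so the test is always True and nothing is filtered out.
buggy = "".join(ch for ch in text.lower() if ch != string.punctuation)

# Fixed: membership test, True only when ch is not a punctuation character.
fixed = "".join(ch for ch in text.lower() if ch not in string.punctuation)

print(buggy)  # normalize this!
print(fixed)  # normalize this
```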
❓ Model Choice
expert
Choosing the best normalization for noisy text data
You have a dataset of social media posts with many uppercase letters, emojis, and punctuation. Which normalization approach is best before training a sentiment analysis model?
💡 Hint
Consider what noise elements can confuse the model and what information is useful.
Explanation
Lowercasing reduces vocabulary size, and removing punctuation and emojis strips out noise that may not help sentiment detection, which lets the model focus on the words themselves.
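A minimal sketch of this approach for social media text follows. The function name normalize_post and the sample post are invented for illustration. As a simplifying assumption, it keeps only letters, digits, and whitespace, which removes punctuation and emojis in one pass, and it replaces each removed character with a space so that neighboring words do not fuse.

```python
def normalize_post(text: str) -> str:
    """Lowercase, drop anything that is not a letter, digit, or space,
    then collapse runs of whitespace. Hypothetical helper for this sketch."""
    lowered = text.lower()
    # Replace each removed character with a space rather than deleting it.
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in lowered)
    return " ".join(kept.split())

print(normalize_post("LOVE this phone!!! 😍🔥 Best. Purchase. EVER :)"))
# love this phone best purchase ever
```

Two caveats on this design: replacing punctuation with a space splits contractions ("let's" becomes "let s"), and whether emojis are noise depends on the dataset, since in some data they carry sentiment of their own.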