Challenge - 5 Problems
Regex Text Cleaning Master
❓ Predict Output
intermediate
What is the output of this regex substitution?
Given the text "Hello!!! Are you #1?", what is the result after applying re.sub(r'[^a-zA-Z0-9 ]', '', text)?
    import re
    text = "Hello!!! Are you #1?"
    result = re.sub(r'[^a-zA-Z0-9 ]', '', text)
    print(result)
💡 Hint
The regex removes all characters except letters, digits, and spaces.
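As a minimal sketch of what the hint describes, the snippet below wraps the same pattern in a helper and runs it on the quiz text. The names NOT_ALNUM_OR_SPACE and strip_symbols are hypothetical, chosen only for illustration. The second call is an added example showing that the ASCII-only character class also drops accented letters and tabs, which may or may not be what a cleaning step wants.

```python
import re

# Negated character class: matches any single character that is NOT
# an ASCII letter, an ASCII digit, or the space character.
NOT_ALNUM_OR_SPACE = re.compile(r"[^a-zA-Z0-9 ]")  # hypothetical name


def strip_symbols(text: str) -> str:
    """Delete every character the pattern matches (hypothetical helper)."""
    return NOT_ALNUM_OR_SPACE.sub("", text)


print(strip_symbols("Hello!!! Are you #1?"))  # Hello Are you 1
# The accented letter and the tab are outside the class, so both are removed.
print(strip_symbols("café\tau lait"))  # cafau lait
```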
The pattern [^a-zA-Z0-9 ] matches any character that is NOT a letter, digit, or space. So the punctuation '!!!', '#', and '?' is removed, and the code prints: Hello Are you 1
❓ Model Choice
intermediate
Which regex pattern removes all digits from a string?
You want to remove all digits from a text string using re.sub. Which pattern should you use?
💡 Hint
Digits are represented by \d in regex.
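A short sketch of the hint in action, assuming a hypothetical helper named remove_digits and a made-up sample sentence. It also illustrates one detail that is easy to miss: on Python str patterns, \d matches any Unicode decimal digit, so [0-9] or the re.ASCII flag is the stricter choice when only ASCII digits should go.

```python
import re


def remove_digits(text: str) -> str:
    """Remove every run of digits from text (hypothetical helper)."""
    return re.sub(r"\d+", "", text)


sample = "Room 101, floor 3"  # made-up input for illustration
print(repr(remove_digits(sample)))  # 'Room , floor '

# \d and \d+ give the same result here; \d+ just replaces each run in one step.
assert re.sub(r"\d", "", sample) == remove_digits(sample)

# On str patterns \d is Unicode-aware: the Arabic-Indic digit below is removed
# by default but kept when matching is restricted to ASCII.
arabic_three = "a\u0663b"
print(re.sub(r"\d", "", arabic_three) == "ab")                           # True
print(re.sub(r"\d", "", arabic_three, flags=re.ASCII) == arabic_three)  # True
```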
The pattern \d+ matches one or more digits. Using re.sub(r'\d+', '', text) removes all digits.
❓ Metrics
advanced
How many tokens remain after cleaning?
Given the text "Data science 101: Clean, analyze, & visualize!", after applying re.sub(r'[^a-zA-Z ]', '', text).lower().split(), how many tokens are in the resulting list?
    import re
    text = "Data science 101: Clean, analyze, & visualize!"
    cleaned = re.sub(r'[^a-zA-Z ]', '', text).lower().split()
    print(len(cleaned))
💡 Hint
Digits and punctuation are removed before splitting on whitespace.
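The substitution removes every character that is not a letter or a space, so '101', ':', ',', '&', and '!' are deleted. This leaves some double spaces behind, but split() with no arguments treats any run of whitespace as one separator. The remaining words are ['data', 'science', 'clean', 'analyze', 'visualize'], totaling 5 tokens.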
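The steps in this explanation can be traced one at a time. The sketch below uses hypothetical variable names and adds one comparison that is not part of the question: calling split(" ") instead of split() would keep the empty strings produced by the leftover double spaces.

```python
import re

text = "Data science 101: Clean, analyze, & visualize!"

# Step 1: delete everything that is not an ASCII letter or a space.
letters_only = re.sub(r"[^a-zA-Z ]", "", text)
print(repr(letters_only))  # 'Data science  Clean analyze  visualize'

# Step 2: lowercase, then split on runs of whitespace.
tokens = letters_only.lower().split()
print(tokens)       # ['data', 'science', 'clean', 'analyze', 'visualize']
print(len(tokens))  # 5

# Contrast (added for illustration): an explicit single-space separator
# does not collapse the double spaces, so empty strings appear.
naive_tokens = letters_only.lower().split(" ")
print(len(naive_tokens))  # 7
```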
🔧 Debug
advanced
Why does this regex fail to remove punctuation?
This code aims to remove punctuation but does not work as expected. Why?
    import re
    text = "Hello, world!"
    cleaned = re.sub(r'[\w]', '', text)
    print(cleaned)
💡 Hint
Check what \w matches in regex.
The pattern '[\w]' matches any letter, digit, or underscore. So it removes the word characters and leaves the punctuation and whitespace untouched: for "Hello, world!" the code prints ", !", the opposite of what was intended.
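To make the bug concrete, the sketch below runs the original pattern next to one possible fix. The fix shown, the negated class [^\w\s], is an assumption about the intended behavior rather than the quiz's official answer; it keeps word characters and whitespace and deletes everything else. Because \w includes the underscore, underscores survive this version.

```python
import re

text = "Hello, world!"

# Original pattern: deletes word characters, which is the reverse of the goal.
buggy = re.sub(r"[\w]", "", text)
print(repr(buggy))  # ', !'

# One possible fix (assumed intent): delete anything that is neither
# a word character nor whitespace.
fixed = re.sub(r"[^\w\s]", "", text)
print(repr(fixed))  # 'Hello world'

# \w covers letters, digits, and '_', so underscores are not treated as punctuation.
print(re.sub(r"[^\w\s]", "", "snake_case!"))  # snake_case
```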
🧠 Conceptual
expert
Which regex pattern best cleans URLs from text?
You want to remove URLs from text data using regex. Which pattern is most effective?
💡 Hint
URLs often start with http or https and continue until a space.
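The pattern https?://\S+ matches 'http://' or 'https://' followed by one or more non-whitespace characters, so it captures each URL up to the next space. Punctuation directly attached to the end of a URL is captured as well, and links without a scheme (such as those starting with 'www.') are not matched.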
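A minimal sketch of URL removal with this pattern follows. The helper name remove_urls, the sample sentences, and the whitespace normalization step are all assumptions added for illustration; the quiz itself only specifies the pattern. The last two calls show the pattern's limits: a trailing period is swallowed with the URL, and a scheme-less address is left alone.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def remove_urls(text: str) -> str:
    """Delete http/https URLs, then collapse leftover whitespace (hypothetical helper)."""
    without_urls = URL_PATTERN.sub("", text)
    # Extra step, not part of the quiz: tidy the gaps left where URLs were.
    return " ".join(without_urls.split())


print(remove_urls("Read https://example.com/docs?page=2 and http://example.org now"))
# Read and now

# \S+ runs to the next whitespace, so the sentence-ending period goes too.
print(remove_urls("See https://example.com."))  # See

# No scheme, no match: the text comes back unchanged.
print(remove_urls("Visit www.example.com today"))  # Visit www.example.com today
```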