Recall & Review
beginner
What is a regular expression (regex)?
A regular expression is a pattern of characters used to find or match text. It helps to search, replace, or clean text by describing what to look for.
beginner
Why do we use regular expressions for text cleaning in machine learning?
We use regex to remove unwanted parts like extra spaces, special characters, or numbers from text. This makes the text easier for models to understand.
intermediate
What does the regex pattern '\s+' match?
It matches one or more whitespace characters like spaces, tabs, or new lines. Useful to find extra spaces to clean or replace.
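As a small illustration of the answer above, here is a minimal sketch that collapses runs of whitespace into a single space. The sample string is hypothetical and chosen only to include spaces, a tab, and a newline.

```python
import re

# Hypothetical sample text with a run of spaces, a tab, and a newline.
text = "too   many\tspaces\nhere"

# Replace every run of one or more whitespace characters with one space.
collapsed = re.sub(r"\s+", " ", text)

print(collapsed)  # too many spaces here
```

Replacing with a single space rather than an empty string keeps neighboring words from being glued together.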
intermediate
How can you remove all digits from a text using regex?
Use the pattern '\d' which matches any digit. Replace all matches with an empty string to remove digits.
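A minimal sketch of the digit-removal step described above, using a hypothetical sample sentence:

```python
import re

# Hypothetical sample text containing digits.
text = "Room 101 opens at 9am"

# Replace every digit with an empty string.
no_digits = re.sub(r"\d", "", text)

# repr() makes the leftover double space visible.
print(repr(no_digits))  # 'Room  opens at am'
```

Removing digits can leave doubled spaces behind, so this step is often followed by a whitespace cleanup with '\s+'.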
advanced
Explain the regex pattern '[^a-zA-Z ]' and its use in text cleaning.
This pattern matches any character that is NOT a letter (a-z or A-Z) or a space. It helps remove punctuation or special symbols from text.
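The following sketch shows the pattern on two hypothetical strings. It also illustrates a side effect worth knowing about: anything outside a-z, A-Z, and the space character is dropped, including apostrophes and accented letters.

```python
import re

# Hypothetical sample text with punctuation, a digit, and apostrophes.
text = "Wait... what?! It's 5 o'clock."

# Keep only ASCII letters and spaces; delete everything else.
letters_only = re.sub(r"[^a-zA-Z ]", "", text)
print(repr(letters_only))  # 'Wait what Its  oclock'

# Accented letters are not in a-z or A-Z, so they are removed as well.
accented = re.sub(r"[^a-zA-Z ]", "", "caf\u00e9")
print(accented)  # caf
```

For text that is not plain English, a broader character class may be a better fit than '[^a-zA-Z ]'.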
Which regex pattern matches one or more spaces?
A. \s+
B. \d+
C. [a-z]+
D. \w+
The pattern '\s+' matches one or more whitespace characters like spaces or tabs.
What does the regex '\d' match?
A. Any whitespace
B. Any letter
C. Any digit
D. Any special character
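The '\d' pattern matches any digit from 0 to 9. In Python 3 string patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set.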
How would you remove punctuation from text using regex?
A. Replace '[^a-zA-Z ]' with empty string
B. Replace '\d' with empty string
C. Replace '\s+' with empty string
D. Replace '[a-z]' with empty string
The pattern '[^a-zA-Z ]' matches anything that is not a letter or space, so replacing it removes punctuation.
Which regex pattern matches any word character (letters, digits, underscore)?
A. \d
B. \s
C. [^a-zA-Z]
D. \w
'\w' matches any word character including letters, digits, and underscore.
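To see what counts as a word character, this minimal sketch extracts runs of '\w' from a hypothetical log-style string:

```python
import re

# Hypothetical sample text mixing letters, digits, underscore, and symbols.
text = "user_42 logged in at 10:30!"

# \w+ grabs each run of letters, digits, and underscores.
tokens = re.findall(r"\w+", text)

print(tokens)  # ['user_42', 'logged', 'in', 'at', '10', '30']
```

The underscore stays inside 'user_42', while the colon and exclamation mark act as separators because they are not word characters.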
What is the purpose of using regex in text cleaning for machine learning?
A. To find and fix spelling errors
B. To find and remove unwanted text patterns
C. To add random characters
D. To translate text to another language
Regex helps find and remove unwanted patterns like extra spaces, digits, or punctuation to clean text.
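The three patterns covered in this section can be chained into one cleaning step. The sketch below is one possible ordering, not a standard recipe; the function name clean_text and the sample input are hypothetical.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical helper: strip digits and symbols, then tidy whitespace."""
    # Turn tabs and newlines into single spaces first, so the
    # letters-and-spaces filter below does not glue words together.
    text = re.sub(r"\s+", " ", text)
    # Remove digits.
    text = re.sub(r"\d", "", text)
    # Remove anything that is not an ASCII letter or a space.
    text = re.sub(r"[^a-zA-Z ]", "", text)
    # Collapse the gaps left behind by the removals.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(clean_text("Order #66:\tship   2 items,\nplease!"))
# Order ship items please
```

Normalizing whitespace before applying '[^a-zA-Z ]' matters because that pattern treats tabs and newlines as unwanted characters and would delete them outright.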
Describe how regular expressions help in cleaning text data for machine learning.
Think about how patterns can find spaces, digits, or symbols to remove.
Explain the difference between '\s', '\d', and '[^a-zA-Z ]' regex patterns in text cleaning.
Consider what kinds of characters each pattern targets.
Practice
1. What is the main purpose of using regular expressions in text cleaning for NLP?
easy
A. To find and remove unwanted patterns or characters in text
B. To train machine learning models directly
C. To store large datasets efficiently
D. To visualize text data with graphs
Solution
Step 1: Understand the role of regular expressions
Regular expressions are used to identify patterns in text, such as unwanted characters or specific sequences.
Step 2: Connect to text cleaning
Text cleaning involves removing or replacing unwanted parts of text to prepare it for analysis or modeling.
Final Answer:
To find and remove unwanted patterns or characters in text -> Option A
Quick Check:
Regular expressions clean text by pattern matching [OK]
Hint: Regular expressions = pattern search and replace in text [OK]
Common Mistakes:
Confusing regex with model training
Thinking regex stores data
Assuming regex creates visualizations
2. Which of the following is the correct Python syntax to import the regular expression module?
easy
A. from regex import *
B. import regex
C. import re
D. import regular_expression
Solution
Step 1: Recall Python's regex module name
Python's built-in module for regular expressions is named 're'.
Step 2: Check syntax correctness
The correct import statement is 'import re' to use regex functions.
Final Answer:
import re -> Option C
Quick Check:
Python regex module = re [OK]
Hint: Remember: Python regex module is 're' not 'regex' [OK]
Common Mistakes:
Using 'import regex', which is a third-party package and not part of the standard library
Trying to import non-existent modules
Confusing module names with function names
3. What will be the output of this Python code snippet?
import re
text = "Hello, World! 123"
cleaned = re.sub(r'[^a-zA-Z ]', '', text)
print(cleaned)
medium
A. Hello World
B. Hello World 123
C. Hello, World!
D. HelloWorld123
Solution
Step 1: Understand the regex pattern used
The pattern '[^a-zA-Z ]' means any character NOT a letter (a-z or A-Z) or space.
Step 2: Apply re.sub to remove unwanted characters
All characters except letters and spaces are removed, so the comma, the exclamation mark, and the digits are deleted. The space before "123" is kept, so the result ends with a trailing space.
Final Answer:
Hello World -> Option A
Quick Check:
Regex removes non-letters/spaces = 'Hello World ' [OK]
Hint: [^...] means NOT those characters, so it removes digits and punctuation [OK]
Common Mistakes:
Thinking digits remain after substitution
Confusing character classes with ranges
Ignoring spaces in the pattern
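The trailing space noted in the Quick Check is easy to miss in printed output. This sketch reruns the snippet from question 3 and uses repr() to make it visible; the strip() call is an optional follow-up, not part of the original question.

```python
import re

text = "Hello, World! 123"
cleaned = re.sub(r"[^a-zA-Z ]", "", text)

# repr() shows the space that was in front of "123".
print(repr(cleaned))          # 'Hello World '

# strip() removes leading and trailing whitespace if it is not wanted.
print(repr(cleaned.strip()))  # 'Hello World'
```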
4. Identify the error in this regex code snippet for removing digits from text:
import re
text = "Price: 100 dollars"
cleaned = re.sub(r'\d', '', text)
print(cleaned)
medium
A. The pattern '\d' should be '\D' to remove digits
B. The backslash in '\d' is not escaped properly
C. The re.sub function is used incorrectly
D. The code will run correctly and remove digits
Solution
Step 1: Check regex pattern correctness
The pattern r'\d' correctly matches digits (0-9).
Step 2: Verify code syntax and function usage
The code uses the raw string r'\d', which passes the backslash to the regex engine unchanged, so digits are removed as intended.
Final Answer:
The code will run correctly and remove digits -> Option D
Quick Check:
r'\d' matches digits; re.sub removes them correctly [OK]
Hint: In raw strings, r'\d' matches digits; no extra escaping needed [OK]
Common Mistakes:
Thinking r'\d' needs extra escaping even though it is a raw string
Confusing '\d' with '\D' (non-digit)
Assuming re.sub syntax is wrong
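To make the raw-string point concrete, this sketch compares the raw form with the manually escaped form of the same pattern. The '\b' comparison at the end is an extra illustration, not part of the original question.

```python
import re

text = "Price: 100 dollars"

# Raw string: the backslash reaches the regex engine as written.
raw = re.sub(r"\d", "", text)

# Regular string: the backslash is doubled by hand. Same pattern.
escaped = re.sub("\\d", "", text)

print(raw == escaped)  # True
print(repr(raw))       # 'Price:  dollars'

# Where raw strings matter: "\b" in a regular string is a single
# backspace character, while r"\b" is a backslash followed by "b".
print(len(r"\b"), len("\b"))  # 2 1
```

Both spaces around "100" survive, which is why the cleaned text contains a double space after the colon.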
5. You want to clean a text dataset by removing all URLs and extra spaces. Which regex pattern and code snippet correctly achieves this in Python?