Regular expressions for text cleaning in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a regular expression (regex)?
A regular expression is a pattern of characters used to find or match text. It helps to search, replace, or clean text by describing what to look for.
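The find and replace operations described above can be sketched as follows. This is a minimal illustration that assumes Python and its standard `re` module (the card itself does not name a language); the sample string is made up.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order #42 shipped on 2024-01-05"

# Find: re.search returns the first match of the pattern, or None.
first_number = re.search(r"\d+", text).group()

# Find all: re.findall returns every non-overlapping match as a list.
all_numbers = re.findall(r"\d+", text)

# Replace: re.sub swaps every match for the replacement string.
masked = re.sub(r"\d+", "N", text)

print(first_number)
print(all_numbers)
print(masked)
```

Raw strings (`r"..."`) are used so that backslashes in the pattern reach the regex engine unchanged.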
beginner
Why do we use regular expressions for text cleaning in machine learning?
We use regex to remove unwanted parts of text, such as extra spaces, special characters, or numbers. This reduces noise and makes the text more consistent before it is passed to a model.
intermediate
What does the regex pattern '\s+' match?
It matches one or more consecutive whitespace characters, such as spaces, tabs, or newlines. It is useful for finding runs of extra whitespace to collapse or remove.
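A common use of this pattern is to collapse whitespace runs into a single space. The sketch below assumes Python's `re` module; the helper name `normalize_whitespace` and the sample string are hypothetical.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical input containing spaces, tabs, and newlines.
messy = "  Hello \t\t world\n\nagain  "
cleaned = normalize_whitespace(messy)
print(repr(cleaned))
```

Replacing with a single space rather than an empty string keeps neighboring words from being glued together.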
intermediate
How can you remove all digits from a text using regex?
Use the pattern '\d', which matches any single digit. Replacing every match with an empty string removes the digits.
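As a rough sketch of that replacement, assuming Python's `re` module (the helper name `remove_digits` and the sample sentence are hypothetical):

```python
import re

def remove_digits(text: str) -> str:
    """Delete every digit character from the text."""
    return re.sub(r"\d", "", text)

# Hypothetical sample sentence.
sample = "Room 101 opens at 9am"
no_digits = remove_digits(sample)
print(repr(no_digits))
```

Deleting digits can leave doubled spaces behind (here between "Room" and "opens"), so a whitespace cleanup pass with '\s+' usually follows. Note also that on Python 3 string patterns, '\d' matches Unicode decimal digits as well as 0-9 unless the `re.ASCII` flag is passed.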
advanced
Explain the regex pattern '[^a-zA-Z ]' and its use in text cleaning.
This pattern matches any single character that is NOT an ASCII letter (a-z or A-Z) or a space. Replacing its matches with an empty string removes punctuation and special symbols, and it also removes digits and accented letters.
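To see what this negated character class keeps and drops, here is a small sketch assuming Python's `re` module; the helper name `keep_letters_and_spaces` and the sample text are hypothetical.

```python
import re

def keep_letters_and_spaces(text: str) -> str:
    """Drop every character that is not an ASCII letter or a space."""
    return re.sub(r"[^a-zA-Z ]", "", text)

# Hypothetical sample with punctuation, an apostrophe, and digits.
sample = "Hello, world! It's 2024 :)"
letters_only = keep_letters_and_spaces(sample)
print(repr(letters_only))
```

The pattern is blunt: the apostrophe in "It's" disappears, tabs and newlines are removed because only the plain space is allowed, and a word such as "café" loses its accented letter. For non-English text a different character class is likely a better fit.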
Which regex pattern matches one or more whitespace characters?
A. \s+
B. \d+
C. [a-z]+
D. \w+
What does the regex '\d' match?
A. Any whitespace
B. Any letter
C. Any digit
D. Any special character
How would you remove punctuation from text using regex?
A. Replace '[^a-zA-Z ]' with empty string
B. Replace '\d' with empty string
C. Replace '\s+' with empty string
D. Replace '[a-z]' with empty string
Which regex pattern matches any word character (letters, digits, underscore)?
A. \d
B. \s
C. [^a-zA-Z]
D. \w
What is the purpose of using regex in text cleaning for machine learning?
A. To find and fix spelling errors
B. To find and remove unwanted text patterns
C. To add random characters
D. To translate text to another language
Describe how regular expressions help in cleaning text data for machine learning.
Think about how patterns can find spaces, digits, or symbols to remove.
Explain the difference between the '\s', '\d', and '[^a-zA-Z ]' regex patterns in text cleaning.
Consider what kinds of characters each pattern targets.