Practice


1. What is the main purpose of removing punctuation and special characters in text preprocessing for NLP?

easy

A. To increase the length of the text

B. To clean text for better machine understanding

C. To add more special symbols for emphasis

D. To make the text harder to read

Solution

Step 1: Understand text preprocessing goals
Text preprocessing normalizes raw text so that machines can analyze it more reliably.
Step 2: Role of punctuation removal
Removing punctuation and special characters reduces noise: without it, tokens such as "movie" and "movie!" are treated as different words.
Final Answer:
To clean text for better machine understanding -> Option B
Quick Check:
Text cleaning = Better machine understanding [OK]

Hint: Removing punctuation cleans text for easier analysis [OK]

Common Mistakes:

Assuming punctuation always adds useful meaning for machines
Believing removal increases text length
Assuming special characters improve model accuracy
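
As a rough illustration of the noise-reduction point, the sketch below compares the vocabulary of a short sentence before and after stripping punctuation. The helper name vocabulary and the sample sentence are made up for this example, and whitespace splitting is assumed as a stand-in for a real tokenizer.

```python
import string


def vocabulary(text, strip_punct):
    """Return the set of lowercase, whitespace-separated tokens (hypothetical helper)."""
    if strip_punct:
        # A deletion table for str.translate drops every ASCII punctuation mark.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return set(text.lower().split())


sample = "Great movie! Great cast, great movie."

raw_vocab = vocabulary(sample, strip_punct=False)
clean_vocab = vocabulary(sample, strip_punct=True)

print(sorted(raw_vocab))    # ['cast,', 'great', 'movie!', 'movie.']
print(sorted(clean_vocab))  # ['cast', 'great', 'movie']
```

With punctuation attached, "movie!" and "movie." count as two separate vocabulary entries; after cleaning they collapse into one.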

2. Which Python code snippet correctly removes punctuation from the string text = "Hello, world!" using regular expressions?

easy

A. re.sub(r'[\w]', '', text)

B. re.sub(r'[\d]', '', text)

C. re.sub(r'[\W]', '', text)

D. re.sub(r'[\s]', '', text)

Solution

Step 1: Understand regex classes
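\W matches any character that is not a letter, digit, or underscore, so it covers punctuation.
Step 2: Apply regex to remove punctuation
re.sub(r'[\W]', '', text) removes the comma and the exclamation mark. It also removes the space, because whitespace is a non-word character too, so the result is "Helloworld". Among the four options it is still the only one that removes punctuation.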
Final Answer:
re.sub(r'[\W]', '', text) -> Option C
Quick Check:
\W removes punctuation [OK]

Hint: Use \W in regex to remove punctuation; use [^\w\s] when spaces must stay [OK]

Common Mistakes:

Using \w which matches word characters, not punctuation
Using \d which matches digits only
Using \s which matches spaces, not punctuation
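
To see what each option actually does, the four patterns can be run side by side. This is only a quick check of the quiz options; the extra pattern [^\w\s] is added here as an assumption about what is usually wanted when spaces should be kept.

```python
import re

text = "Hello, world!"

# The four patterns from options A-D.
patterns = {"A": r"[\w]", "B": r"[\d]", "C": r"[\W]", "D": r"[\s]"}
results = {label: re.sub(pattern, "", text) for label, pattern in patterns.items()}

for label, output in results.items():
    print(label, repr(output))
# A ', !'
# B 'Hello, world!'
# C 'Helloworld'
# D 'Hello,world!'

# Not one of the options: negating both \w and \s keeps the space.
keep_spaces = re.sub(r"[^\w\s]", "", text)
print(repr(keep_spaces))  # 'Hello world'
```

Only option C deletes the punctuation, but it deletes the space as well, which matters if the text is split into words afterwards.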

3. What will be the output of this Python code?

import re
text = "Hello, world! Let's clean: this text."
clean_text = re.sub(r'[^\w\s]', '', text)
print(clean_text)

medium

A. Hello world Lets clean this text

B. Hello, world! Let's clean: this text.

C. Hello world! Let's clean this text.

D. Hello world Lets clean this text.

Solution

Step 1: Understand regex pattern
Pattern '[^\w\s]' matches any character that is neither a word character nor whitespace, which covers punctuation and other symbols.
Step 2: Apply substitution
All punctuation marks like commas, apostrophes, colons, and periods are removed.
Final Answer:
Hello world Lets clean this text -> Option A
Quick Check:
Removed punctuation, kept words and spaces [OK]

Hint: Regex [^\w\s] removes punctuation, keeps words and spaces [OK]

Common Mistakes:

Expecting apostrophes to remain
Confusing \w with punctuation
Not noticing spaces are preserved
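
The behaviour can be checked directly. The sketch below runs the pattern on the question's sentence, then shows two variations that are not part of the question: a pattern that also keeps apostrophes, and what happens if the backslashes are doubled inside a raw string.

```python
import re

text = "Hello, world! Let's clean: this text."

# Delete anything that is neither a word character nor whitespace.
no_punct = re.sub(r"[^\w\s]", "", text)

# Variant (an assumption, not from the question): keep the apostrophe
# so contractions such as "Let's" survive.
keep_apostrophe = re.sub(r"[^\w\s']", "", text)

# Pitfall: in a raw string, a doubled backslash is a literal backslash,
# so this class only protects '\', 'w', and 's' and deletes everything else.
doubled = re.sub(r"[^\\w\\s]", "", text)

print(no_punct)         # Hello world Lets clean this text
print(keep_apostrophe)  # Hello world Let's clean this text
print(doubled)          # wss
```

Whether to keep apostrophes depends on the task; dropping them turns "Let's" into "Lets", which may or may not be acceptable downstream.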

4. Identify the error in this code snippet intended to remove punctuation:

import re
text = "Good morning! How are you?"
clean_text = re.sub(r'[\w]', '', text)
print(clean_text)

medium

A. The print statement syntax is incorrect

B. The code is missing import statement

C. The regex pattern is correct for punctuation removal

D. The regex removes word characters instead of punctuation

Solution

Step 1: Analyze regex pattern
Pattern '[\w]' matches word characters (letters, digits, and the underscore), not punctuation.
Step 2: Effect on text
It removes every letter and leaves only the punctuation and spaces, the opposite of what was intended.
Final Answer:
The regex removes word characters instead of punctuation -> Option D
Quick Check:
Wrong regex removes words, not punctuation [OK]

Hint: Use \W or [^\w\s] to remove punctuation, not \w [OK]

Common Mistakes:

Confusing \w and \W in regex
Assuming code lacks imports
Thinking print syntax is wrong
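
Running the snippet next to one possible fix makes the difference visible. The fixed pattern [^\w\s] is one reasonable choice assumed here; \W would also remove the punctuation but would delete the spaces too.

```python
import re

text = "Good morning! How are you?"

# Pattern from the question: deletes word characters, keeps everything else.
buggy = re.sub(r"[\w]", "", text)

# One possible fix: delete whatever is neither a word character nor whitespace.
fixed = re.sub(r"[^\w\s]", "", text)

print(repr(buggy))  # ' !   ?'
print(repr(fixed))  # 'Good morning How are you'
```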

5. You have a dataset with text containing emojis and punctuation. You want to remove only punctuation but keep emojis. Which approach is best?

hard

A. Use regex to remove only ASCII punctuation characters

B. Use regex to remove all non-word and non-space characters

C. Remove all characters except letters and digits

D. Replace emojis with empty string and keep punctuation

Solution

Step 1: Understand emoji vs punctuation
Emojis are special Unicode symbols, not ASCII punctuation.
Step 2: Choose selective removal
Removing only ASCII punctuation preserves emojis. A broad pattern such as [^\w\s] would delete them as well, because emojis are not word characters.
Final Answer:
Use regex to remove only ASCII punctuation characters -> Option A
Quick Check:
Selective ASCII punctuation removal keeps emojis [OK]

Hint: Remove ASCII punctuation only to keep emojis [OK]

Common Mistakes:

Removing all non-word chars removes emojis too
Removing all except letters/digits loses emojis
Replacing emojis instead of punctuation
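
A minimal sketch of the selective approach, assuming that "punctuation" means the 32 ASCII marks in Python's string.punctuation. The function name strip_ascii_punct and the sample sentence are invented for illustration.

```python
import re
import string

# Character class built from the ASCII punctuation marks; re.escape
# protects characters such as ']' and '\' that are special inside [...].
ASCII_PUNCT = re.compile(f"[{re.escape(string.punctuation)}]")


def strip_ascii_punct(text):
    """Delete ASCII punctuation and leave every other character, emojis included."""
    return ASCII_PUNCT.sub("", text)


sample = "Great job!!! 🎉 See you soon, team 😊."

selective = strip_ascii_punct(sample)
broad = re.sub(r"[^\w\s]", "", sample)  # option B, for comparison

print(selective)    # Great job 🎉 See you soon team 😊
print(repr(broad))  # 'Great job  See you soon team '
```

Two caveats with this sketch: string.punctuation includes the underscore, and non-ASCII punctuation such as curly quotes is left untouched, so real datasets may need a wider character set.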

Punctuation and special character removal in NLP - ML Experiment: Train & Evaluate
