Challenge - 5 Problems

Regex Text Cleaning Master

❓ Predict Output

intermediate

What is the output of this regex substitution?

Given the text "Hello!!! Are you #1?", what is the result after applying re.sub(r'[^a-zA-Z0-9 ]', '', text)?

NLP

import re
text = "Hello!!! Are you #1?"
result = re.sub(r'[^a-zA-Z0-9 ]', '', text)
print(result)

A. Hello Are you 1?

B. Hello!!! Are you 1

C. Hello Are you #1

D. Hello Are you 1

❓ Model Choice

intermediate

Which regex pattern removes all digits from a string?

You want to remove all digits from a text string using re.sub. Which pattern should you use?

A. r'\d+'

B. r'\D+'

C. r'\w+'

D. r'\s+'
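For readers who want to experiment before answering, here is a minimal sketch that applies each candidate pattern to a made-up string. The sample text and the helper name strip_matches are assumptions for illustration, not part of the challenge.

```python
import re

sample = "Room 42, floor 7_b"  # hypothetical sample text


def strip_matches(pattern, text):
    """Delete every match of pattern from text."""
    return re.sub(pattern, '', text)


# \d = digits, \D = non-digits, \w = word characters, \s = whitespace
for pattern in (r'\d+', r'\D+', r'\w+', r'\s+'):
    print(pattern, '->', repr(strip_matches(pattern, sample)))
```

Comparing the four printed strings shows which character class each shorthand targets.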

Attempts:

2 left

❓ Metrics

advanced

How many tokens remain after cleaning?

Given the text "Data science 101: Clean, analyze, & visualize!", after applying re.sub(r'[^a-zA-Z ]', '', text).lower().split(), how many tokens are in the resulting list?

NLP

import re
text = "Data science 101: Clean, analyze, & visualize!"
cleaned = re.sub(r'[^a-zA-Z ]', '', text).lower().split()
print(len(cleaned))
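The same clean, lowercase, split chain can be traced step by step on a different sentence. This is only a sketch: the helper name clean_tokens and the sample sentence are assumptions, chosen so the challenge text itself is left for you to work out.

```python
import re


def clean_tokens(text):
    """Keep ASCII letters and spaces, lowercase the result, split on whitespace."""
    letters_only = re.sub(r'[^a-zA-Z ]', '', text)
    # split() with no argument collapses runs of spaces, so a word made
    # only of digits or symbols leaves no empty token behind
    return letters_only.lower().split()


tokens = clean_tokens("Version 2.0 ships in May, 2024!")  # hypothetical sample
print(tokens, len(tokens))
```

Note that "2.0" and "2024" vanish entirely because none of their characters survive the substitution.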

🔧 Debug

advanced

Why does this regex fail to remove punctuation?

This code aims to remove punctuation but does not work as expected:

import re
text = "Hello, world!"
cleaned = re.sub(r'[\w]', '', text)
print(cleaned)

Why?

A. The pattern '[\w]' matches word characters (letters, digits, and underscore), so it removes them instead of punctuation.

B. The pattern '[\w]' matches punctuation only, so letters remain.

C. The pattern is missing a quantifier like '+' to match multiple characters.

D. The pattern should be '[^\w]' to remove punctuation.
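A quick way to see what a character class actually matches is to run it next to its negation. The sketch below uses a made-up sample string and '[^\w\s]' as one possible negated class that also spares whitespace; both are assumptions for illustration.

```python
import re

sample = "Wait... what_now?"  # hypothetical sample text

# [\w] matches word characters: letters, digits, and underscore
removed_word_chars = re.sub(r'[\w]', '', sample)

# [^\w\s] matches anything that is neither a word character nor whitespace
removed_punctuation = re.sub(r'[^\w\s]', '', sample)

print(repr(removed_word_chars))
print(repr(removed_punctuation))
```

The underscore survives the second substitution because \w treats it as a word character.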

🧠 Conceptual

expert

Which regex pattern best cleans URLs from text?

You want to remove URLs from text data using regex. Which pattern is most effective?

A. r'http://'

B. r'www\.\w+'

C. r'https?://\S+'

D. r'\S+\.com'
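One way to compare the candidates is to apply each to the same text and inspect what is left. The sample sentence and its URLs below are hypothetical, picked only to exercise an https link and a bare www address.

```python
import re

# Hypothetical sample containing one https URL and one bare www address
sample = "Docs at https://example.org/a?b=1 and www.example.com today"

candidates = [r'http://', r'www\.\w+', r'https?://\S+', r'\S+\.com']

# Map each pattern to the text that remains after removing its matches
results = {pattern: re.sub(pattern, '', sample) for pattern in candidates}

for pattern, remaining in results.items():
    print(f"{pattern!r:18} -> {remaining!r}")
```

Look for leftover fragments such as a dangling ".com" or an untouched link when judging each pattern.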

Practice

(1/5)

1. What is the main purpose of using regular expressions in text cleaning for NLP?

easy

A. To find and remove unwanted patterns or characters in text

B. To train machine learning models directly

C. To store large datasets efficiently

D. To visualize text data with graphs

Regular expressions for text cleaning in NLP - Practice Problems & Coding Challenges

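Solution

Step 1: Understand the role of regular expressions

A regular expression describes a pattern of characters, so it can locate every piece of text that fits that pattern.

Step 2: Connect to text cleaning

Text cleaning strips noise such as punctuation, stray digits, or URLs; re.sub replaces each match of a pattern with an empty string or a substitute.

Final Answer: A. To find and remove unwanted patterns or characters in text

Quick Check: Options B, C, and D describe model training, data storage, and visualization, none of which a regular expression performs.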
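As a rough illustration of finding and removing unwanted patterns, the sketch below chains a few substitutions. The function name clean_text, the order of the steps, and the sample sentence are assumptions rather than a prescribed pipeline.

```python
import re


def clean_text(text):
    """Hypothetical cleaning helper: drop URLs, then punctuation, then extra spaces."""
    text = re.sub(r'https?://\S+', '', text)    # remove URLs first
    text = re.sub(r'[^a-zA-Z0-9 ]', '', text)   # keep letters, digits, spaces
    text = re.sub(r' +', ' ', text).strip()     # collapse repeated spaces
    return text.lower()


print(clean_text("Check https://example.com NOW!!! It's #1."))
```

URLs are removed before punctuation here so that the colon, slashes, and dots do not get stripped first and leave the address behind as plain letters.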

Solution

Step 1: Recall Python's regex module name

Step 2: Check syntax correctness

Final Answer:

Quick Check:

Solution

Step 1: Understand the regex pattern used

Step 2: Apply re.sub to remove unwanted characters

Final Answer:

Quick Check:

Solution

Step 1: Check regex pattern correctness

Step 2: Verify code syntax and function usage

Final Answer:

Quick Check:

Solution

Step 1: Identify a regex pattern that matches URLs

Step 2: Understand the code's cleaning steps

Final Answer:

Quick Check: