
Regex operations in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this regex filter in Pandas?
Given the DataFrame df below, what will be the output of df[df['Name'].str.contains('^A.*n$', regex=True)]?
Pandas
import pandas as pd

data = {'Name': ['Alan', 'Ann', 'Aarons', 'Ben', 'Ao']}
df = pd.DataFrame(data)

result = df[df['Name'].str.contains('^A.*n$', regex=True)]
print(result)
A
   Name
0  Alan
1   Ann
B
   Name
0  Alan
1   Ann
2  Aaron
C
   Name
1   Ann
4    An
D
   Name
1   Ann
💡 Hint
The regex '^A.*n$' means the string starts with 'A' and ends with 'n'.
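The following minimal sketch shows how the anchors change what str.contains finds. It uses a separate, made-up list of names rather than the quiz data. str.contains searches anywhere in the string, so without ^ and $ the pattern can match in the middle of a string. str.fullmatch (available in pandas 1.1 and later) is included as an anchor-free equivalent.

```python
import pandas as pd

# Hypothetical sample names, chosen only to illustrate the anchors
names = pd.Series(['Aidan', 'Anna', 'Brian', 'Alison', 'An'])

anchored = names.str.contains(r'^A.*n$')   # must start with 'A' AND end with 'n'
unanchored = names.str.contains(r'A.*n')   # an 'A' followed later by an 'n', anywhere
full = names.str.fullmatch(r'A.*n')        # whole string must match, no anchors needed

print(names[anchored].tolist())    # ['Aidan', 'Alison', 'An']
print(names[unanchored].tolist())  # ['Aidan', 'Anna', 'Alison', 'An']
```

'Anna' only passes the unanchored search because it ends in 'a'. 'Brian' fails both because the match is case-sensitive and it contains no uppercase 'A'.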
Data Output
intermediate
How many rows match this regex pattern?
Using the DataFrame df below, how many rows will match df['Email'].str.match(r'^[\w.-]+@example\.com$')?
Pandas
import pandas as pd

data = {'Email': ['user1@example.com', 'user2@test.com', 'admin@example.com', 'guest@example.org']}
df = pd.DataFrame(data)

matches = df['Email'].str.match(r'^[\w.-]+@example\.com$')
print(matches.sum())
A
4
B
3
C
1
D
2
💡 Hint
Look for emails ending exactly with '@example.com'.
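This small sketch shows why the trailing $ matters. It uses hypothetical addresses. str.match anchors only at the start of the string, like Python's re.match, so any text after the domain is ignored unless the pattern ends with $. str.fullmatch anchors both ends of the string.

```python
import pandas as pd

# Hypothetical addresses (not the quiz data) that differ only in what follows the domain
emails = pd.Series(['ops@example.com', 'ops@example.com.au', 'x-ops@example.com'])

# str.match anchors at the start only, so '.au' after the domain is still accepted
no_end_anchor = emails.str.match(r'[\w.-]+@example\.com')
# Adding '$' rejects anything after 'example.com'
with_end_anchor = emails.str.match(r'[\w.-]+@example\.com$')
# str.fullmatch requires the entire string to match, so no anchors are needed
full = emails.str.fullmatch(r'[\w.-]+@example\.com')

print(no_end_anchor.sum())    # 3
print(with_end_anchor.sum())  # 2
print(full.sum())             # 2
```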
🔧 Debug
advanced
Why does this regex filter raise an error?
Consider this code snippet, which uses the result of str.contains as a boolean mask:
df[df['Phone'].str.contains('^\d{3}-\d{3}-\d{4}$')]
Why does it raise a ValueError?
Pandas
import pandas as pd

data = {'Phone': ['123-456-7890', '987-654-3210', None]}
df = pd.DataFrame(data)

result = df[df['Phone'].str.contains('^\d{3}-\d{3}-\d{4}$')]
A
Because the column contains None, and by default str.contains returns NaN rather than False for missing values, so the result is not a valid boolean mask.
B
Because the regex pattern needs raw string notation (r'...').
C
Because the regex pattern has invalid syntax.
D
Because the DataFrame is empty.
💡 Hint
Check how pandas handles missing values in string operations.
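One common way to avoid the error is to pass na=False, as this sketch on the quiz data shows. With na=False, missing values are treated as non-matches, so the mask contains only True and False. The raw string is used here only as good practice for regex patterns.

```python
import pandas as pd

data = {'Phone': ['123-456-7890', '987-654-3210', None]}
df = pd.DataFrame(data)

pattern = r'^\d{3}-\d{3}-\d{4}$'  # raw string keeps the backslashes literal

# na=False turns missing values into False, giving a purely boolean mask
mask = df['Phone'].str.contains(pattern, na=False)
result = df[mask]
print(result)
```

Dropping or filling the missing values before filtering would also work. na=False keeps the change to a single argument.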
visualization
advanced
Which plot shows the count of rows matching a regex pattern?
Given this DataFrame df with a 'Category' column, which code snippet produces a bar plot showing counts of rows where 'Category' matches regex '^A.*'?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

data = {'Category': ['Apple', 'Banana', 'Apricot', 'Berry', 'Avocado']}
df = pd.DataFrame(data)
A
df['Category'].value_counts().plot(kind='bar')
plt.show()
B
df['Category'].str.contains('^A.*').plot(kind='bar')
plt.show()
C
df[df['Category'].str.match('^A.*')]['Category'].value_counts().plot(kind='bar')
plt.show()
D
df[df['Category'].str.contains('^A.*')].plot(kind='bar')
plt.show()
💡 Hint
You want counts of matching categories, not just True/False bars.
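The following sketch computes the two kinds of counts a bar chart might show. It uses hypothetical categories rather than the quiz data. Calling value_counts on the filtered rows gives one count per matching category. Calling value_counts on the boolean mask itself gives matched versus unmatched totals. Plotting is left out so the snippet runs without matplotlib. When matplotlib is installed, either Series can be passed to .plot(kind='bar').

```python
import pandas as pd

# Hypothetical categories, including a duplicate so the counts differ
df = pd.DataFrame({'Category': ['Avocado', 'Banana', 'Apple', 'Apple', 'Cherry']})

# str.match already anchors at the start, so the leading '^' is redundant but harmless
is_a = df['Category'].str.match(r'^A.*')

# One count per matching category (one bar per category starting with 'A')
per_category = df.loc[is_a, 'Category'].value_counts()

# Matched vs. unmatched totals (one bar for True, one for False)
totals = is_a.value_counts()

print(per_category.to_dict())
print(totals.to_dict())
```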
🚀 Application
expert
Extract area codes from phone numbers using regex in Pandas
Given a DataFrame df with a 'Phone' column containing phone numbers like '123-456-7890', which regex-based code extracts the area code (the first three digits) into a new column 'AreaCode'?
Pandas
import pandas as pd

data = {'Phone': ['123-456-7890', '987-654-3210', '555-123-4567']}
df = pd.DataFrame(data)
A
df['AreaCode'] = df['Phone'].str.extract(r'\d{3}-(\d{3})-\d{4}')
B
df['AreaCode'] = df['Phone'].str.extract(r'(\d{3})-\d{3}-\d{4}')
C
df['AreaCode'] = df['Phone'].str.slice(0, 3)
D
df['AreaCode'] = df['Phone'].str.findall(r'\d{3}').str[1]
💡 Hint
Use capturing groups in regex to extract parts of strings.
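This sketch shows two str.extract variations on hypothetical phone numbers. With expand=False, a pattern that has a single capturing group returns a Series, which is convenient for assigning one new column. Named groups such as (?P<area>...) become the column names of the resulting DataFrame. Strings that do not match produce NaN.

```python
import pandas as pd

# Hypothetical numbers in the same NNN-NNN-NNNN layout, plus one non-matching string
phones = pd.Series(['212-555-0199', '415-555-0123', 'not a phone'])

# One capturing group + expand=False -> a Series; non-matching rows become NaN
area = phones.str.extract(r'(\d{3})-\d{3}-\d{4}', expand=False)

# Named groups become the column names of the returned DataFrame
parts = phones.str.extract(r'(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})')

print(area.tolist())
print(parts.columns.tolist())  # ['area', 'exchange', 'line']
```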