Challenge - 5 Problems
Regex Mastery in Pandas
❓ Predict Output
Difficulty: intermediate
What is the output of this regex filter in Pandas?
Given the DataFrame df below, what will be the output of df[df['Name'].str.contains('^A.*n$', regex=True)]?
import pandas as pd
data = {'Name': ['Alan', 'Ann', 'Aarons', 'Ben', 'Ao']}
df = pd.DataFrame(data)
result = df[df['Name'].str.contains('^A.*n$', regex=True)]
print(result)
💡 Hint
The regex '^A.*n$' means the string starts with 'A' and ends with 'n'.
Explanation
The regex '^A.*n$' matches strings that start with 'A' and end with 'n', with any number of characters (including none) in between. 'Alan' and 'Ann' match. 'Aarons' ends with 's', 'Ao' ends with 'o', and 'Ben' does not start with 'A', so none of these match.
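The matching rule can be checked directly. The sketch below reuses the names from the question and adds one extra value, 'An', which is not in the original data and is included only to illustrate that `.*` may match zero characters.

```python
import pandas as pd

# 'An' is an illustrative extra value, not part of the original question.
names = pd.Series(['Alan', 'Ann', 'Aarons', 'Ben', 'Ao', 'An'])

# ^A  : must start with 'A'
# .*  : any characters, possibly none
# n$  : must end with 'n'
mask = names.str.contains(r'^A.*n$', regex=True)

matched = names[mask].tolist()
print(matched)  # ['Alan', 'Ann', 'An']
```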
❓ data_output
Difficulty: intermediate
How many rows match this regex pattern?
Using the DataFrame df below, how many rows will match df['Email'].str.match(r'^[\w.-]+@example\.com$')?
import pandas as pd
data = {'Email': ['user1@example.com', 'user2@test.com', 'admin@example.com', 'guest@example.org']}
df = pd.DataFrame(data)
matches = df['Email'].str.match(r'^[\w.-]+@example\.com$')
print(matches.sum())
💡 Hint
Look for emails ending exactly with '@example.com'.
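Explanation
The regex matches emails that consist of one or more word characters, dots, or hyphens followed by exactly '@example.com'. Only 'user1@example.com' and 'admin@example.com' match, so the printed count is 2. 'user2@test.com' and 'guest@example.org' fail on the domain part.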
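As a rough illustration of why the `$` anchor matters here, the sketch below counts matches for the full pattern and for a deliberately shortened pattern. The shortened pattern and the variable names `prefix_only` and `full` are assumptions made for this example; `str.match` only anchors at the start of the string, while `str.fullmatch` requires the whole string to match.

```python
import pandas as pd

emails = pd.Series([
    'user1@example.com',
    'user2@test.com',
    'admin@example.com',
    'guest@example.org',
])

# Pattern from the question: anchored at both ends.
matches = emails.str.match(r'^[\w.-]+@example\.com$')
count = int(matches.sum())
print(count)  # 2

# str.match anchors only at the start, so a shortened pattern
# (hypothetical, for illustration) also accepts 'guest@example.org'.
prefix_only = emails.str.match(r'[\w.-]+@example')
print(int(prefix_only.sum()))  # 3

# str.fullmatch requires the entire string to match, so no '$' is needed.
full = emails.str.fullmatch(r'[\w.-]+@example\.com')
print(int(full.sum()))  # 2
```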
🔧 Debug
Difficulty: advanced
Why does this regex filter raise an error?
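Consider this code snippet:
df[df['Phone'].str.contains(r'^\d{3}-\d{3}-\d{4}$')]
Why does it raise a ValueError?
import pandas as pd
data = {'Phone': ['123-456-7890', '987-654-3210', None]}
df = pd.DataFrame(data)
# Raw string so that \d reaches the regex engine unchanged
mask = df['Phone'].str.contains(r'^\d{3}-\d{3}-\d{4}$')
result = df[mask]  # the mask holds a missing value for the None row
💡 Hint
Check how pandas handles missing values in string operations.
Explanation
str.contains does not raise by itself: for the None entry in the 'Phone' column it returns a missing value rather than True or False. On an object-dtype column, using that result as a boolean mask raises a ValueError because the mask contains NA values. Passing na=False (or na=True) replaces the missing entries with a real boolean, so the mask can be used for filtering.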
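A minimal sketch of the fix, assuming rows with a missing phone number should simply be excluded from the result:

```python
import pandas as pd

df = pd.DataFrame({'Phone': ['123-456-7890', '987-654-3210', None]})

# Raw string keeps the \d escapes intact for the regex engine.
pattern = r'^\d{3}-\d{3}-\d{4}$'

# na=False makes missing entries count as "no match",
# so the result is a plain boolean mask.
mask = df['Phone'].str.contains(pattern, na=False)

valid = df[mask]
print(valid)
```

Using `na=True` instead would keep the rows with missing values; which choice is appropriate depends on how missing phone numbers should be treated.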
❓ visualization
Difficulty: advanced
Which plot shows the count of rows matching a regex pattern?
Given this DataFrame df with a 'Category' column, which code snippet produces a bar plot showing counts of rows where 'Category' matches regex '^A.*'?
import pandas as pd
import matplotlib.pyplot as plt
data = {'Category': ['Apple', 'Banana', 'Apricot', 'Berry', 'Avocado']}
df = pd.DataFrame(data)
💡 Hint
You want counts of matching categories, not just True/False bars.
Explanation
Option C filters the rows whose 'Category' matches the regex, counts the occurrences of each matching category, then plots a bar chart. The other options fail for different reasons: one plots a boolean Series directly, which is not meaningful; one plots counts of all categories, ignoring the regex; and one plots the filtered DataFrame directly, which is not a count plot.
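The answer options themselves are not shown here, so the following is only a sketch of the approach the explanation describes (filter, count, plot), not a reproduction of any option. The helper names `matching_counts` and `plot_matching_counts` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Apple', 'Banana', 'Apricot', 'Berry', 'Avocado']})


def matching_counts(frame, pattern=r'^A.*'):
    """Count occurrences of each category that matches the pattern."""
    mask = frame['Category'].str.contains(pattern, regex=True, na=False)
    return frame.loc[mask, 'Category'].value_counts()


def plot_matching_counts(frame):
    """Draw the counts as a bar chart (requires matplotlib)."""
    # Imported here so the counting step works without matplotlib installed.
    import matplotlib.pyplot as plt

    ax = matching_counts(frame).plot(kind='bar')
    ax.set_ylabel('Count')
    plt.tight_layout()
    plt.show()


counts = matching_counts(df)
print(counts)
```

Calling `plot_matching_counts(df)` would then draw one bar each for 'Apple', 'Apricot', and 'Avocado', each with height 1.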
🚀 Application
Difficulty: expert
Extract area codes from phone numbers using regex in Pandas
Given a DataFrame df with a 'Phone' column containing phone numbers like '123-456-7890', which code extracts the area code (first three digits) into a new column 'AreaCode'?
import pandas as pd
data = {'Phone': ['123-456-7890', '987-654-3210', '555-123-4567']}
df = pd.DataFrame(data)
💡 Hint
Use capturing groups in regex to extract parts of strings.
Explanation
Option B correctly captures the first three digits (the area code) using parentheses in the regex. Of the other options, one captures the second group of digits (the middle three), one slices the string without guaranteeing that the characters are digits, and one finds all groups of three digits and picks the second, which is the middle part rather than the area code.
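Since the answer options are not shown here, the following is a sketch of one way to do the extraction with a capturing group, under the assumption that every phone number follows the 'ddd-ddd-dddd' layout. The exact pattern is an illustration, not necessarily the one used in the correct option.

```python
import pandas as pd

df = pd.DataFrame({'Phone': ['123-456-7890', '987-654-3210', '555-123-4567']})

# The parentheses capture the first three digits at the start of the string.
# expand=False returns a Series instead of a one-column DataFrame.
df['AreaCode'] = df['Phone'].str.extract(r'^(\d{3})-', expand=False)

print(df)
```

Rows that do not fit the pattern would get a missing value in 'AreaCode', which a plain slice such as `str[:3]` would not signal.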