Challenge - 5 Problems

Regex Mastery in Pandas

❓ Predict Output

intermediate

What is the output of this regex filter in Pandas?

Given the DataFrame df below, what will be the output of df[df['Name'].str.contains('^A.*n$', regex=True)]?

Pandas

import pandas as pd

data = {'Name': ['Alan', 'Ann', 'Aarons', 'Ben', 'Ao']}
df = pd.DataFrame(data)

result = df[df['Name'].str.contains('^A.*n$', regex=True)]
print(result)

A.
   Name
0  Alan
1   Ann

B.
   Name
0  Alan
1   Ann
2  Aaron

C.
   Name
1   Ann
4    An

D.
   Name
1   Ann
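
As a side illustration of how the `^` and `$` anchors interact with `.*` inside `str.contains`, here is a minimal sketch. The `cities` Series and its values are made up for this example and are unrelated to the quiz DataFrame.

```python
import pandas as pd

# Hypothetical sample data, not taken from the quiz.
cities = pd.Series(['Boston', 'Berlin', 'Bern', 'Lisbon', 'Bo'])

# '^B' requires the string to start with 'B', '.*' allows any run of
# characters (including none), and 'n$' requires it to end with 'n'.
mask = cities.str.contains(r'^B.*n$', regex=True)

# Keep only the rows where the whole pattern was satisfied.
matched = cities[mask].tolist()
print(matched)
```

Because both anchors are present, a value has to satisfy the start and the end condition at once; starting with the right letter alone is not enough.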

❓ Data Output

intermediate

How many rows match this regex pattern?

Using the DataFrame df below, how many rows will match df['Email'].str.match(r'^[\w.-]+@example\.com$')?

Pandas

import pandas as pd

data = {'Email': ['user1@example.com', 'user2@test.com', 'admin@example.com', 'guest@example.org']}
df = pd.DataFrame(data)

matches = df['Email'].str.match(r'^[\w.-]+@example\.com$')
print(matches.sum())
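
For background, `str.contains`, `str.match`, and `str.fullmatch` differ in where the pattern is allowed to match. The sketch below compares them on a hypothetical `emails` Series with a made-up `site.io` domain; the pattern deliberately has no `^` or `$` so the difference comes from the methods alone.

```python
import pandas as pd

# Hypothetical addresses chosen to separate the three methods.
emails = pd.Series(['a@site.io', 'x a@site.io', 'a@site.io.uk'])
pattern = r'[\w.-]+@site\.io'

# contains: the pattern may match anywhere in the string.
found_anywhere = emails.str.contains(pattern).tolist()

# match: the pattern must match starting at the first character.
found_at_start = emails.str.match(pattern).tolist()

# fullmatch: the pattern must cover the entire string.
found_whole = emails.str.fullmatch(pattern).tolist()

print(found_anywhere, found_at_start, found_whole)
```

Adding `$` to a pattern passed to `str.match` gives it the same whole-string behaviour as `str.fullmatch`.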

🔧 Debug

advanced

Why does this regex filter raise an error?

Consider this code snippet, which uses the result of str.contains as a row filter:

df[df['Phone'].str.contains('^\d{3}-\d{3}-\d{4}$')]

Why does it raise a ValueError?

Pandas

import pandas as pd

data = {'Phone': ['123-456-7890', '987-654-3210', None]}
df = pd.DataFrame(data)

result = df[df['Phone'].str.contains('^\d{3}-\d{3}-\d{4}$')]

A. Because the column contains None values, and str.contains does not replace missing values with True or False by default.

B. Because the regex pattern needs raw string notation (r'...').

C. Because the regex pattern is invalid syntax.

D. Because the DataFrame is empty.
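
One common way to keep a regex mask usable for row filtering when a column has missing entries is the `na` parameter of `str.contains`. The following is a minimal sketch on a hypothetical `phones` Series, not the quiz data.

```python
import pandas as pd

# Hypothetical data with one malformed value and one missing value.
phones = pd.Series(['123-456-7890', 'n/a', None])

# na=False makes str.contains report False for missing entries,
# so the result contains only True/False and can index the Series.
mask = phones.str.contains(r'^\d{3}-\d{3}-\d{4}$', na=False)

valid = phones[mask].tolist()
print(valid)
```

The raw-string prefix `r'...'` is used here so that `\d` reaches the regex engine unchanged.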

❓ Visualization

advanced

Which plot shows the count of rows matching a regex pattern?

Given this DataFrame df with a 'Category' column, which code snippet produces a bar plot showing counts of rows where 'Category' matches regex '^A.*'?

Pandas

import pandas as pd
import matplotlib.pyplot as plt

data = {'Category': ['Apple', 'Banana', 'Apricot', 'Berry', 'Avocado']}
df = pd.DataFrame(data)

A.
df['Category'].value_counts().plot(kind='bar')
plt.show()

B.
df['Category'].str.contains('^A.*').plot(kind='bar')
plt.show()

C.
df[df['Category'].str.match('^A.*')]['Category'].value_counts().plot(kind='bar')
plt.show()

D.
df[df['Category'].str.contains('^A.*')].plot(kind='bar')
plt.show()

🚀 Application

expert

Extract area codes from phone numbers using regex in Pandas

Given a DataFrame df with a 'Phone' column containing phone numbers like '123-456-7890', which code extracts the area code (first three digits) into a new column 'AreaCode'?

Pandas

import pandas as pd

data = {'Phone': ['123-456-7890', '987-654-3210', '555-123-4567']}
df = pd.DataFrame(data)

A. df['AreaCode'] = df['Phone'].str.extract(r'\d{3}-(\d{3})-\d{4}')

B. df['AreaCode'] = df['Phone'].str.extract(r'(\d{3})-\d{3}-\d{4}')

C. df['AreaCode'] = df['Phone'].str.slice(0, 3)

D. df['AreaCode'] = df['Phone'].str.findall(r'\d{3}').str[1]
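
As general background on `str.extract`, the sketch below shows how a single capture group decides which part of the match is returned. The `codes` DataFrame, its `Code` column, and the `Prefix` column are hypothetical names invented for this example.

```python
import pandas as pd

# Hypothetical product codes; the last one does not fit the pattern.
codes = pd.DataFrame({'Code': ['AB-1024', 'CD-2048', 'bad']})

# Only the text inside the parentheses is captured. With one group,
# expand=False returns a Series instead of a one-column DataFrame.
codes['Prefix'] = codes['Code'].str.extract(r'^([A-Z]{2})-\d+$', expand=False)

print(codes)
```

Rows where the pattern does not match receive a missing value in the new column rather than raising an error.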
