Pattern matching with str.contains in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of str.contains with case sensitivity
What is the output of this code snippet?
Data Analysis Python
import pandas as pd

data = pd.Series(['Apple', 'banana', 'Cherry', 'date'])
result = data.str.contains('a')
print(result.tolist())
A. [True, False, False, True]
B. [True, True, False, True]
C. [True, True, True, True]
D. [False, True, False, True]
💡 Hint
Remember that str.contains is case-sensitive by default, so an uppercase 'A' does not match the pattern 'a'.
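The default behaviour can be compared with case=False in a minimal sketch. The Series `fruits` below is hypothetical sample data, chosen for illustration and not taken from the challenge.

```python
import pandas as pd

# Hypothetical sample data (not the challenge data).
fruits = pd.Series(['Almond', 'mango', 'KIWI'])

# Default: case-sensitive, so only a lowercase 'a' counts as a match.
sensitive = fruits.str.contains('a').tolist()

# case=False ignores letter case, so the 'A' in 'Almond' now matches.
insensitive = fruits.str.contains('a', case=False).tolist()

print(sensitive)
print(insensitive)
```

Only 'Almond' changes between the two calls, because its sole 'a' is uppercase.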
data_output
intermediate
Filtering DataFrame rows with str.contains and regex
Given the DataFrame below, which rows are selected by the filter df[df['Name'].str.contains('^J.*n$', regex=True)]?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jason', 'Jordan', 'Joan', 'Jan', 'Jim']})
filtered = df[df['Name'].str.contains('^J.*n$', regex=True)]
print(filtered['Name'].tolist())
A. ['John', 'Jordan', 'Jan']
B. ['John', 'Jason', 'Jordan', 'Joan', 'Jan']
C. ['John', 'Jordan', 'Joan', 'Jan']
D. ['John', 'Jan']
💡 Hint
The regex '^J.*n$' means strings starting with 'J' and ending with 'n'.
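To see what the anchors contribute, here is a small sketch comparing an unanchored pattern with its anchored form. The Series `names` is hypothetical sample data and is not part of the challenge.

```python
import pandas as pd

# Hypothetical sample data (not the challenge data).
names = pd.Series(['Simon', 'Sandy', 'McSwain', 'Tim'])

# No anchors: matches any string with an 'S' followed somewhere later by an 'n'.
unanchored = names[names.str.contains('S.*n')].tolist()

# ^ pins the match to the start of the string and $ pins it to the end.
anchored = names[names.str.contains('^S.*n$')].tolist()

print(unanchored)
print(anchored)
```

Without anchors, str.contains behaves like a search anywhere in the string, which is why 'Sandy' and 'McSwain' match the first pattern but not the second.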
🔧 Debug
advanced
Identify the error in str.contains usage
What error does this code raise?
Data Analysis Python
import pandas as pd

data = pd.Series(['cat', 'dog', 'bird'])
result = data.str.contains('[a-z')
print(result)
A. re.error: unterminated character set
B. SyntaxError
C. TypeError
D. No error, outputs a boolean Series
💡 Hint
Check the regex pattern for correctness.
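One way to check a pattern before using it, and to match text containing a literal bracket, is sketched below. The Series `labels` is hypothetical sample data. The sketch assumes Python's built-in re module, which is what rejects the pattern here.

```python
import re
import pandas as pd

# Hypothetical sample data (not the challenge data).
labels = pd.Series(['item[a-z', 'item', 'other'])

# Compiling the pattern with the re module shows whether it is well formed.
try:
    re.compile('[a-z')
    compiled_ok = True
except re.error:
    compiled_ok = False

# regex=False treats the pattern as a literal substring, so '[' is not special.
literal = labels.str.contains('[a-z', regex=False).tolist()

# re.escape builds a regex that matches the same literal text.
escaped = labels.str.contains(re.escape('[a-z')).tolist()

# A well-formed character class needs its closing bracket.
valid_class = labels.str.contains('[a-z]').tolist()

print(compiled_ok, literal, escaped, valid_class)
```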
visualization
advanced
Visualizing pattern match counts in a DataFrame
Which code snippet correctly creates a bar chart showing counts of rows where 'City' contains 'York' (case insensitive)?
Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'City': ['New York', 'Yorkshire', 'Los Angeles', 'york', 'Boston']})

# Which code below produces the correct bar chart?
A
counts = df['City'].str.contains('York', case=True).value_counts()
counts.plot(kind='bar')
plt.show()
B
counts = df['City'].str.contains('York').value_counts()
counts.plot(kind='bar')
plt.show()
C
counts = df['City'].str.contains('York', case=False).value_counts()
counts.plot(kind='bar')
plt.show()
D
counts = df['City'].str.contains('york', case=True).value_counts()
counts.plot(kind='bar')
plt.show()
💡 Hint
Look for the option that matches case-insensitively (case=False).
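The counting step behind such a chart can be sketched on its own, without plotting. The Series `towns` is hypothetical sample data and is not part of the challenge.

```python
import pandas as pd

# Hypothetical sample data (not the challenge data).
towns = pd.Series(['Springfield', 'springdale', 'Oakland', 'SPRING HILL'])

# Boolean mask: True where the town name contains 'spring' in any letter case.
mask = towns.str.contains('spring', case=False)

# value_counts tallies how many rows are True and how many are False.
counts = mask.value_counts()

print(counts)
```

Calling counts.plot(kind='bar') on this result would draw one bar per boolean value, assuming matplotlib is installed.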
🚀 Application
expert
Extracting and counting patterns with str.contains and groupby
You have a DataFrame with a 'Text' column containing sentences. You want to count how many rows mention 'cat' and how many rows mention 'dog' (case-insensitive). Which code snippet correctly produces a DataFrame with a separate count for each animal?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'Text': ['I love my Cat', 'Dog is friendly', 'Cats and dogs', 'No pets here', 'dog and cat']})

# Which code below produces a DataFrame with counts of rows mentioning 'cat' and 'dog'?
A
counts = df['Text'].str.contains('cat|dog', case=False).value_counts()
print(counts)
B
counts = pd.Series({
    'cat': df['Text'].str.contains('cat', case=False).sum(),
    'dog': df['Text'].str.contains('dog', case=False).sum()
}).to_frame('Count')
print(counts)
C
counts = df['Text'].apply(lambda x: 'cat' if 'cat' in x else ('dog' if 'dog' in x else None)).value_counts()
print(counts)
D
counts = df['Text'].str.extractall('(cat|dog)', flags=re.IGNORECASE).groupby(0).size().to_frame('Count')
print(counts)
💡 Hint
Use str.contains for each animal separately and sum the booleans.
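The idea in the hint generalizes to any list of keywords. The sketch below uses hypothetical sample data (`notes`) and a hypothetical keyword list, neither of which comes from the challenge.

```python
import pandas as pd

# Hypothetical sample data and keywords (not the challenge data).
notes = pd.Series(['Red car', 'blue bike', 'Car and bike', 'walking', 'carpool lane'])
keywords = ['car', 'bike']

# For each keyword, build a boolean mask and sum it: True counts as 1, False as 0.
counts = pd.DataFrame(
    {'Count': [int(notes.str.contains(k, case=False).sum()) for k in keywords]},
    index=keywords,
)

print(counts)
```

Each keyword is matched as a substring, so 'carpool lane' counts toward 'car'. A row that mentions both keywords is counted once under each.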