
Pattern matching with str.contains in Data Analysis Python - Step-by-Step Execution

Concept Flow - Pattern matching with str.contains
Start with DataFrame
Call str.contains(pattern)
Check each string for pattern
Return Boolean Series
Use Boolean Series to filter DataFrame
We start with a DataFrame, use str.contains to check each string for a pattern, get a True/False result, then filter rows based on that.
Execution Sample
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
mask = df['Name'].str.contains('a')  # Boolean Series: True where 'a' is found
result = df[mask]                    # keep only the rows where mask is True
This code finds rows where the 'Name' column contains the letter 'a' (case-sensitive) and filters the DataFrame.
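To see what each variable holds, the sample can be extended with two print calls. This is a minimal sketch that reuses the same DataFrame; only the printing is added.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# Boolean Series: True where the name contains a lowercase 'a'
mask = df['Name'].str.contains('a')
print(mask.tolist())            # [False, False, True, True]

# Keep only the rows where the mask is True
result = df[mask]
print(result['Name'].tolist())  # ['Charlie', 'David']
```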
Execution Table
Step | Row Index | Name Value | Check 'a' in Name | Boolean Result | Filtered Rows
1 | 0 | Alice | No (uppercase 'A' but no lowercase 'a') | False | Exclude
2 | 1 | Bob | No | False | Exclude
3 | 2 | Charlie | Yes ('a' in 'Charlie') | True | Include
4 | 3 | David | Yes ('a' in 'David') | True | Include
5 | - | - | - | - | Filter applied: rows 2, 3 included; rows 0, 1 excluded
💡 All rows checked; filtering done based on Boolean mask.
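pandas runs the check on the whole column in one call, so the step-by-step table is a conceptual view. As a rough sketch, the same row-by-row checks can be written with Python's `in` operator. It gives the same answers here only because 'a' is a plain letter with no special meaning in a regular expression. The name `manual_mask` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# Illustrative row-by-row version of the table (not how pandas works internally)
manual_mask = []
for index, name in df['Name'].items():
    found = 'a' in name          # case-sensitive substring check
    manual_mask.append(found)
    print(index, name, found)

# The vectorized call produces the same True/False values
mask = df['Name'].str.contains('a')
print(manual_mask == mask.tolist())
```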
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | Final
mask | None | False | False | True | True | [False, False, True, True]
result | None | Row 0 excluded | Row 1 excluded | Row 2 included | Row 3 included | Filtered DataFrame with rows 2, 3
Key Moments - 3 Insights
Why does 'Alice' return False even though it starts with uppercase 'A'?
str.contains is case-sensitive by default, and 'Alice' has no lowercase 'a' (only uppercase 'A'), so it does not match. See execution_table step 1.
What happens if the pattern is not found in a string?
The Boolean result is False, so that row is excluded from the filtered DataFrame. See execution_table step 2 for 'Bob'.
How does the Boolean mask help filter the DataFrame?
The mask is a Boolean Series with one True/False value per row, aligned to the DataFrame's index. True means include the row, False means exclude it. See the final step of execution_table.
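A small sketch of what the mask looks like and how it drives the filter. The mask shares the DataFrame's row index, which is how pandas lines up each True/False value with its row. `df.loc[mask]` is shown as an equivalent spelling of `df[mask]`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
mask = df['Name'].str.contains('a')

# The mask is a pandas Series with the same index as df
print(type(mask).__name__)
print(mask.index.equals(df.index))

# Summing a Boolean Series counts the True values (rows that match)
match_count = int(mask.sum())
print(match_count)

# Both spellings keep only the rows where the mask is True
filtered_a = df[mask]
filtered_b = df.loc[mask]
print(filtered_a.equals(filtered_b))
```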
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, what is the Boolean result for 'Charlie' at step 3?
A. True
B. Error
C. False
D. None
💡 Hint
Check the 'Boolean Result' column at step 3 in execution_table.
At which step does the name contain no letter 'a' at all, neither uppercase nor lowercase?
A. Step 1
B. Step 3
C. Step 2
D. Step 4
💡 Hint
Two steps show False in the 'Boolean Result' column of execution_table; pick the one whose name has no 'A' either.
If we add case=False to str.contains, what would happen to 'Alice'?
A. It would be False
B. It would be True
C. It would cause an error
D. It would be None
💡 Hint
Case-insensitive matching treats uppercase 'A' and lowercase 'a' as the same letter; compare with the mask values in variable_tracker.
Concept Snapshot
Use str.contains('pattern') on a string column to get a Boolean mask.
This mask shows True where the pattern is found, False otherwise.
By default, matching is case-sensitive.
Use the mask to filter rows in a DataFrame.
Add case=False for case-insensitive matching.
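The points in this snapshot can be sketched as follows. `case=False` is the parameter named here; `regex=False` is an addition not covered in the walkthrough, included because str.contains treats the pattern as a regular expression by default. The `codes` Series is hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# Default: case-sensitive, so 'Alice' does not match lowercase 'a'
sensitive = df['Name'].str.contains('a')

# case=False: 'A' and 'a' are treated as the same letter
insensitive = df['Name'].str.contains('a', case=False)
print(insensitive.tolist())

# Hypothetical data to show literal matching: with the default regex=True
# a '.' matches any character, with regex=False it matches only a real dot
codes = pd.Series(['a.b', 'ab'])
as_regex = codes.str.contains('.')
as_literal = codes.str.contains('.', regex=False)
print(as_regex.tolist(), as_literal.tolist())
```

If the pattern comes from user input or contains characters such as `.`, `+`, or `(`, passing `regex=False` is one way to keep the match literal.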
Full Transcript
We start with a DataFrame containing names. We use str.contains to check whether each name has the lowercase letter 'a'. This returns a Boolean Series with one True or False value per row: True means the pattern is found, False means it is not. We then use this Series to keep only the rows where the pattern is found. 'Alice' has only an uppercase 'A' and no lowercase 'a', so it returns False and is excluded. 'Bob' has no 'a' at all, so it also returns False and is excluded. 'Charlie' and 'David' contain a lowercase 'a', so they return True and are included. The filtered DataFrame therefore contains only the names with a lowercase 'a'.